Integrate Label Studio with ZenML - Data Annotator Integrations

Streamline Data Annotation in ZenML Pipelines with Label Studio

Integrate Label Studio, a leading open-source annotation platform, with ZenML to seamlessly incorporate data annotation into your ML workflows. This integration enables efficient labeling of diverse data types, including images, audio, text, and time series, directly within ZenML pipelines.

Features with ZenML

Seamless integration of data annotation steps into ZenML pipelines
Support for various annotation types (image, audio, text, time series)
Automated dataset registration and syncing with Label Studio
Easy access to annotated data for downstream pipeline steps
Seamless integration with ZenML’s cloud artifact stores (AWS, Azure, GCP)

Main Features

Supports a wide range of annotation types and use cases
User-friendly web interface for efficient data labeling
Customizable label configurations for project-specific requirements
Collaborative annotation with multiple users and roles
Export annotations in standard formats for further analysis
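As an illustration of the customizable label configurations mentioned above: Label Studio describes a labeling interface with an XML config built from tags such as View, Text, Choices, and Choice. The sketch below generates a simple single-choice text classification config. The helper name build_choices_config and the field names "text" and "label" are hypothetical choices for this example, not part of ZenML or Label Studio.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_choices_config(labels: List[str]) -> str:
    """Build a single-choice text classification config as an XML string.

    Hypothetical helper: the control name "label" and the data field
    "$text" are assumptions chosen for this example.
    """
    if not labels:
        raise ValueError("At least one label is required.")
    root = ET.Element("View")
    # The object tag: displays the task's "text" field to the annotator.
    ET.SubElement(root, "Text", {"name": "text", "value": "$text"})
    # The control tag: one choice per label, attached to the Text tag above.
    choices = ET.SubElement(
        root, "Choices", {"name": "label", "toName": "text", "choice": "single"}
    )
    for label in labels:
        ET.SubElement(choices, "Choice", {"value": label})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_choices_config(["Positive", "Negative"]))
```

The resulting string can be pasted into a Label Studio project's labeling interface settings; other data types (images, audio, time series) use different object and control tags.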

How to use ZenML with Label Studio


# Setup Label Studio integration
# 1. Create a secret with your Label Studio API key:
#    zenml secret create label_studio_secrets --api_key="<your_label_studio_api_key>"

# 2. Register the Label Studio annotator:
#    zenml annotator register label_studio --flavor label_studio --authentication_secret="label_studio_secrets"

# 3. Update your stack with the Label Studio annotator:
#    zenml stack update -an label_studio

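from typing import Any, Dict

from zenml import pipeline, step
from zenml.client import Client

@step
def data_loader() -> Dict[str, Any]:
    """Load labeled data from the active annotator."""
    client = Client()
    annotator = client.active_stack.annotator
    # Fail early with a clear message if the stack has no annotator component.
    if annotator is None:
        raise RuntimeError("The active stack has no annotator registered.")
    return annotator.get_labeled_data(dataset_name="my_dataset")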

@pipeline
def my_pipeline():
    """Define the pipeline using the data loader step."""
    data = data_loader()
    # Process the labeled data here

if __name__ == "__main__":
    my_pipeline()

# Additional CLI commands for working with Label Studio:
# - List all datasets:
#   zenml annotator dataset list
# - Get statistics for a specific dataset:
#   zenml annotator dataset stats <dataset_id>

The snippet above shows how to set up and use the Label Studio annotator with ZenML. The commented setup steps create a secret holding the Label Studio API key, register the annotator, and add it to the active stack. The pipeline's data_loader step then retrieves labeled data from the active stack's annotator by dataset name, so that later steps can process it. The trailing comments list CLI commands for inspecting Label Studio datasets.
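To make the "process the labeled data" step more concrete, here is a minimal sketch of turning labeled tasks into (text, label) pairs for training. It assumes the tasks follow the shape of Label Studio's JSON export (a "data" dict plus an "annotations" list whose "result" entries carry a "value" with "choices"); the data returned by the annotator in your setup may be structured differently, so inspect it first. The function name extract_choice_labels and the "text" key are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def extract_choice_labels(
    tasks: List[Dict[str, Any]], text_key: str = "text"
) -> List[Tuple[str, str]]:
    """Turn Label Studio-style tasks into (text, label) pairs.

    Assumption: each task resembles an entry of Label Studio's JSON export:
    {"data": {...}, "annotations": [{"result": [{"value": {"choices": [...]}}]}]}
    """
    pairs: List[Tuple[str, str]] = []
    for task in tasks:
        text = task.get("data", {}).get(text_key)
        annotations = task.get("annotations") or []
        if text is None or not annotations:
            continue  # skip unlabeled or malformed tasks
        # Only the first annotation is used; projects with several
        # annotators per task may need to reconcile them instead.
        for result in annotations[0].get("result", []):
            choices = result.get("value", {}).get("choices")
            if choices:
                pairs.append((text, choices[0]))
                break
    return pairs


if __name__ == "__main__":
    sample_tasks = [
        {
            "data": {"text": "Great product"},
            "annotations": [{"result": [{"value": {"choices": ["Positive"]}}]}],
        },
        {"data": {"text": "Not labeled yet"}, "annotations": []},
    ]
    print(extract_choice_labels(sample_tasks))
```

A function like this could be wrapped in its own ZenML step downstream of data_loader, keeping annotation parsing separate from model training.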

Data Annotation and Labeling MLOps with ZenML and Label Studio

Additional Resources

End-to-End Computer Vision Example with Label Studio

Label Studio Integration Documentation

Label Studio Official Documentation
