Label Studio

Streamline Data Annotation in ZenML Pipelines with Label Studio

Integrate Label Studio, an open-source annotation platform, with ZenML to make data annotation part of your ML workflows. The integration lets you label images, audio, text, and time series data directly within ZenML pipelines.

Features with ZenML

  • Data annotation steps that run as part of ZenML pipelines
  • Support for various annotation types (image, audio, text, time series)
  • Automated dataset registration and syncing with Label Studio
  • Easy access to annotated data for downstream pipeline steps
  • Works with ZenML’s cloud artifact stores (AWS, Azure, GCP)

Main Features

  • Supports a wide range of annotation types and use cases
  • User-friendly web interface for efficient data labeling
  • Customizable label configurations for project-specific requirements
  • Collaborative annotation with multiple users and roles
  • Export annotations in standard formats for further analysis

How to use ZenML with Label Studio


# Setup Label Studio integration
# 1. Create a secret with your Label Studio API key:
#    zenml secret create label_studio_secrets --api_key="<your_label_studio_api_key>"

# 2. Register the Label Studio annotator:
#    zenml annotator register label_studio --flavor label_studio --authentication_secret="label_studio_secrets"

# 3. Update your stack with the Label Studio annotator:
#    zenml stack update -an label_studio

from zenml import pipeline, step
from typing import Dict, Any
from zenml.client import Client

@step
def data_loader() -> Dict[str, Any]:
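    """Load labeled data from the annotator in the active stack."""
    annotator = Client().active_stack.annotator
    if annotator is None:
        # The active stack has no annotator component (see step 3 above).
        raise RuntimeError("No annotator registered in the active stack.")
    return annotator.get_labeled_data(dataset_name="my_dataset")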

@pipeline
def my_pipeline():
    """Define the pipeline using the data loader step."""
    data = data_loader()
    # Process the labeled data here

if __name__ == "__main__":
    my_pipeline()

# Additional CLI commands for working with Label Studio:
# - List all datasets:
#   zenml annotator dataset list
# - Get statistics for a specific dataset:
#   zenml annotator dataset stats <dataset_name>
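
The "Process the labeled data here" placeholder in the pipeline above could be filled with a helper along the following lines. This is a minimal sketch, not part of the ZenML or Label Studio APIs: it assumes the labeled data arrives as a list of task dictionaries shaped like Label Studio's JSON export (a "data" field plus "annotations", each holding a "result" list), and that the project uses a single-choice classification label. The names first_choice, to_labeled_pairs, and SAMPLE_TASKS are hypothetical, and the actual structure returned by your annotator may differ by version.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical sample tasks, assumed to follow Label Studio's JSON export shape.
SAMPLE_TASKS: List[Dict[str, Any]] = [
    {
        "data": {"image": "s3://my-bucket/cat.png"},
        "annotations": [
            {"result": [{"type": "choices", "value": {"choices": ["cat"]}}]}
        ],
    },
    {
        "data": {"image": "s3://my-bucket/dog.png"},
        "annotations": [
            {"result": [{"type": "choices", "value": {"choices": ["dog"]}}]}
        ],
    },
    # A task without annotations is skipped by to_labeled_pairs.
    {"data": {"image": "s3://my-bucket/unlabeled.png"}, "annotations": []},
]


def first_choice(task: Dict[str, Any]) -> Optional[str]:
    """Return the first 'choices' label found in a task, or None."""
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            choices = result.get("value", {}).get("choices")
            if choices:
                return choices[0]
    return None


def to_labeled_pairs(
    tasks: List[Dict[str, Any]],
) -> List[Tuple[Dict[str, Any], str]]:
    """Pair each task's input data with its label, dropping unlabeled tasks."""
    pairs = []
    for task in tasks:
        label = first_choice(task)
        if label is not None:
            pairs.append((task.get("data", {}), label))
    return pairs


if __name__ == "__main__":
    pairs = to_labeled_pairs(SAMPLE_TASKS)
    # Summarize how many examples carry each label.
    print(Counter(label for _, label in pairs))
```

In a pipeline, a helper like to_labeled_pairs could be wrapped in its own step that takes the output of data_loader, so downstream training steps receive plain (data, label) pairs instead of raw annotation records.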
