
Great Expectations

Ensure Data Quality and Consistency in Your ML Pipelines with Great Expectations and ZenML

Integrate Great Expectations with ZenML to seamlessly incorporate data profiling, testing, and documentation into your ML workflows. This powerful combination allows you to maintain high data quality standards, improve communication, and enhance observability throughout your ML pipeline.

Features with ZenML

  • Seamless integration of Great Expectations data validation within ZenML pipelines
  • Automated storage and versioning of Expectation Suites and Validation Results using ZenML's Artifact Store
  • Easy visualization of Great Expectations artifacts directly in the ZenML dashboard or Jupyter notebooks
  • Flexible store deployment: reuse an existing Great Expectations configuration or let ZenML manage the setup for you
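Before the validator step can run, a Great Expectations data validator must be part of the active ZenML stack. A minimal setup sketch using the standard ZenML CLI is shown below; the component and stack names (gx_validator, gx_stack) are placeholders, and the exact flags may vary slightly with your ZenML version.

```shell
# Install the Great Expectations integration into the ZenML environment.
zenml integration install great_expectations -y

# Register a Great Expectations data validator component.
zenml data-validator register gx_validator --flavor=great_expectations

# Add it to a stack (reusing the default orchestrator and artifact store)
# and activate that stack.
zenml stack register gx_stack -o default -a default -dv gx_validator --set
```

With the stack active, any pipeline run on it can use the Great Expectations validator step without further configuration.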


Main Features

  • Automated data profiling to generate validation rules (Expectations) based on dataset properties
  • Comprehensive data quality checks using predefined or inferred Expectations
  • Human-readable documentation of validation rules, quality checks, and results
  • Support for a variety of data formats and sources (the ZenML integration currently works with pandas DataFrames)
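To make the Expectation semantics concrete, here is a plain-pandas sketch, not the Great Expectations implementation, of what an expect_column_values_to_be_between check verifies: it succeeds only when every value in the named column falls inside the given range, and it reports how many values fell outside.

```python
import pandas as pd


def expect_column_values_to_be_between(df, column, min_value, max_value):
    """Sketch of the check: succeed when every value in `column`
    lies within [min_value, max_value] (inclusive)."""
    in_range = df[column].between(min_value, max_value)
    return {
        "success": bool(in_range.all()),
        "unexpected_count": int((~in_range).sum()),
    }


df = pd.DataFrame({"X_Minimum": [10, 250, 1999]})
result = expect_column_values_to_be_between(df, "X_Minimum", 0, 2000)
# result -> {"success": True, "unexpected_count": 0}
```

Great Expectations evaluates the real version of this check against your data asset and records the outcome in a Validation Result, which ZenML then versions in its Artifact Store.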

How to use ZenML with Great Expectations

from zenml import pipeline
from zenml.config import DockerSettings
from zenml.integrations.great_expectations.steps.ge_validator import (
    great_expectations_validator_step,
)
# Note: the import path for GreatExpectationExpectationConfig may differ
# between ZenML versions; check your installed version's API reference.
from zenml.integrations.great_expectations.utils import (
    GreatExpectationExpectationConfig,
)

# Configure the validator step with a single Expectation: every value in
# the X_Minimum column must lie between 0 and 2000.
ge_validator_step = great_expectations_validator_step.with_options(
    parameters={
        "expectations_list": [
            GreatExpectationExpectationConfig(
                expectation_name="expect_column_values_to_be_between",
                expectation_args={
                    "column": "X_Minimum",
                    "min_value": 0,
                    "max_value": 2000,
                },
            ),
        ],
        "data_asset_name": "steel_plates_train_df",
    }
)

# Build pipeline images with the Great Expectations integration installed.
docker_settings = DockerSettings(required_integrations=["great_expectations"])

@pipeline(enable_cache=False, settings={"docker": docker_settings})
def validation_pipeline():
    # importer and splitter are ZenML steps assumed to be defined elsewhere.
    imported_data = importer()
    train, test = splitter(imported_data)
    ge_validator_step(train)

validation_pipeline()
