Evidently and ZenML

Ensure data quality and guard against drift with Evidently profiling in ZenML
The Evidently integration enables you to seamlessly incorporate data quality checks, data drift detection, and model performance analysis into your ZenML pipelines. Leverage Evidently's powerful profiling and validation features to maintain the integrity and reliability of your ML workflows.

Features with ZenML

  • Seamless integration of Evidently data profiling and validation steps into ZenML pipelines
  • Automated data quality checks and data drift detection for improved ML reliability
  • Comprehensive model performance analysis and comparison within ZenML workflows
  • Easy configuration and customization of Evidently metrics and tests using ZenML utilities
  • Direct visualization of Evidently reports and test results in the ZenML dashboard

Main Features

  • Comprehensive data quality analysis and reporting
  • Automated data drift and model drift detection
  • Flexible configuration of custom metrics and validation tests
  • Support for both classification and regression tasks
  • Detailed insights into feature behavior and distributions

How to use ZenML with Evidently

import pandas as pd
from sklearn import datasets
from typing import Tuple
from typing_extensions import Annotated

from zenml import pipeline, step
from zenml.artifacts.artifact_config import ArtifactConfig

@step
def data_loader() -> pd.DataFrame:
    """Load the OpenML women's e-commerce clothing reviews dataset."""
    reviews_data = datasets.fetch_openml(
        name="Womens-E-Commerce-Clothing-Reviews", version=2, as_frame="auto"
    )
    reviews = reviews_data.frame
    return reviews
    
@step
def data_splitter(
    reviews: pd.DataFrame,
) -> Tuple[
    Annotated[pd.DataFrame, ArtifactConfig(name="reference_dataset")],
    Annotated[pd.DataFrame, ArtifactConfig(name="comparison_dataset")],
]:
    """Splits the dataset into two subsets, the reference dataset and the
    comparison dataset.
    """
    ref_df = reviews[reviews.Rating > 3].sample(
        n=5000, replace=True, ignore_index=True, random_state=42
    )
    comp_df = reviews[reviews.Rating < 3].sample(
        n=5000, replace=True, ignore_index=True, random_state=42
    )
    return ref_df, comp_df
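To see what `data_splitter` produces, here is a standalone sketch of the same filter-and-sample logic on a tiny synthetic DataFrame (the frame itself is invented for illustration; column names mirror the review dataset):

```python
import pandas as pd

# Tiny stand-in for the clothing-reviews frame (synthetic data).
reviews = pd.DataFrame({
    "Rating": [5, 4, 1, 2, 5, 1],
    "Review_Text": ["great", "good", "bad", "poor", "love it", "awful"],
})

# Reference dataset: well-rated reviews (Rating > 3).
ref_df = reviews[reviews.Rating > 3].sample(
    n=4, replace=True, ignore_index=True, random_state=42
)
# Comparison dataset: poorly-rated reviews (Rating < 3).
comp_df = reviews[reviews.Rating < 3].sample(
    n=4, replace=True, ignore_index=True, random_state=42
)
```

Sampling with `replace=True` lets both subsets reach a fixed size even when the filtered frame is smaller, and `ignore_index=True` gives each subset a clean 0-based index.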


from zenml.integrations.evidently.metrics import EvidentlyMetricConfig
from zenml.integrations.evidently.steps import (
    EvidentlyColumnMapping,
    evidently_report_step,
)

text_data_report = evidently_report_step.with_options(
    parameters=dict(
        column_mapping=EvidentlyColumnMapping(
            target="Rating",
            numerical_features=["Age", "Positive_Feedback_Count"],
            categorical_features=[
                "Division_Name",
                "Department_Name",
                "Class_Name",
            ],
            text_features=["Review_Text", "Title"],
        ),
        metrics=[
            EvidentlyMetricConfig.metric("DataQualityPreset"),
            EvidentlyMetricConfig.metric(
                "TextOverviewPreset", column_name="Review_Text"
            ),
            EvidentlyMetricConfig.metric_generator(
                "ColumnRegExpMetric",
                columns=["Review_Text", "Title"],
                reg_exp=r"[A-Z][A-Za-z0-9 ]*",
            ),
        ],
        # We need to download the NLTK data for the TextOverviewPreset
        download_nltk_data=True,
    ),
)


import json

@step
def text_analyzer(
    report: str,
) -> Tuple[
    Annotated[int, ArtifactConfig(name="missing_values_current")],
    Annotated[int, ArtifactConfig(name="missing_values_reference")],
]:
    """Analyze the Evidently text Report and return the number of missing
    values in the reference and comparison datasets.
    """
    result = json.loads(report)["metrics"][0]["result"]
    return (
        result["current"]["number_of_missing_values"],
        result["reference"]["number_of_missing_values"],
    )


@pipeline(enable_cache=False)
def text_data_report_test_pipeline():
    """Links all the steps together in a pipeline."""
    data = data_loader()
    reference_dataset, comparison_dataset = data_splitter(data)
    report, _ = text_data_report(
        reference_dataset=reference_dataset,
        comparison_dataset=comparison_dataset,
    )
    text_analyzer(report)


if __name__ == "__main__":
    # Run the pipeline
    text_data_report_test_pipeline()

In the code above, Evidently is used within a ZenML pipeline to monitor and validate data quality and text data characteristics. The EvidentlyColumnMapping is configured to specify targets, numerical features, categorical features, and text features, helping Evidently understand the data structure. The evidently_report_step generates a report with various metrics, including DataQualityPreset for data quality overview, TextOverviewPreset for text data overview, and ColumnRegExpMetric to check for specific patterns in text columns. The text_analyzer step then analyzes this report to extract the number of missing values in the reference and comparison datasets. The pipeline links these steps, loading the data, splitting it into reference and comparison datasets, generating the Evidently report, and analyzing it for missing values. This setup integrates Evidently’s capabilities into a ZenML pipeline for comprehensive data validation.
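The extraction in `text_analyzer` relies on the JSON structure that Evidently reports serialize to. The following sketch runs the same parsing against a minimal mock payload (the payload shape here is a simplified assumption for illustration; a real report contains many more metrics and fields):

```python
import json

# Minimal mock of the JSON an Evidently report emits (structure assumed
# for illustration; real reports are much larger).
report_json = json.dumps({
    "metrics": [
        {
            "result": {
                "current": {"number_of_missing_values": 12},
                "reference": {"number_of_missing_values": 7},
            },
        }
    ]
})

# Same extraction as the text_analyzer step above: take the first
# metric's result and read the missing-value counts for both datasets.
result = json.loads(report_json)["metrics"][0]["result"]
missing_current = result["current"]["number_of_missing_values"]
missing_reference = result["reference"]["number_of_missing_values"]
```

Because the report is passed between steps as a JSON string, any downstream step can parse it the same way to gate the pipeline on other metrics.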

Additional Resources
Evidently Integration Documentation
