Last updated: February 1, 2023
This repository showcases how ZenML can be combined with a GitHub workflow to automate CI/CD for machine learning, with continuous model training and continuous model deployment to production. This allows data scientists to experiment with data processing and model training locally and then have code changes automatically tested and validated through the standard GitHub PR peer review process. Changes that pass the CI and code review are then deployed automatically to production.
This repository is also meant to be used as a template: you can fork it and easily adapt it to your own MLOps stack, infrastructure, code, and data.
Here's an architectural diagram of what this can look like:
The pipeline implementations follow a set of best practices for MLOps summarized below:
- Experiment Tracking: All experiments are logged with an experiment tracker (MLflow), which allows for easy comparison of different runs and models and provides quick access to visualization and validation reports.
- Data and Model Validation: The pipelines include a set of Deepchecks-powered steps that verify the integrity of the data and evaluate the model after training. The results are gathered and analyzed, and a report is generated with a summary of the findings and a suggested course of action. This provides useful insights into the quality of the data and the performance of the model, and helps catch potential issues early, before the model is deployed to production.
- Pipeline Tracking: All pipeline runs and their artifacts are of course versioned and logged with ZenML. This enables features such as lineage tracking, provenance, caching, and reproducibility.
- Continuous Integration: All changes to the code are tested and validated automatically using GitHub Actions. Only changes that pass all tests are merged into the main branch. This applies not only to the code itself but also to the ML artifacts, such as the data and the model.
- Continuous Deployment: When a change is merged into the main branch, it is automatically deployed to production using ZenML and GitHub Actions. There are also additional checks in place to ensure that the model is not deployed if it is not fit for production or performs worse than the model currently deployed.
- Software Dependency Management: All software dependencies are automatically installed by ZenML in the pipeline runtime environments, and Python package versions are frozen and pinned so that pipeline runs are fully reproducible.
- Reproducible Randomness: All randomness is controlled and seeded to ensure reproducibility and caching of the pipeline runs.
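The Continuous Deployment practice above depends on a gate that refuses to deploy a model that is not fit for production or that performs worse than the one currently deployed. The sketch below shows one way such a gate could be expressed. It is not this repository's implementation: `ValidationSummary`, `should_deploy`, the use of accuracy as the metric, and the default thresholds are all hypothetical, chosen only to illustrate combining validation results with a comparison against the deployed model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationSummary:
    """Hypothetical summary of data/model validation results (e.g. from Deepchecks)."""
    passed_checks: int
    failed_checks: int


def should_deploy(
    validation: ValidationSummary,
    new_accuracy: float,
    deployed_accuracy: Optional[float],
    min_accuracy: float = 0.7,  # assumed absolute quality bar
    max_failed_checks: int = 0,  # assumed tolerance for failed validation checks
) -> bool:
    """Return True only if the candidate model should replace the deployed one."""
    # Block deployment if validation reported more failures than tolerated.
    if validation.failed_checks > max_failed_checks:
        return False
    # Block deployment if the model does not meet the absolute quality bar.
    if new_accuracy < min_accuracy:
        return False
    # If no model is deployed yet, a candidate that passed the checks above can go out.
    if deployed_accuracy is None:
        return True
    # Otherwise, only replace the current model if the candidate is at least as good
    # (allowing ties is an assumption of this sketch).
    return new_accuracy >= deployed_accuracy


summary = ValidationSummary(passed_checks=12, failed_checks=0)
print(should_deploy(summary, new_accuracy=0.91, deployed_accuracy=0.89))  # True
print(should_deploy(summary, new_accuracy=0.85, deployed_accuracy=0.89))  # False
```

Keeping the gate a pure function of its inputs makes it easy to unit-test in CI independently of the orchestrator or model deployer.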
Stack and Components
Locally, this project uses the following Stack Components:
- Orchestrator - Local Orchestrator
- Artifact Store - Local Artifact Store
- Experiment Tracker - MLflow
- Data Validator - Deepchecks
- Model Deployer - MLflow
For the production stack, it uses:
- Artifact Store - S3
- Orchestrator - Kubeflow
- Container Registry - AWS ECR
- Secrets Manager - AWS Secrets Manager
- Model Deployer - KServe
- Data Validator - Deepchecks
- Experiment Tracker - MLflow
- Image Builder - Local
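To make the difference between the local and production stacks explicit, the sketch below encodes both component lists above as plain Python dictionaries and computes which components stay the same, which are swapped, and which exist only in production. The dictionary keys and values are descriptive labels chosen for this illustration; they are not necessarily the exact flavor names used by the ZenML CLI.

```python
from typing import Dict, List, Tuple

# Component type -> implementation, as listed in this README (labels are illustrative).
LOCAL_STACK: Dict[str, str] = {
    "orchestrator": "local",
    "artifact_store": "local",
    "experiment_tracker": "mlflow",
    "data_validator": "deepchecks",
    "model_deployer": "mlflow",
}

PRODUCTION_STACK: Dict[str, str] = {
    "orchestrator": "kubeflow",
    "artifact_store": "s3",
    "container_registry": "aws_ecr",
    "secrets_manager": "aws_secrets_manager",
    "model_deployer": "kserve",
    "data_validator": "deepchecks",
    "experiment_tracker": "mlflow",
    "image_builder": "local",
}


def diff_stacks(
    local: Dict[str, str], production: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (shared, swapped, production_only) component types, each sorted."""
    shared = sorted(k for k in local if production.get(k) == local[k])
    swapped = sorted(k for k in local if k in production and production[k] != local[k])
    production_only = sorted(k for k in production if k not in local)
    return shared, swapped, production_only


shared, swapped, production_only = diff_stacks(LOCAL_STACK, PRODUCTION_STACK)
print(shared)           # ['data_validator', 'experiment_tracker']
print(swapped)          # ['artifact_store', 'model_deployer', 'orchestrator']
print(production_only)  # ['container_registry', 'image_builder', 'secrets_manager']
```

In other words, the experiment tracker and data validator are unchanged between environments, while the orchestrator, artifact store, and model deployer are swapped and the container registry, secrets manager, and image builder are only needed in production.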
The visualization for DAGs generated as part of this project can be viewed inside a demo version of the ZenML Dashboard.
The following DAG shows the pipeline on the ZenML Dashboard:
You can access and interact with the DAGs on a shared ZenML Dashboard.