Last updated: February 1, 2023
This project showcases how to use Scikit-learn to construct a model that can predict the number of 3-pointer shots in the next NBA game. In addition, we show how to use Evidently as a data validation tool to detect drift in training and testing data, MLflow for tracking experiments, and Kubeflow Pipelines to schedule and repeat pipeline runs. We also include an alerter as a component to send notifications on Discord.
We will construct 3 pipelines for this project:
- Data Validation - Check for train-test data drift using Evidently.
- Training - Train models with Scikit-learn and track experiments using MLflow.
- Inference - Run inference on new data and post a notification on Discord.
The project structure, including the stack and its components, can be reused whenever you need to build ML pipelines for tabular data problems.
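To make the division of labor between the three pipelines concrete, here is a minimal, library-free sketch of the flow. It is not the project's code: the mean-shift check stands in for Evidently, the hand-written least-squares fit stands in for a Scikit-learn model, and the `notify` callback stands in for the Discord alerter. All function names, the threshold, and the sample numbers are hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def detect_drift(train: Sequence[float], test: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Data validation stand-in: flag drift when the test mean sits more than
    `threshold` train standard deviations away from the train mean."""
    spread = stdev(train)
    if spread == 0:
        return mean(test) != mean(train)
    return abs(mean(test) - mean(train)) / spread > threshold


def train_model(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Training stand-in: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def run_inference(model: Tuple[float, float], x: float,
                  notify: Callable[[str], None]) -> float:
    """Inference stand-in: predict for one new input and send a notification."""
    slope, intercept = model
    prediction = slope * x + intercept
    notify(f"Predicted 3-pointers for the next game: {prediction:.1f}")
    return prediction


if __name__ == "__main__":
    # Hypothetical data: 3-point attempts (feature) vs. 3-pointers made (target).
    train_x, train_y = [30.0, 35.0, 40.0, 45.0], [10.0, 12.0, 14.0, 16.0]
    test_x = [31.0, 36.0, 39.0, 44.0]

    messages: List[str] = []  # collects what would be posted to Discord
    if detect_drift(train_x, test_x):
        messages.append("Drift detected between train and test data")
    else:
        model = train_model(train_x, train_y)
        run_inference(model, 50.0, messages.append)
    print(messages)
```

In the actual project each of these stages is a separate pipeline, so they can be scheduled and rerun independently; the sketch chains them in one script only to show how the outputs of one stage feed the next.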
Stack and Components
This project uses the following Stack Components:
- Orchestrator - Kubeflow Pipelines.
- Artifact Store - Amazon S3.
- Container Registry - Amazon Elastic Container Registry.
- Secret Manager - AWS Secrets Manager.
- Experiment Tracker - MLflow.
- Data Validator - Evidently.
- Alerter - Discord.
All pipelines were run remotely with a Kubeflow orchestrator. The DAGs are shown on the Kubeflow central dashboard and the ZenML Dashboard.
The following DAG shows the drift detection pipeline on the ZenML Dashboard:
The following DAG shows the training pipeline on the ZenML Dashboard:
The following DAG shows the inference pipeline on the ZenML Dashboard: