Last updated: February 1, 2023
Introduction
This project showcases how to use Scikit-learn to construct a model that can predict the number of 3-pointer shots in the next NBA game. In addition, we show how to use Evidently as a data validation tool to detect drift in training and testing data, MLflow for tracking experiments, and Kubeflow Pipelines to schedule and repeat pipeline runs. We also include an alerter as a component to send notifications on Discord.
We will construct 3 pipelines for this project:
- Data Validation - Check for train-test data drift using Evidently.
- Training - Train models with Scikit-learn and track experiments using MLflow.
- Inference - Run inference on new data and post a notification to Discord.
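To make the idea behind the data validation pipeline concrete, here is a minimal sketch of a train-test drift check on a single numeric column. It is not Evidently's API and not the project's code: it is a hand-written two-sample Kolmogorov-Smirnov test using only the standard library, with the usual asymptotic critical value. The function names and the sample values are hypothetical and made up for illustration.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    # The gap can only change at observed values, so checking those is enough.
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )


def has_drift(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the asymptotic critical value."""
    statistic = ks_statistic(reference, current)
    n, m = len(reference), len(current)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)).
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return statistic > critical


# Hypothetical per-game three-pointer counts, invented for this example.
train_threes = list(range(0, 20))
test_same = list(range(0, 20))
test_shifted = list(range(10, 30))

print(has_drift(train_threes, test_same))     # False
print(has_drift(train_threes, test_shifted))  # True
```

A check like this would typically gate the training pipeline: if the test split has drifted away from the training split, evaluation metrics on it are harder to trust.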
Use case
This project structure, including the stack and its components, can be reused whenever you need to build ML pipelines for problems involving tabular data.
Stack and Components

This project uses the following Stack Components:
- Orchestrator - Kubeflow Pipelines.
- Artifact Store - Amazon S3.
- Container Registry - Amazon Elastic Container Registry.
- Secrets Manager - AWS Secrets Manager.
- Experiment Tracker - MLflow.
- Data Validator - Evidently.
- Alerter - Discord.
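As a rough illustration of what the alerter does at the end of the inference pipeline, the sketch below builds a prediction message and posts it to Discord. Note the assumptions: it uses a plain Discord webhook with the standard library, which may differ from how the alerter component in this project authenticates, and the function names, message wording, and User-Agent string are hypothetical.

```python
import json
import urllib.request

# Discord rejects messages whose content exceeds 2000 characters.
DISCORD_MAX_CHARS = 2000


def build_payload(predicted_threes):
    """Build the JSON body for a Discord webhook message."""
    content = f"Predicted three-pointers in the next game: {predicted_threes:.1f}"
    return {"content": content[:DISCORD_MAX_CHARS]}


def post_to_discord(webhook_url, payload):
    """POST the payload to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical client name; set explicitly instead of the default.
            "User-Agent": "nba-pipeline-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_payload(13.0)
print(payload["content"])
```

Calling `post_to_discord` requires a real webhook URL, which should be kept in the secrets manager and never hard-coded in the pipeline.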
Code
The code to reproduce this project is open source and available in the ZenML Projects repository on GitHub. View the code here.
Runs
All pipelines were run remotely with a Kubeflow orchestrator. The DAGs are shown on the Kubeflow central dashboard and the ZenML Dashboard.
The following DAG shows the drift detection pipeline on the ZenML Dashboard:

The following DAG shows the training pipeline on the ZenML Dashboard:

The following DAG shows the inference pipeline on the ZenML Dashboard:

Additional resources
This blog post describes the motivation for and implementation of this project in detail.
We showcased this project in a live discussion with Ben Epstein from MLOps Community. Watch it here.