
See how ZenML compares to Dataiku for building production ML pipelines. While Dataiku offers a comprehensive visual AI platform with drag-and-drop Flows, built-in AutoML, and enterprise governance for diverse teams, ZenML provides a lightweight, open-source alternative that gives ML engineers full control over their stack. Compare ZenML’s portable, Python-native pipelines against Dataiku’s all-in-one platform approach. Discover how ZenML can help you build reproducible, production-grade ML workflows with a portable, code-first approach — while maintaining the freedom to integrate with any tool in your ecosystem.
“After a benchmark on several solutions, we chose ZenML for its stack flexibility and its incremental process. We started from small local pipelines and gradually created more complex production ones. It was very easy to adopt.”
Clément Depraz
Data Scientist at Brevo
Feature-by-feature comparison
| Feature | ZenML | Dataiku |
| --- | --- | --- |
| Workflow Orchestration | Portable, code-defined pipelines that run on any orchestrator (Airflow, Kubeflow, local, etc.) via composable stacks | Built-in visual Flow orchestrator with Scenarios for scheduling, event triggers, and conditional automation |
| Integration Flexibility | Designed to integrate with any ML tool — swap orchestrators, trackers, artifact stores, and deployers without changing pipeline code | Rich built-in connectors (40+ data sources) and plugins, but integrations work within Dataiku's platform abstraction layer |
| Vendor Lock-In | Open-source and vendor-neutral — pipelines are pure Python code portable across any infrastructure | Proprietary platform where visual Flows, Recipes, and Scenarios are tied to Dataiku DSS — migrating away requires reimplementation |
| Setup Complexity | Pip-installable, start locally with minimal infrastructure — scale by connecting to cloud compute when ready | Enterprise setup requires Design, Automation, and API nodes with server provisioning. Cloud trial available but production is heavyweight |
| Learning Curve | Familiar Python pipeline definitions with simple decorators — fewer platform concepts to learn for ML engineers | Visual interface accessible to non-coders (analysts, business users). Extensive Academy training. But mastering the full platform takes time |
| Scalability | Scales via underlying orchestrator and infrastructure — leverage Kubernetes, cloud services, or distributed compute | Enterprise-grade scaling with in-database SQL push-down, Spark integration, Kubernetes execution, and multi-node architecture |
| Cost Model | Open-source core is free — pay only for infrastructure. Optional managed service with transparent usage-based pricing | Enterprise subscription pricing (sales-led, custom quotes). Free Edition available for up to 3 users with limited production features |
| Collaborative Development | Collaboration through code sharing, Git workflows, and the ZenML dashboard for pipeline visibility and model management | Strong multi-persona collaboration with project wikis, discussions, shared dashboards, and role-based access across data scientists and analysts |
| ML Framework Support | Framework-agnostic — use any Python ML library in pipeline steps with automatic artifact serialization | Built-in AutoML covers scikit-learn, XGBoost, and TensorFlow/Keras. Code recipes support any framework installable in code environments |
| Model Monitoring & Drift Detection | Integrates with monitoring tools like Evidently and Great Expectations as pipeline steps for customizable drift detection | Built-in Model Evaluation Store, Unified Monitoring dashboard, and drift analysis for data, prediction, and performance drift |
| Governance & Access Control | Pipeline-level lineage, artifact tracking, RBAC, and model control plane for audit trails and approval workflows | Enterprise-grade governance with Dataiku Govern module, audit logs, data catalog and lineage, LDAP/SSO, and regulatory compliance features |
| Experiment Tracking | Integrates with any experiment tracker (MLflow, W&B, etc.) as part of your composable stack | Built-in experiment tracking for AutoML with model comparison UI. Supports logging from scikit-learn, XGBoost, LightGBM, and TensorFlow |
| Reproducibility | Auto-versioned code, data, and artifacts for every pipeline run — portable reproducibility across any infrastructure | Managed code environments, project bundles for deployment, and Flow determinism. Requires discipline around data versioning |
| Auto Retraining Triggers | Supports scheduled pipelines and event-driven triggers that can initiate retraining based on drift detection or data changes | Native Scenarios with time-based schedules, event triggers, and conditional logic for automated retraining and deployment |
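The "swap orchestrators without changing pipeline code" rows above can be made concrete with ZenML's CLI. A rough sketch, assuming the `kubeflow` orchestrator and `s3` artifact-store flavors; the component and stack names (`kf_orchestrator`, `s3_store`, `k8s_stack`) and the bucket path are placeholders of our choosing:

```shell
# Start locally, then register a Kubernetes-backed stack.
pip install "zenml[server]"
zenml init

# Register alternative stack components (names are placeholders)
zenml orchestrator register kf_orchestrator --flavor=kubeflow
zenml artifact-store register s3_store --flavor=s3 --path=s3://my-bucket

# Compose them into a stack and make it the active one
zenml stack register k8s_stack -o kf_orchestrator -a s3_store
zenml stack set k8s_stack

# The same pipeline file now runs on Kubeflow, with no code edits
python run_pipeline.py
```

Because the stack is configuration rather than code, switching back to local execution is just another `zenml stack set`.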
Code comparison
from zenml import pipeline, step, Model
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import pandas as pd


@step
def ingest_data() -> pd.DataFrame:
    return pd.read_csv("data/dataset.csv")


@step
def train_model(df: pd.DataFrame) -> RandomForestClassifier:
    X, y = df.drop("target", axis=1), df["target"]
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, y)
    return model


@step
def evaluate(model: RandomForestClassifier, df: pd.DataFrame) -> float:
    X, y = df.drop("target", axis=1), df["target"]
    return float(accuracy_score(y, model.predict(X)))


@step
def check_drift(df: pd.DataFrame) -> bool:
    # Plug in Evidently, Great Expectations, etc.
    return detect_drift(df)


@pipeline(model=Model(name="my_model"))
def ml_pipeline():
    df = ingest_data()
    model = train_model(df)
    accuracy = evaluate(model, df)
    drift = check_drift(df)


# Runs on any orchestrator (local, Airflow, Kubeflow),
# auto-versions all artifacts, and stays fully portable
# across clouds — no platform lock-in
ml_pipeline()

# Dataiku DSS platform workflow
# Runs inside Dataiku's managed environment
import dataiku
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Read input dataset from Dataiku's managed storage
dataset = dataiku.Dataset("customers_prepared")
df = dataset.get_dataframe()
X = df.drop("target", axis=1)
y = df["target"]
# Train model inside Dataiku's code recipe
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
acc = accuracy_score(y, model.predict(X))
print(f"Accuracy: {acc}")
# Write predictions to output Dataiku dataset
preds = pd.DataFrame({"prediction": model.predict(X)})
output = dataiku.Dataset("predictions")
output.write_with_schema(preds)
# Multi-step orchestration uses visual Flows + Scenarios
# (configured through Dataiku's platform UI).
# AutoML, monitoring, and retraining are all managed
# within the proprietary DSS environment.
# Requires Dataiku server and enterprise license.
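The `check_drift` step in the ZenML example calls a `detect_drift` helper that is left for you to supply. One possible sketch compares a current batch against a stored reference window with a two-sample Kolmogorov-Smirnov test; the two-DataFrame signature and the `alpha` threshold are illustrative choices here, not part of either platform's API:

```python
import pandas as pd
from scipy.stats import ks_2samp


def detect_drift(reference: pd.DataFrame, current: pd.DataFrame,
                 alpha: float = 0.05) -> bool:
    """Flag drift when any shared numeric column fails a KS test."""
    shared = reference.select_dtypes("number").columns.intersection(current.columns)
    for col in shared:
        # A small p-value means the two samples likely come from
        # different distributions, i.e. the feature has drifted.
        result = ks_2samp(reference[col], current[col])
        if result.pvalue < alpha:
            return True
    return False
```

In the pipeline, the single-argument `check_drift(df)` step would load its reference window internally (for example, from the last accepted training run). Dedicated tools like Evidently layer per-column reports and additional test types on top of this same idea.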
ZenML is fully open-source and vendor-neutral, letting you avoid the significant licensing costs and platform lock-in of proprietary enterprise platforms. Your pipelines remain portable across any infrastructure, from local development to multi-cloud production.
ZenML offers a pip-installable, Python-first approach that lets you start locally and scale later. No enterprise deployment, platform operators, or Kubernetes clusters required to begin — build production-grade ML pipelines in minutes, not weeks.
ZenML's composable stack lets you choose your own orchestrator, experiment tracker, artifact store, and deployer. Swap components freely without re-platforming — your pipelines adapt to your toolchain, not the other way around.