How to use ZenML and DBT together

Hamza Tahir

Jun 21, 2024

Today, Javier from Wayflyer asked on Slack about using ZenML and DBT together. It got me thinking: this seems like something that might be useful to many people.

Why use DBT and ZenML together?

ZenML is used for ML workflows, while DBT is used for data transformations. The two go hand in hand when you have use cases like:

Running a data transformation after training a model
Doing post-batch-inference data transformations (That’s Javier’s use case)
Triggering a training/inference/deployment ML workflow after a data transformation is complete

How I’d do it

My suggestion to Javier was to trigger the transformation from a ZenML success hook:

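import logging
import os

import pandas as pd
import requests
from zenml import step


# Defined before the step so the decorator below can reference it.
def trigger_dbt():
    # Assumption: the token and "owner/repo" name come from environment variables.
    token = os.environ['GITHUB_TOKEN']
    repo = os.environ['GITHUB_REPO']
    headers = {
        'Authorization': f'token {token}',
        'Accept': 'application/vnd.github+json'
    }
    url = f'https://api.github.com/repos/{repo}/dispatches'
    payload = {
        'event_type': 'trigger-action',
        'client_payload': {}
    }
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    # GitHub replies with 204 No Content when the dispatch is accepted.
    if response.status_code == 204:
        logging.info('GitHub Action triggered successfully')
    else:
        logging.error('Failed to trigger GitHub Action (HTTP %s)', response.status_code)


@step(on_success=trigger_dbt)
def run_batch_inference(data: pd.DataFrame) -> pd.DataFrame:
    # run batch inference (placeholder: returns the input unchanged)
    results = data
    return results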

The above code simply triggers a GitHub Action in the repo where you have the DBT code. As DBT now supports programmatic invocation from Python (as Javier notes), you could then have a GitHub Action run a script that triggers the DBT transformation:

from dbt.cli.main import dbtRunner, dbtRunnerResult

# initialize
dbt = dbtRunner()

# create CLI args as a list of strings
cli_args = ["run", "--select", "tag:my_tag"]

# run the command
res: dbtRunnerResult = dbt.invoke(cli_args)

# inspect the results
for r in res.result:
    print(f"{r.node.name}: {r.status}")
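
The hook above sends an empty `client_payload`. If the step did put something in there, the script run by the GitHub Action could pick it up from the event file that GitHub Actions exposes through the `GITHUB_EVENT_PATH` environment variable. Here is a minimal sketch of that idea. The `dbt_select` key and both function names are made up for illustration and are not a GitHub or DBT convention.

```python
import json
import os
from typing import Any, Dict, List, Optional


def cli_args_from_event(event: Dict[str, Any]) -> List[str]:
    """Build dbt CLI args from a repository_dispatch event payload."""
    client_payload = event.get("client_payload") or {}
    cli_args = ["run"]
    # "dbt_select" is a hypothetical key chosen for this sketch.
    selector = client_payload.get("dbt_select")
    if selector:
        cli_args += ["--select", selector]
    return cli_args


def load_event(path: Optional[str] = None) -> Dict[str, Any]:
    """Read the event JSON that GitHub Actions writes for the current run."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Inside the Action, `cli_args_from_event(load_event())` would then replace the hard-coded `cli_args` list before calling `dbt.invoke`.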

That’s it: you’ve hooked up your ML pipelines with your DBT transformations.
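
The same `dbtRunner` call could also cover the reverse direction from the list of use cases, where an ML workflow starts once a transformation is complete. This is a rough sketch, not something Javier or I have run in production: the function names are hypothetical, and `start_ml_workflow` stands in for whatever kicks off your training or inference pipeline.

```python
from typing import Callable, List


def dbt_run_succeeded(cli_args: List[str]) -> bool:
    """Run dbt in-process and report whether the invocation succeeded."""
    # Imported lazily so the rest of the sketch loads without dbt installed.
    from dbt.cli.main import dbtRunner

    return dbtRunner().invoke(cli_args).success


def run_ml_after_dbt(
    cli_args: List[str],
    start_ml_workflow: Callable[[], None],
    run_dbt: Callable[[List[str]], bool] = dbt_run_succeeded,
) -> bool:
    """Start the ML workflow only if the dbt invocation succeeded."""
    if not run_dbt(cli_args):
        return False
    start_ml_workflow()
    return True
```

Something like `run_ml_after_dbt(["run", "--select", "tag:my_tag"], my_training_pipeline)` would then gate the pipeline on the transformation, assuming `my_training_pipeline` is a callable that launches your ZenML pipeline.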

So what?

Establishing a link between the modern data stack and the MLOps world is a challenge. Data and ML people often think differently and want to use their own tools for their daily work. Having better-defined interfaces between the two worlds might lead to better outcomes overall.

The above is just an example, but it is an interesting place to start. Let me know if you try it, and share your thoughts on the use case in general over on Slack.