BentoML

Seamlessly Deploy Models to Production with ZenML and BentoML

Simplify your model deployment process by integrating BentoML with ZenML. This powerful combination allows you to package models into production-ready Bentos and serve them locally or in the cloud with ease, streamlining the transition from development to production.

Features with ZenML

  • Streamlined Model Packaging: Effortlessly package trained models into Bentos using ZenML's built-in BentoML steps.
  • Local Model Serving: Deploy and serve models locally for development and testing with the BentoML Model Deployer.
  • Container-based Model Serving: ZenML's built-in steps convert your Bento into a Docker image and automatically push it to your stack's container registry, from where you can deploy it anywhere.
  • Cloud Deployment Ready: Bentos are versioned and tracked, and you can fetch them from ZenML for seamless deployment to various cloud platforms using bentoctl or yatai.
  • Standardized Deployment Workflow: Establish a consistent and reproducible model deployment process across your organization.

Main Features

  • Framework-agnostic model packaging and serving
  • Supports local, cloud, and Kubernetes deployments
  • Easy-to-use Python API for defining prediction services
  • Automatic generation of OpenAPI specifications
  • Built-in monitoring and logging capabilities

How to use ZenML with BentoML

First, install the ZenML BentoML integration, then register the BentoML Model Deployer and add it to your active stack:

zenml integration install bentoml -y
zenml model-deployer register bentoml_deployer --flavor=bentoml
zenml stack update -d bentoml_deployer

Next, define a BentoML Service in a service.py file containing the logic to serve your model. It could look like the following:

from typing import Annotated

import bentoml
import numpy as np
import torch
from bentoml.validators import DType, Shape

SERVICE_NAME = "mnist_service"  # example service name
MODEL_NAME = "pytorch_mnist"  # name the model was saved under


def to_numpy(tensor: torch.Tensor) -> np.ndarray:
    return tensor.detach().cpu().numpy()


@bentoml.service(
    name=SERVICE_NAME,
)
class MNISTService:
    def __init__(self):
        # load the model from the BentoML model store and set it
        # to evaluation mode
        self.model = bentoml.pytorch.load_model(MODEL_NAME)
        self.model.eval()

    @bentoml.api()
    async def predict_ndarray(
        self,
        inp: Annotated[np.ndarray, DType("float32"), Shape((28, 28))],
    ) -> np.ndarray:
        # add batch and channel dimensions: (28, 28) -> (1, 1, 28, 28)
        inp = np.expand_dims(inp, (0, 1))
        output_tensor = self.model(torch.tensor(inp))
        return to_numpy(output_tensor)
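
Once the service is running (for local testing you can start it with bentoml serve service:MNISTService), you can call it over HTTP. Below is a minimal client sketch using BentoML's Python client, which exposes each endpoint as a method named after it; the URL and port are assumptions matching the deployer step further down:

import bentoml
import numpy as np

# minimal client sketch; assumes MNISTService is reachable at this
# URL (e.g. on port 3001, as configured in the deployer step below)
client = bentoml.SyncHTTPClient("http://localhost:3001")
try:
    sample = np.random.rand(28, 28).astype("float32")  # dummy image
    prediction = client.predict_ndarray(inp=sample)
    print(prediction)
finally:
    client.close()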

You can then define your pipeline as follows:

from zenml import pipeline, step
from zenml.integrations.bentoml.steps import bento_builder_step
from zenml.integrations.bentoml.steps import bentoml_model_deployer_step


@pipeline
def bento_builder_pipeline():
    # model_training_step is a regular @step, defined elsewhere, that
    # returns your trained model
    model = model_training_step()
    bento = bento_builder_step(
        model=model,
        model_name="pytorch_mnist",  # name of the model
        model_type="pytorch",  # model framework (pytorch, tensorflow, sklearn, xgboost, ...)
        service="service.py:MNISTService",  # path to the service file within the ZenML repo
    )
    deployed_model = bentoml_model_deployer_step(
        bento=bento,
        model_name="pytorch_mnist",  # name of the model
        port=3001,  # port used by the HTTP server
        deployment_type="container",  # deployment type: either "local" or "container"
    )
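
After the pipeline runs, you can look up the model server that the deployer step started. The following is a sketch using ZenML's client API; it assumes the BentoML Model Deployer registered above is part of your active stack, and attribute names such as find_model_server and prediction_url may vary slightly between ZenML versions:

from zenml.client import Client

# build the Bento and deploy it by running the pipeline
bento_builder_pipeline()

# query the active stack's model deployer for running BentoML servers
model_deployer = Client().active_stack.model_deployer
services = model_deployer.find_model_server(model_name="pytorch_mnist")
if services:
    print(f"Prediction endpoint: {services[0].prediction_url}")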

Connect Your ML Pipelines to a World of Tools

Expand your ML pipelines with more than 50 ZenML Integrations
