
Hugging Face (Inference Endpoints)

Effortlessly deploy Hugging Face models to production with ZenML


Integrate Hugging Face Inference Endpoints with ZenML to streamline the deployment of transformers, sentence-transformers, and diffusers models. This integration allows you to leverage Hugging Face's secure, scalable infrastructure for hosting models, while managing the deployment process within your ZenML pipelines.

Features with ZenML

  • Seamless deployment of Hugging Face models directly from ZenML pipelines
  • Simplified management of inference endpoints within the ZenML ecosystem
  • Automatic scaling of deployments based on demand using Hugging Face's infrastructure
  • Centralized registry of deployed models for easy tracking and monitoring
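
The centralized deployment registry can be inspected from the ZenML CLI. As a sketch (assuming a Hugging Face model deployer is registered in your active stack; the UUID is a placeholder you would copy from the list output):

```shell
# List all model servers managed by the active model deployer
zenml model-deployer models list

# Show the status and prediction URL of one deployment (UUID is illustrative)
zenml model-deployer models describe <DEPLOYMENT_UUID>
```
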


Main Features

  • Secure model hosting on dedicated Hugging Face infrastructure
  • Autoscaling capabilities to handle variable inference workloads
  • Support for a wide range of model types and frameworks
  • Pay-per-use pricing for cost-effective deployments
  • Enterprise-grade security features like VPC deployment

How to use ZenML with Hugging Face (Inference Endpoints)


from typing import Annotated

from zenml import pipeline, step
from zenml.integrations.huggingface.services.huggingface_deployment import (
    HuggingFaceDeploymentService,
)
from zenml.integrations.huggingface.steps import huggingface_model_deployer_step


@step
def predictor(
    service: HuggingFaceDeploymentService,
) -> Annotated[str, "predictions"]:
    # Run an inference request against the deployed prediction service
    data = load_live_data()  # replace with your own data-loading logic
    prediction = service.predict(data)
    return prediction


@pipeline
def deploy_and_infer():
    # Deploy the model to a Hugging Face Inference Endpoint and
    # pass the resulting service on to the inference step
    service = huggingface_model_deployer_step(
        model_name="text-classification-model",
        accelerator="gpu",
        hf_repository="myorg/text-classifier",
        task="text-classification",
    )
    predictor(service)

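Before running the pipeline above, the Hugging Face model deployer must be part of your active ZenML stack. A minimal setup might look like the following (the deployer name `hf-endpoints` is a placeholder, and the token/namespace values come from your Hugging Face account; check the ZenML docs for the exact flags in your version):

```shell
# Install the integration and register the model deployer (names are illustrative)
zenml integration install huggingface -y
zenml model-deployer register hf-endpoints \
    --flavor=huggingface \
    --token=<YOUR_HF_TOKEN> \
    --namespace=<YOUR_HF_NAMESPACE>

# Add the deployer to the active stack
zenml stack update -d hf-endpoints
```
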
