LLM Complete Guide

Start with a simple RAG pipeline, then graduate to a more complex setup that involves finetuning embeddings, reranking retrieved documents, and even finetuning the LLM itself.
Project ID: llm-complete-guide (use this ID to create a new project in ZenML)
Pipelines

Basic RAG Pipeline

Pipeline that scrapes ZenML documentation, generates embeddings, and stores them in a vector store (Pinecone, PostgreSQL, or Elasticsearch). Addresses document ingestion and indexing.
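
A minimal sketch of how such a pipeline might look with ZenML's @step/@pipeline API is below. The step bodies, the all-MiniLM-L6-v2 embedding model, the docs table schema, and the connection string are illustrative assumptions, not the project's actual code (which also supports Pinecone and Elasticsearch backends).

```python
# Hypothetical sketch of the ingestion pipeline; step internals, the embedding
# model, and the PostgreSQL/pgvector schema are illustrative assumptions.
from typing import List

import psycopg2
from sentence_transformers import SentenceTransformer
from zenml import pipeline, step


@step
def scrape_docs(base_url: str) -> List[str]:
    """Fetch documentation pages and return their text content."""
    import requests
    from bs4 import BeautifulSoup

    html = requests.get(base_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Real scraping would crawl links and chunk pages; one page keeps it short.
    return [soup.get_text(separator=" ", strip=True)]


@step
def embed_docs(documents: List[str]) -> List[List[float]]:
    """Generate one embedding vector per document chunk."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    return model.encode(documents).tolist()


@step
def index_docs(documents: List[str], embeddings: List[List[float]]) -> None:
    """Store chunks and their vectors in PostgreSQL via pgvector."""
    conn = psycopg2.connect("postgresql://user:pass@localhost/rag")  # placeholder DSN
    with conn, conn.cursor() as cur:
        for doc, emb in zip(documents, embeddings):
            cur.execute(
                "INSERT INTO docs (content, embedding) VALUES (%s, %s::vector)",
                (doc, str(emb)),
            )


@pipeline
def basic_rag_pipeline():
    docs = scrape_docs(base_url="https://docs.zenml.io")
    embeddings = embed_docs(docs)
    index_docs(docs, embeddings)


if __name__ == "__main__":
    basic_rag_pipeline()
```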

Query Pipeline

Pipeline that queries the vector store using LLMs (GPT-4, GPT-3.5, Claude 3, or Claude Haiku) to provide answers about ZenML. Includes LLM call tracing with Langfuse.
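
A hedged sketch of the query flow: embed the question, fetch the nearest chunks from pgvector, and route the completion call through litellm, which also forwards traces to Langfuse via its callback setting. The prompt, table name, DSN, and model choices are assumptions.

```python
# Hypothetical query flow; litellm gives one call format for both OpenAI and
# Anthropic models, and its "langfuse" callback handles LLM call tracing.
import litellm
import psycopg2
from litellm import completion
from sentence_transformers import SentenceTransformer

litellm.success_callback = ["langfuse"]  # trace calls to Langfuse (keys via env vars)


def answer_question(question: str, top_k: int = 5) -> str:
    # Embed the question with the same model used at indexing time.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    query_emb = str(model.encode(question).tolist())

    # Retrieve the nearest chunks via pgvector's cosine-distance operator.
    conn = psycopg2.connect("postgresql://user:pass@localhost/rag")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
            (query_emb, top_k),
        )
        context = "\n\n".join(row[0] for row in cur.fetchall())

    response = completion(
        model="gpt-4",  # or e.g. "claude-3-haiku-20240307"
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```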

Deployment Pipeline

Pipeline that deploys the RAG application to Hugging Face Spaces as a Gradio app, enabling user interaction with the question-answering system.
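
The deployment step can be sketched with huggingface_hub alone; the Space ID and local app folder below are placeholders, and an HF_TOKEN is assumed to be set in the environment.

```python
# Hypothetical deployment step; the Space ID and app folder are placeholders.
from huggingface_hub import HfApi
from zenml import step


@step
def deploy_to_spaces(space_id: str = "your-org/zenml-rag-demo") -> str:
    api = HfApi()  # reads HF_TOKEN from the environment
    # Create (or reuse) a Gradio Space, then push the local app code to it.
    api.create_repo(repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    api.upload_folder(repo_id=space_id, repo_type="space", folder_path="./gradio_app")
    return f"https://huggingface.co/spaces/{space_id}"
```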

LLM RAG Evaluation Pipeline

Pipeline that evaluates the RAG system's performance, assessing retrieval quality and end-to-end response accuracy.
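
Retrieval quality can be approximated with a simple hit-rate check like the sketch below; the retrieve() helper and the golden question set are hypothetical stand-ins for the project's actual evaluation harness.

```python
# Illustrative retrieval-quality check; retrieve() and the golden questions
# are hypothetical stand-ins for the project's evaluation data.
from typing import Callable, List, Tuple


def retrieval_accuracy(
    retrieve: Callable[[str, int], List[str]],
    golden_set: List[Tuple[str, str]],
    top_k: int = 5,
) -> float:
    """Fraction of questions whose expected snippet appears in the top-k chunks."""
    hits = sum(
        any(expected in doc for doc in retrieve(question, top_k))
        for question, expected in golden_set
    )
    return hits / len(golden_set)
```

End-to-end response accuracy is typically checked separately, for example by having an LLM grade generated answers against reference answers.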

Langfuse Evaluation Pipeline

Pipeline that analyzes user-marked responses (good/bad) from the deployed Hugging Face space to evaluate system performance based on real user feedback.
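
The core of this analysis reduces to counting labels. The sketch below assumes feedback has already been exported from Langfuse as (trace_id, label) pairs; the export step itself is omitted because it depends on the Langfuse SDK version in use.

```python
# Minimal aggregation sketch. Assumes feedback exported from Langfuse as
# (trace_id, label) pairs with label "good" or "bad".
from collections import Counter
from typing import Dict, List, Tuple


def summarize_feedback(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute the good/bad split from user-marked responses."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values()) or 1
    return {
        "good": counts["good"],
        "bad": counts["bad"],
        "good_rate": counts["good"] / total,
    }


print(summarize_feedback([("t1", "good"), ("t2", "bad"), ("t3", "good")]))
```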

Distilabel Synthetic Data Generation Pipeline

Pipeline that generates synthetic training data using distilabel for embeddings finetuning. Integrates with Argilla for data annotation.
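
Since the distilabel API varies across versions, the sketch below shows the underlying idea with a plain litellm call as a stand-in: generate one synthetic query per documentation chunk, producing the (query, chunk) pairs that the real pipeline sends to Argilla for annotation.

```python
# Rough stand-in for the distilabel step: generate one synthetic query per
# documentation chunk, yielding (query, chunk) pairs for embeddings finetuning.
from typing import List, Tuple

from litellm import completion


def generate_query_pairs(chunks: List[str]) -> List[Tuple[str, str]]:
    pairs = []
    for chunk in chunks:
        response = completion(
            model="gpt-4",  # assumed generator model
            messages=[{
                "role": "user",
                "content": "Write one short question a user might ask that this "
                           f"documentation passage answers:\n\n{chunk}",
            }],
        )
        pairs.append((response.choices[0].message.content, chunk))
    return pairs
```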

Embeddings Finetuning Pipeline

Pipeline that finetunes embedding models using synthetic data and Matryoshka loss function to improve retrieval performance. Uses annotated data from Argilla.
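
A minimal sketch using sentence-transformers, whose MatryoshkaLoss wraps a base loss so the model stays accurate when its embeddings are truncated to smaller dimensions; the base model, truncation dimensions, and the two inline training pairs are illustrative assumptions.

```python
# Sketch of Matryoshka finetuning with sentence-transformers; model, dims,
# and training pairs are illustrative, not the project's actual settings.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base model (384-dim)

# (query, passage) pairs, e.g. the Argilla-annotated synthetic data.
train_examples = [
    InputExample(texts=["How do I register a stack?", "Use the stack register command ..."]),
    InputExample(texts=["What is an artifact store?", "An artifact store persists ..."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# MatryoshkaLoss trains the model so truncated embeddings (e.g. the first 64
# dimensions) remain useful, enabling cheaper storage and faster retrieval.
base_loss = losses.MultipleNegativesRankingLoss(model)
train_loss = losses.MatryoshkaLoss(model, base_loss, matryoshka_dims=[384, 256, 128, 64])

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("finetuned-zenml-embeddings")
```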

Recommended Stack

Stack Components

  • Orchestrator: default (local), with options for AWS, GCP, Azure cloud orchestrators
  • Artifact Store: local or cloud-based (AWS S3, GCS, Azure Blob)
  • Vector Store: Pinecone (default), PostgreSQL with pgvector, Elasticsearch, or Supabase
  • LLM Provider: OpenAI (GPT-4, GPT-3.5), Anthropic (Claude 3, Claude Haiku) via litellm
  • Annotator: Argilla (for synthetic data annotation)
  • Monitoring: Langfuse (for LLM call tracing and evaluation)
  • Deployment: Hugging Face Spaces (for Gradio app deployment)
  • Dashboard: ZenML Pro (managed) or self-hosted ZenML dashboard
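
Assuming a local `zenml init` has been run and a stack has been registered and activated, the active stack and its core components can be checked from Python:

```python
# Quick check of the active stack via ZenML's Python client.
from zenml.client import Client

stack = Client().active_stack
print(f"Active stack:   {stack.name}")
print(f"Orchestrator:   {stack.orchestrator.name}")
print(f"Artifact store: {stack.artifact_store.name}")
```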

Details

LLM-Complete is a production-ready RAG (Retrieval-Augmented Generation) system that demonstrates how to build scalable chat applications with ZenML. Built as a series of progressively more advanced MLOps pipelines, it evolves from basic document retrieval to embeddings finetuning, document reranking, and LLM optimization for domain-specific question answering about ZenML.

What It Does

LLM-Complete transforms ZenML documentation into an intelligent Q&A system while showcasing best practices for LLMOps workflows. It processes documentation through automated pipelines that scrape content, generate embeddings, store them in vector databases, and serve responses via multiple LLMs. The system includes comprehensive evaluation frameworks, synthetic data generation for finetuning, and deployment to Hugging Face Spaces for user interaction.

Gallery

Screenshot: ZenML dashboard displaying machine learning models, including versions, authors, and tags.