## Overview
PeterCat.ai represents a practical case study in building production LLM-powered assistants specifically designed for GitHub repositories. The project emerged from a common developer pain point: the inefficiency of finding answers within open-source projects, where searching through issues and waiting for maintainer responses can take hours or even days. The creator recognized that while general-purpose LLMs like ChatGPT could help with coding challenges, they often provided inaccurate information about niche open-source projects due to lacking specific repository context.
The solution evolved from a simple prototype into a full-fledged "GitHub Assistant factory" that enables any repository owner to deploy a customized AI assistant. The project was developed under AFX, a department at Ant Group responsible for successful open-source projects like Ant Design, UMI, and Mako. PeterCat was officially open-sourced in September 2024 at the Shanghai INCLUSION Conference, achieving over 850 stars and adoption by 178 open-source projects within three months.
## Agent Architecture and Design
The core architecture follows a relatively straightforward but effective pattern: using a large language model as the reasoning core, combined with specialized tools that interface with GitHub APIs. The system was built using LangChain's toolkit for agent construction, leveraging its AgentExecutor and related methods to combine prompts, model services, and GitHub tools into a functional agent.
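The shape of this pattern can be pictured with a toy single-step loop. This is a pure-Python sketch, not LangChain's implementation: the `llm` callable and tool names are stand-ins, and LangChain's AgentExecutor generalizes the same idea into a multi-step loop with structured tool calling.

```python
def run_agent(user_input: str, tools: dict, llm) -> str:
    """Toy agent loop: ask the model to pick a tool, run it, then
    ask the model to compose an answer from the observation."""
    # Step 1: the reasoning core selects a tool for the query.
    tool_name = llm(f"Pick one tool from {list(tools)} for: {user_input}").strip()
    # Step 2: invoke the selected tool (empty observation if none matched).
    observation = tools[tool_name](user_input) if tool_name in tools else ""
    # Step 3: the model grounds its final answer in the tool output.
    return llm(f"Answer '{user_input}' using: {observation}")
```

The value of the pattern is that repository facts come from tools rather than from the model's parametric memory, so the model only has to select and summarize.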
The system prompt design proved crucial in enabling effective tool selection and response generation. The creator discovered that skill definitions in prompts needed to strike a careful balance: definitions that were too specific constrained the model's capabilities, while definitions that were too broad failed to guide tool usage effectively. The prompts define three main skills for the agent: engaging interaction for conversational responses, insightful information search using multiple tools (including a custom knowledge search, Tavily web search, and repository information retrieval), and expert issue solving for handling specific user problems.
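A system prompt encoding those three skills might look like the template below. The skill names come from the case study; the surrounding wording, placeholders, and tool identifiers (`knowledge_search`, `tavily_search`, `search_repo`) are illustrative, not the project's actual prompt.

```python
# Illustrative system-prompt template; skill names follow the case
# study, all other wording is an assumption.
SYSTEM_PROMPT = """You are {assistant_name}, an assistant for the {repo_name} repository.

# Skills
## Skill 1: Engaging Interaction
Respond conversationally to general questions about the project.
## Skill 2: Insightful Information Search
Use knowledge_search, tavily_search, or search_repo to gather facts
before answering; prefer repository knowledge over general knowledge.
## Skill 3: Expert Issue Solver
Diagnose the user's specific problem and propose concrete, actionable steps.
"""
```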
The agent uses tools implemented as Python functions that wrap GitHub API calls. For example, a `search_repo` tool retrieves basic repository information like star counts, fork counts, and commit counts—data that language models typically cannot provide accurately on their own. These tools are registered with the agent so it can invoke them based on user queries.
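A minimal sketch of such a tool, assuming GitHub's `GET /repos/{owner}/{repo}` REST endpoint: `stargazers_count` and `forks_count` are real fields of that response, while the function names and the set of summarized fields are illustrative. Commit counts are not in this response and would need a separate call.

```python
import json
from urllib.request import Request, urlopen

def search_repo(owner: str, repo: str) -> dict:
    """Fetch basic repository stats via GET /repos/{owner}/{repo}."""
    req = Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(req) as resp:
        return summarize_repo(json.load(resp))

def summarize_repo(payload: dict) -> dict:
    """Reduce the API response to the fields the agent reports."""
    return {
        "full_name": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        "open_issues": payload["open_issues_count"],
    }
```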
## Specialized Agent Separation
Rather than using a single monolithic agent, the production system implements a divide-and-conquer approach with specialized agents for different tasks. This architectural decision stemmed from the recognition that code review and issue response require different skill sets and focus areas. By separating these concerns, each agent can be optimized for its specific domain, leading to higher quality outputs.
### PR Review Agent
The PR review agent is configured with the identity of a professional code reviewer, tasked with evaluating code across four key dimensions: functionality, logical errors, security vulnerabilities, and major performance issues. It uses two specialized tools: `create_pr_summary` for posting general comments in the PR discussion area, and `create_review_comment` for adding line-specific comments in code commits.
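A line-specific comment maps naturally onto GitHub's PR review-comments endpoint. The sketch below builds the request payload for `POST /repos/{owner}/{repo}/pulls/{pull_number}/comments`; `commit_id`, `path`, `line`, `side`, and `body` are real fields of that API, while the helper name is an assumption about how `create_review_comment` might be structured.

```python
def review_comment_payload(commit_id: str, path: str, line: int, body: str) -> dict:
    """Payload for GitHub's line-specific PR review comment endpoint.

    side="RIGHT" anchors the comment on the new version of the file,
    which is what a reviewer of additions usually wants.
    """
    return {
        "commit_id": commit_id,
        "path": path,
        "line": line,
        "side": "RIGHT",
        "body": body,
    }
```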
The prompts for code review follow a chain-of-thought-inspired approach, breaking down the review process into structured tasks. For PR summaries, the agent follows a specific markdown format including a high-level walkthrough and a table of file changes. For line-by-line reviews, significant preprocessing is required to transform code diffs into a machine-readable format with line number annotations and a clear distinction between additions and deletions.
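The diff preprocessing might look like the sketch below, which assumes hunk-only unified-diff input (file headers already stripped); the `ADD`/`DEL` annotation format is illustrative. Explicit line numbers let the model cite exact positions when the review comment is posted back to GitHub.

```python
import re

# Matches a unified-diff hunk header, e.g. "@@ -1,2 +1,2 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def annotate_diff(diff: str) -> list[str]:
    """Annotate each diff line with a line number and change marker."""
    annotated = []
    old_no = new_no = 0
    for line in diff.splitlines():
        m = HUNK_RE.match(line)
        if m:
            old_no, new_no = int(m.group(1)), int(m.group(2))
            annotated.append(line)
        elif line.startswith("+"):
            annotated.append(f"{new_no:>5} ADD  {line[1:]}")
            new_no += 1
        elif line.startswith("-"):
            annotated.append(f"{old_no:>5} DEL  {line[1:]}")
            old_no += 1
        else:
            # Context line: present in both versions.
            annotated.append(f"{new_no:>5}      {line[1:] if line.startswith(' ') else line}")
            old_no += 1
            new_no += 1
    return annotated
```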
An important operational consideration is the implementation of skip mechanisms. Users can include keywords like "skip", "ignore", "wip", or "[skip ci]" in PR titles or descriptions to bypass automated reviews. Draft PRs are also automatically skipped. The prompts include explicit constraints to avoid commenting on minor style inconsistencies, formatting issues, or changes that don't impact functionality—addressing the common problem of overly verbose AI-generated feedback.
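The skip check itself is simple. This sketch uses naive substring matching for brevity (so "wip" would also match inside longer words); a production check would likely match whole tokens. The keyword list comes from the case study.

```python
SKIP_KEYWORDS = ("skip", "ignore", "wip", "[skip ci]")

def should_skip_review(title: str, body: str, is_draft: bool) -> bool:
    """Bypass automated review for draft PRs or opt-out keywords
    in the PR title or description."""
    if is_draft:
        return True
    text = f"{title} {body}".lower()
    return any(kw in text for kw in SKIP_KEYWORDS)
```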
### Issue Handling Agent
The issue handling agent focuses on responding to newly created issues and participating in ongoing discussions. It emphasizes factual accuracy and is designed to provide serious, well-researched responses without making assumptions. The prompts explicitly instruct the agent to avoid definitive statements like "this is a known bug" unless there is absolute certainty, as such irresponsible assumptions can be misleading to users.
The agent includes safeguards against circular references—if a found issue number matches the current issue being responded to, it recognizes that no similar issues were found and avoids redundant mentions.
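That safeguard reduces to filtering the current issue number out of the similarity-search hits before the response is composed; the function name and output wording below are illustrative.

```python
def format_related(found: list[int], current_issue: int) -> str:
    """Mention similar issues, but never the issue being answered."""
    hits = [n for n in found if n != current_issue]
    if not hits:
        # The only "match" was the issue itself: treat as no result.
        return "No similar issues found."
    return "Possibly related: " + ", ".join(f"#{n}" for n in hits)
```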
## RAG Implementation for Repository Knowledge
A critical evolution of the system was the integration of RAG (Retrieval-Augmented Generation) to provide assistants with deeper repository understanding. The creator evaluated two main approaches for enhancing model knowledge: fine-tuning and RAG. Fine-tuning was rejected due to its computational resource requirements and unsuitability for frequently changing codebases. RAG was chosen because it requires fewer resources and adapts dynamically to repository updates.
### Repository Vectorization Pipeline
The vectorization process is designed with scalability in mind. Given the large number of files and diverse file types in code repositories, the system takes a file-level granular approach, recursively traversing repository files and creating a vectorization task for each. AWS Lambda functions break the vectorization work into asynchronous tasks, preventing the process from blocking assistant creation.
Files are downloaded using GitHub's open APIs, with content retrieved from specific repository paths. Before vectorization, SHA-based duplicate checking is performed against the vector database—if a file's SHA already exists, the vectorization process is skipped. To minimize noise, non-Markdown files are excluded from processing, as most valuable documentation exists in README.md and similar files.
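The dedup-and-filter step can be sketched as below. GitHub's per-file SHA is the git blob hash (SHA-1 over `"blob <length>\0"` plus the content), so it can be recomputed locally; the helper names and the exact extension list are assumptions.

```python
import hashlib

def git_blob_sha(content: bytes) -> str:
    """Compute the git blob SHA-1 GitHub reports for a file."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def needs_vectorization(path: str, content: bytes, known_shas: set) -> bool:
    """Vectorize only Markdown files whose blob SHA is not already
    stored in the vector database (unchanged files are skipped)."""
    if not path.lower().endswith((".md", ".markdown")):
        return False
    return git_blob_sha(content) not in known_shas
```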
Historical issues are also incorporated into the knowledge base, but with quality filtering. Only closed issues with high engagement levels are included, recognizing that low-quality content can degrade RAG retrieval effectiveness.
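The issue filter might look like this; the comment-count threshold is an illustrative stand-in for whatever "high engagement" metric the project actually uses, and the dict shape mirrors GitHub's issue objects (`state`, `comments`).

```python
def keep_issue(issue: dict, min_comments: int = 3) -> bool:
    """Admit only closed issues with enough discussion into the
    knowledge base; low-engagement issues are treated as noise."""
    return issue.get("state") == "closed" and issue.get("comments", 0) >= min_comments
```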
### Text Chunking and Embedding
Long texts are split into smaller chunks using LangChain's CharacterTextSplitter with defined CHUNK_SIZE and CHUNK_OVERLAP parameters. The overlap between chunks ensures that important contextual information is shared across different segments, minimizing boundary effects and enabling the RAG algorithm to capture transition information more accurately.
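The overlap mechanics can be shown with a sliding-window sketch. This is a simplified stand-in: LangChain's CharacterTextSplitter actually splits on a separator and merges pieces up to the size limit, but the chunk/overlap relationship is the same, and the parameter values here are assumptions.

```python
CHUNK_SIZE = 1000      # characters per chunk (illustrative value)
CHUNK_OVERLAP = 200    # characters shared between adjacent chunks

def split_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list:
    """Sliding-window splitter: each chunk starts size - overlap
    characters after the previous one, so neighbouring chunks share
    `overlap` characters of context across the boundary."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```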
The split text chunks are vectorized using OpenAI's embedding model, with resulting vectors stored in a Supabase database. Supabase provides vector storage capabilities with a similarity search function based on cosine distance.
### Content Retrieval
When users interact with GitHub Assistant, their input is vectorized and matched against the vector database. The system uses a similarity search function in Supabase that calculates the cosine similarity between the query embedding and stored document embeddings, returning results ordered by similarity.
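The ranking math behind that search is plain cosine similarity; in production Supabase/pgvector computes it server-side, but a stdlib sketch of the same ordering looks like this (document IDs and the in-memory store are illustrative):

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query, docs: dict, k: int = 3):
    """Rank stored document embeddings by similarity to the query
    embedding and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```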
The retrieved content serves as context for the language model, which then comprehends and refines this information to produce responses that better match user needs. This focused approach enables GitHub Assistant to provide more specialized answers compared to direct language model outputs without repository context.
## Production Deployment Architecture
The production system uses a decoupled frontend-backend architecture, with services and products connected through HTTP interfaces. This supports multiple product requirements including the GitHub App, third-party user portals, and the PeterCat official website.
GitHub webhooks serve as the trigger mechanism for assistant activation. When events like code submissions or issue creation occur, GitHub notifies the system through webhooks with relevant information. The system then determines which specialized agent needs to be activated based on the event type.
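The routing step can be sketched as a dispatch table keyed on the webhook event. The event names (`pull_request`, `issues`, `issue_comment`, delivered via GitHub's `X-GitHub-Event` header) and `action` values are part of GitHub's webhook API; the agent names and the exact set of handled actions are assumptions.

```python
def route_event(event: str, payload: dict):
    """Map a GitHub webhook delivery to the specialized agent
    that should handle it; None means no agent is activated."""
    action = payload.get("action")
    if event == "pull_request" and action in ("opened", "synchronize"):
        return "pr_review_agent"
    if event == "issues" and action == "opened":
        return "issue_agent"
    if event == "issue_comment" and action == "created":
        return "issue_agent"
    return None
```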
For deployment integration, a GitHub App was developed to serve as the connection bridge between user repositories and PeterCat assistants. Users authorize the GitHub App like any other application, and PeterCat automatically retrieves their repository list. After creating a GitHub Assistant, users can select a target repository from the website, and the AI assistant is integrated to actively participate in issue discussions and PR reviews.
## Assistant Creation Experience
The system includes an innovative "assistant for creating assistants": a specialized wizard for customizing GitHub Assistants. Users can chat directly with this meta-assistant, describing what kind of assistant they want. The preview window updates in real time as users describe their requirements, providing immediate visual feedback on the assistant being created. Default configurations including avatar, name, and personality traits (defined through system prompts) are automatically generated from the repository URL.
## Technology Stack
The production system leverages several key technologies: LangChain for agent orchestration and text splitting, OpenAI for language model services and embeddings, Supabase for vector storage and similarity search, FastAPI for the backend API framework, and AWS Lambda for asynchronous task processing during repository vectorization.
## Future Development Directions
The project roadmap includes breaking down repository isolation through multi-agent architectures for knowledge sharing, enhancing code comprehension for complex business logic and cross-file context, developing VS Code plugins for IDE integration, and empowering users with greater control over knowledge base management and optimization.
## Assessment and Considerations
While the case study presents an ambitious and technically sound approach to repository-aware AI assistants, several aspects warrant consideration. The reliance on RAG with Markdown files and closed issues means the system's effectiveness depends heavily on the quality of repository documentation. Repositories with sparse documentation may not benefit as significantly. Additionally, the filtering of non-Markdown files means that actual code understanding is limited compared to documentation-based knowledge.
The claim of 178 open-source project adoptions within three months is notable, though the case study doesn't provide detailed metrics on user satisfaction or accuracy of responses. The code review capabilities, while innovative, may face challenges with complex refactoring or architectural changes that span multiple files with intricate dependencies.
Overall, PeterCat represents a practical demonstration of combining LLM agents with RAG for domain-specific applications, with thoughtful attention to prompt engineering, agent specialization, and production deployment considerations.