eBay: Multi-Track Approach to Developer Productivity Using LLMs

eBay’s adoption of LLMs for developer productivity is a pragmatic example of rolling out AI technologies in a large-scale enterprise environment. The company explored three distinct but complementary tracks for improving developer productivity, and its account offers useful insight into the real-world challenges and benefits of deploying LLMs in production.

The case study is particularly noteworthy for its measured approach to evaluation and deployment, using both quantitative and qualitative metrics to assess the impact of these technologies. Instead of relying on a single solution, eBay recognized that different aspects of developer productivity could be better served by different approaches to LLM deployment.

Track 1: GitHub Copilot Implementation

The first track involved the enterprise-wide deployment of GitHub Copilot, preceded by a carefully designed A/B test with 300 developers. The evaluation methodology involved:

A control group of developers with similar assignments and abilities
A two-week ramp-up period
Multiple measurement metrics including code acceptance rate, accuracy, and PR metrics
Code quality monitoring through Sonar

The results showed significant improvements:

27% code acceptance rate (via Copilot telemetry)
70% accuracy for generated documents
60% accuracy for generated code
17% decrease in pull request creation to merge time
12% decrease in Lead Time for Change

However, eBay was also transparent about the limitations, particularly noting Copilot’s context window constraints when dealing with their massive codebase. This highlights an important consideration for large enterprises implementing similar solutions.
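The percentage figures reported for Track 1 are relative comparisons between the Copilot group and the control group. The case study does not spell out the exact formulas, so the following is only a minimal sketch of how such numbers could be computed. The function names and all sample values are hypothetical and chosen purely so the arithmetic is easy to follow; they are not eBay's data.

```python
from statistics import mean


def relative_decrease(control, treatment):
    """Percent decrease of the treatment mean relative to the control mean."""
    base = mean(control)
    return 100 * (base - mean(treatment)) / base


def acceptance_rate(shown, accepted):
    """Percent of shown suggestions that developers accepted."""
    if shown == 0:
        return 0.0
    return 100 * accepted / shown


# Made-up PR creation-to-merge times in hours (illustrative only).
control_hours = [40.0, 50.0, 60.0]
copilot_hours = [35.0, 40.0, 49.5]

pr_time_decrease = relative_decrease(control_hours, copilot_hours)
suggestion_acceptance = acceptance_rate(shown=1000, accepted=270)

print(f"PR time decrease: {pr_time_decrease:.1f}%")
print(f"Acceptance rate: {suggestion_acceptance:.1f}%")
```

Comparing group means is the simplest choice here; a real analysis would likely also check whether the difference is statistically significant given the group sizes.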

Track 2: Custom LLM Development (eBayCoder)

The second track demonstrates a more specialized approach to handling company-specific code requirements. eBay created eBayCoder by fine-tuning Code Llama 13B on their internal codebase and documentation. This approach addressed several limitations of commercial solutions:

Better handling of company-specific libraries and frameworks
Improved context awareness for large-scale codebases
Enhanced ability to handle software upkeep and migration tasks
Reduced code duplication through better awareness of internal services

The implementation reflects deliberate choices of model (Code Llama 13B) and training strategy (post-training and fine-tuning on internal data). Supporting that training and deployment implies a significant investment in MLOps infrastructure.

Track 3: Internal Knowledge Base System

The third track focused on creating an intelligent knowledge retrieval system using Retrieval-Augmented Generation (RAG). This system demonstrates several sophisticated LLMOps practices:

Automated, recurring content ingestion from multiple sources (GitHub Markdowns, Google Docs, Jira, Slack, Wikis)
Vector embedding creation and storage in a vector database
Similarity-based retrieval using cosine similarity
Integration with both commercial and open-source LLMs
Implementation of RLHF (Reinforcement Learning from Human Feedback) for continuous improvement
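The embedding and retrieval steps listed above can be sketched in a few lines. This is an illustration under stated assumptions, not eBay's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `index` list stands in for a vector database, and the sample documents are invented. Only the cosine similarity ranking corresponds directly to what the case study describes.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, index, top_k=2):
    """Return the top_k (score, document) pairs, highest score first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented documents standing in for ingested internal content.
documents = [
    "deploy a service with the internal pipeline tool",
    "request access to the staging database",
    "rotate api keys every ninety days",
]
# Stand-in for a vector database: (document, vector) pairs held in memory.
index = [(doc, embed(doc)) for doc in documents]

for score, doc in retrieve("how do I deploy a service", index):
    print(f"{score:.2f}  {doc}")
```

In a production system the retrieved passages would be passed to the LLM as context for answering the question; that generation step is omitted here.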

The system includes important production-ready features:

Automated content updates
User feedback collection interface
Clear fallback mechanisms when answers aren’t available
Integration with multiple data sources
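Two of the features above, the fallback when no answer is available and the feedback collection, lend themselves to a short sketch. The case study does not describe how either is implemented, so everything here is an assumption: the similarity threshold `min_score`, the `FALLBACK` wording, and the `FeedbackLog` class are hypothetical. The log only stores ratings; how that data would feed an RLHF process is outside the scope of the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical wording for the "no answer" response.
FALLBACK = "I could not find an answer in the knowledge base."


def answer_or_fallback(scored_chunks, min_score=0.3):
    """Return retrieved context, or a fallback if nothing is relevant enough.

    scored_chunks is a list of (similarity, text) pairs in any order.
    """
    relevant = [text for score, text in scored_chunks if score >= min_score]
    if not relevant:
        return FALLBACK
    # A real system would hand `relevant` to an LLM as context;
    # this sketch simply returns the context itself.
    return "\n".join(relevant)


@dataclass
class FeedbackLog:
    """In-memory store for thumbs-up / thumbs-down ratings."""
    records: list = field(default_factory=list)

    def record(self, question, answer, helpful):
        self.records.append(
            {"question": question, "answer": answer, "helpful": bool(helpful)}
        )

    def helpful_rate(self):
        """Fraction of recorded answers that users marked as helpful."""
        if not self.records:
            return 0.0
        return sum(r["helpful"] for r in self.records) / len(self.records)


log = FeedbackLog()
reply = answer_or_fallback([(0.12, "unrelated wiki page")])
log.record("How do I reset my VPN token?", reply, helpful=False)
print(reply)
```

Returning an explicit fallback rather than a low-confidence guess keeps the assistant from presenting unrelated documentation as an answer.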

MLOps and Production Considerations

The case study reveals several important MLOps considerations:

Multi-model orchestration: Managing multiple LLM solutions in production
Evaluation frameworks: Using both quantitative and qualitative metrics
Feedback loops: Implementing RLHF for continuous improvement
Data pipeline automation: Regular updates to knowledge bases
Security and compliance: Handling sensitive internal documentation
Scale considerations: Dealing with massive codebases and documentation
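For the data pipeline automation item, one common way to keep a recurring ingestion job cheap is to re-embed only documents whose content has changed. The case study does not say how eBay detects changes, so the content-hash approach below is an assumption, and `plan_refresh` and the document IDs are hypothetical names used for illustration.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(previous_hashes, current_docs):
    """Decide which documents to re-embed and which to drop.

    previous_hashes maps doc ID to the hash stored at the last run.
    current_docs maps doc ID to the text fetched in this run.
    Returns (to_embed, to_delete, new_hashes).
    """
    new_hashes = {doc_id: content_hash(text) for doc_id, text in current_docs.items()}
    to_embed = sorted(
        doc_id for doc_id, digest in new_hashes.items()
        if previous_hashes.get(doc_id) != digest
    )
    to_delete = sorted(doc_id for doc_id in previous_hashes if doc_id not in new_hashes)
    return to_embed, to_delete, new_hashes


# Invented state from a previous run and freshly fetched content.
previous = {
    "wiki/deploy": content_hash("old deploy guide"),
    "wiki/oncall": content_hash("oncall rota"),
    "wiki/legacy": content_hash("retired page"),
}
current = {
    "wiki/deploy": "new deploy guide",   # changed
    "wiki/oncall": "oncall rota",        # unchanged
    "wiki/onboarding": "welcome notes",  # new
}

to_embed, to_delete, _ = plan_refresh(previous, current)
print("re-embed:", to_embed)
print("delete:", to_delete)
```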

Monitoring and Evaluation

eBay implemented comprehensive monitoring and evaluation strategies:

Developer surveys for qualitative feedback
Code quality metrics through Sonar
PR and deployment metrics
Usage tracking for internal tools
Accuracy measurements for generated content
User feedback collection and integration

Future Considerations

eBay acknowledges that it is at the beginning of an exponential curve in terms of productivity gains. The company maintains a pragmatic view of the technology while recognizing its transformative potential. The implementation of RLHF and continuous improvement mechanisms suggests a long-term commitment to evolving these systems.

This case study provides valuable insights into how large enterprises can systematically approach LLM deployment, balancing commercial solutions with custom development while maintaining a focus on practical productivity improvements. The multi-track approach demonstrates a sophisticated understanding of how different LLM implementations can complement each other in a production environment.
