## Overview
Entelligence is building AI engineering agents designed to reduce operational overhead for large engineering teams. The company was founded by Ashia, who previously worked at Uber Freight (a 500-engineer organization) and experienced firsthand the challenges of coordinating large engineering teams—including performance reviews, status updates, code reviews, knowledge sharing, and team syncs. The core insight driving Entelligence is that while AI tools like Cursor have accelerated code generation, they haven't addressed the team coordination challenges that compound as organizations grow.
The interview, conducted by Jordi from "Agents at Work," provides valuable insights into how Entelligence operationalizes LLMs in production for software development workflows. This case study is particularly relevant given the rise of "vibe coding"—where engineers use AI to rapidly generate code without necessarily understanding every line—which creates new challenges around code quality, documentation, and knowledge management.
## Problem Statement
As engineering teams scale, several operational challenges emerge:
- **Code review burden**: With AI-generated code, the volume of code to review has increased while human attention to each PR has decreased. Engineers who spent one minute generating code are unlikely to spend 30 minutes reviewing it carefully.
- **Knowledge fragmentation**: When AI generates code, no single human is the definitive author who can explain how things work. This creates knowledge silos and makes onboarding difficult.
- **Stale documentation**: Traditional documentation in tools like Confluence quickly becomes outdated as rapid development continues. One enterprise customer told Entelligence that deleting 95% of their Confluence docs would go unnoticed by most of the team.
- **Meaningless metrics**: Traditional productivity metrics like lines of code or PR counts become even more meaningless when AI can generate vast amounts of code instantly.
- **Cross-team coordination**: Engineers in large organizations can spend entire days just answering support questions from PMs, designers, and other stakeholders.
## Technical Architecture and LLMOps Implementation
### Multi-Model Strategy and Evaluation
Entelligence takes model selection seriously and has developed an open-source PR review evaluation benchmark to compare different LLMs for code review tasks. When new models are released, they run comparisons—for example, finding that Deepseek outperformed Claude 3.5 for code reviews by a significant margin. This led to incorporating Deepseek into their platform.
The company is launching a human-in-the-loop PR evaluation leaderboard, similar to Berkeley's LLM leaderboard but specifically for code review. Users can compare models like GPT-4o, Claude, or Deepseek on real PRs without knowing which model generated which review, then vote on which comments are more relevant. The goal is to gather thousands of reviews to converge on optimal model selection.
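This kind of blind, vote-based comparison is typically tracked with a pairwise rating scheme. The sketch below uses a simple Elo-style update as an illustration; the model names, K-factor, and data structures are assumptions, not details of Entelligence's actual leaderboard.

```python
import random
from collections import defaultdict

# Illustrative Elo-style tracker for blind pairwise review comparisons.
# The model names and K-factor are assumptions, not Entelligence's setup.
K = 32
ratings = defaultdict(lambda: 1000.0)

def expected(a: float, b: float) -> float:
    """Probability that a review from rating `a` beats one from rating `b`."""
    return 1.0 / (1.0 + 10 ** ((b - a) / 400))

def record_vote(model_a: str, model_b: str, winner: str) -> None:
    """Update ratings after a reviewer votes for one of two anonymized reviews."""
    ea = expected(ratings[model_a], ratings[model_b])
    score_a = 1.0 if winner == model_a else 0.0
    ratings[model_a] += K * (score_a - ea)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - ea))

def present_blind_pair(models: list[str]) -> tuple[str, str]:
    """Pick two models and shuffle so the voter cannot tell which wrote which review."""
    a, b = random.sample(models, 2)
    return (a, b) if random.random() < 0.5 else (b, a)

# Example: simulate a handful of votes on real PRs.
for _ in range(5):
    a, b = present_blind_pair(["gpt-4o", "claude", "deepseek"])
    record_vote(a, b, winner=a)  # pretend the first review won each time

print(dict(ratings))
```

With enough votes, the ratings converge toward a ranking of which model's review comments engineers actually prefer.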
### Context Retrieval and RAG Architecture
The foundation of Entelligence's system is what they describe as "elaborate agentic hybrid RAG search." They spent their first few months building universal search across code, documentation, and issues (a sketch of how such hybrid retrieval can be fused appears after the list below). This context retrieval powers several features:
- **Code review**: When reviewing a PR, the system searches the codebase for additional context, examines related PRs for potential conflicts, and pulls in relevant documentation.
- **Similar PRs detection**: The system actively searches for other open PRs that might conflict with current changes—for example, detecting when two engineers are creating database migrations with the same number, or when changes in one repository need to be synchronized with another.
- **Knowledge querying**: Non-engineers (PMs, designers, support teams) can ask questions about the engineering system through Slack integration without having to interrupt engineers.
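The interview doesn't describe how the hybrid search is implemented internally. A common pattern, shown as a minimal sketch below, is to run keyword and embedding retrieval in parallel and merge the rankings with reciprocal rank fusion; the toy corpus, tokenizer, and bag-of-words stand-in for embeddings are assumptions, not the production system.

```python
import math
import re
from collections import Counter

# Minimal sketch of hybrid retrieval with reciprocal rank fusion (RRF).
# The corpus entries below stand in for code, docs, and issue sources.
CORPUS = {
    "code:billing.py": "def charge(customer, amount): ...",
    "docs:billing.md": "How the billing service charges customers",
    "issue:1423": "Charge retries fail when the customer has no card",
}

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def keyword_rank(query: str) -> list[str]:
    """Rank documents by term overlap (stand-in for BM25 keyword search)."""
    q = tokens(query)
    return sorted(CORPUS, key=lambda d: -sum((tokens(CORPUS[d]) & q).values()))

def vector_rank(query: str) -> list[str]:
    """Rank by cosine similarity over word counts (stand-in for embedding search)."""
    q = tokens(query)
    def cos(d: str) -> float:
        t = tokens(CORPUS[d])
        dot = sum(q[w] * t[w] for w in q)
        norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in t.values()))
        return dot / norm if norm else 0.0
    return sorted(CORPUS, key=cos, reverse=True)

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge the keyword and vector rankings into a single ranked list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

print(rrf_fuse([keyword_rank("customer charge"), vector_rank("customer charge")]))
```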
### Continuous Learning and Feedback Loops
One of the most sophisticated aspects of Entelligence's LLMOps implementation is their approach to learning from team-specific feedback. They discovered that the same product given to two different companies might see 70% comment acceptance at one and 20% at another—not because of product quality, but because engineering cultures differ dramatically.
Their solution involves tracking which comments engineers accept versus reject over time, then using this historical data to calibrate future reviews. This isn't simple prompt engineering but rather a comprehensive approach to giving the model context about team preferences. After roughly two weeks of use, the system learns a team's style and priorities.
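One plausible shape for this feedback loop, sketched below, is to bucket past review comments by category, compute per-team acceptance rates, and feed the resulting preferences back into the next review as extra context. The Feedback type, the category labels, and the acceptance threshold are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative sketch of learning team preferences from accept/reject history.
# Categories, threshold, and prompt wording are assumptions for illustration.

@dataclass
class Feedback:
    category: str   # e.g. "naming", "error-handling", "nit"
    accepted: bool

def acceptance_rates(history: list[Feedback]) -> dict[str, float]:
    """Per-category share of review comments the team actually accepted."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for fb in history:
        counts[fb.category][0] += int(fb.accepted)
        counts[fb.category][1] += 1
    return {cat: acc / total for cat, (acc, total) in counts.items()}

def preference_context(history: list[Feedback], min_rate: float = 0.5) -> str:
    """Turn acceptance rates into extra context for the next review prompt."""
    rates = acceptance_rates(history)
    keep = [c for c, r in rates.items() if r >= min_rate]
    skip = [c for c, r in rates.items() if r < min_rate]
    return (
        f"This team usually acts on comments about: {', '.join(keep) or 'n/a'}. "
        f"They tend to ignore comments about: {', '.join(skip) or 'n/a'}. "
        "Only raise issues the team is likely to act on."
    )

history = [Feedback("error-handling", True), Feedback("nit", False), Feedback("nit", False)]
print(preference_context(history))
```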
A key challenge they addressed is that LLMs are inherently "pedantic"—if asked to review code, they'll find 30-50 issues with almost any PR. However, humans don't need or want perfection on every line. Entelligence built orchestration to force the model to eventually say "looks good to me" rather than endlessly finding issues. This required multiple layers of reflection after initial generation.
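A minimal version of such a reflection layer might re-score each candidate finding and emit "looks good to me" when nothing clears the bar. The severity scale, threshold, and the llm_score placeholder below are hypothetical; in the real system a second model call would do the re-judging.

```python
from dataclasses import dataclass

# Sketch of a post-generation reflection pass that prunes low-value findings
# and lets the review converge to "looks good to me". All values are illustrative.

@dataclass
class Finding:
    message: str
    severity: int  # 1 = nit ... 5 = likely bug

def llm_score(finding: Finding) -> int:
    """Placeholder for a second LLM call that re-judges each finding's importance."""
    return finding.severity  # a real system would re-ask the model here

def reflect(findings: list[Finding], threshold: int = 4, max_comments: int = 5) -> list[str]:
    """Keep only findings the reflection pass still considers important."""
    kept = [f for f in findings if llm_score(f) >= threshold]
    kept.sort(key=lambda f: f.severity, reverse=True)
    if not kept:
        return ["Looks good to me."]
    return [f.message for f in kept[:max_comments]]

raw = [Finding("Variable could be renamed", 1), Finding("Possible nil dereference", 5)]
print(reflect(raw))
```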
### Integration Points and Deployment
Entelligence deploys across multiple touchpoints:
- **IDE integration**: Real-time code review within Cursor, VS Code, and other editors
- **Git platform integration**: Reviews appear directly in GitHub and GitLab PR interfaces
- **Slack integration**: Query the engineering knowledge base through chat
- **Web dashboard**: Managers can see team-level views of progress, and everyone can access and edit documentation
- **MCP support**: Full integration with the Model Context Protocol for broader tooling compatibility
### Documentation Generation and Maintenance
The documentation agent represents a significant production AI use case. It:
- Analyzes the codebase to generate initial documentation
- Monitors every PR and updates docs accordingly—the "dream of having an engineer who documents everything they write" (a sketch of this update loop follows the list)
- Exports to Markdown and syncs to platforms such as Google Docs, Notion, and Confluence
- Maintains a Notion-style editor for human editing and section additions
- Keeps docs continuously updated as features are "vibe coded" rapidly
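A simplified version of the PR-triggered update loop mentioned above might map the files a merged PR touched to the docs that describe them and regenerate only those sections. The webhook payload shape, the doc-to-source mapping, and the regenerate_section helper are assumptions for illustration.

```python
# Hedged sketch of keeping docs in sync with merged PRs. The interview only
# states that every PR is monitored and docs are updated; everything concrete
# here (mapping, payload, helper) is illustrative.

DOC_SOURCES = {
    "docs/billing.md": ["services/billing/", "migrations/billing/"],
    "docs/auth.md": ["services/auth/"],
}

def affected_docs(changed_files: list[str]) -> set[str]:
    """Map the files touched by a merged PR to the docs that describe them."""
    hits = set()
    for doc, prefixes in DOC_SOURCES.items():
        if any(f.startswith(p) for f in changed_files for p in prefixes):
            hits.add(doc)
    return hits

def regenerate_section(doc_path: str, changed_files: list[str]) -> str:
    """Placeholder for the LLM call that rewrites the affected doc section."""
    return f"(updated {doc_path} from {len(changed_files)} changed files)"

def on_pr_merged(payload: dict) -> list[str]:
    """Webhook entry point: regenerate every doc touched by the merged PR."""
    changed = payload.get("changed_files", [])
    return [regenerate_section(doc, changed) for doc in sorted(affected_docs(changed))]

print(on_pr_merged({"changed_files": ["services/billing/charge.py"]}))
```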
### Sandboxed Execution
For certain validation tasks, Entelligence runs mini sandboxed environments to verify that lint passes and that tests execute properly. However, they acknowledge limitations—full latency testing would require dockerized containers with access to the complete production environment and external services, which is beyond current scope.
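In spirit this validation step looks like checking the PR out into a throwaway workspace and running lint and tests under a hard timeout; a production sandbox would add container-level isolation on top. The clone arguments and the specific lint and test commands below are illustrative, not Entelligence's actual pipeline.

```python
import subprocess
import tempfile

# Minimal sketch of sandboxed validation: clone the PR branch into a throwaway
# directory and run lint and tests with hard timeouts. A temporary directory
# stands in for a real sandbox or container here.

def run_checks(repo_url: str, branch: str, timeout: int = 300) -> dict[str, bool]:
    results: dict[str, bool] = {}
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(
            ["git", "clone", "--depth", "1", "--branch", branch, repo_url, workdir],
            check=True, timeout=timeout,
        )
        # Example commands only; a real setup would read them from repo config.
        for name, cmd in {"lint": ["ruff", "check", "."], "tests": ["pytest", "-q"]}.items():
            proc = subprocess.run(cmd, cwd=workdir, capture_output=True, timeout=timeout)
            results[name] = proc.returncode == 0
    return results
```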
### Language Support
The system has customized support for the top 20 programming languages (Python, Go, Java, C, etc.). For less common languages, it falls back to baseline LLM capabilities without specialized optimization.
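One way to structure such per-language customization is a configuration table with a generic fallback, as in the sketch below; the fields, linter choices, and hint strings are assumptions rather than the product's real configuration.

```python
from dataclasses import dataclass

# Illustrative per-language review configuration with a baseline fallback.

@dataclass
class LanguageConfig:
    linters: list[str]
    review_hints: str

LANGUAGE_CONFIGS = {
    "python": LanguageConfig(["ruff"], "Check typing, mutable defaults, async misuse."),
    "go": LanguageConfig(["go vet"], "Check error handling and goroutine leaks."),
    # ... entries for the other customized languages ...
}

GENERIC = LanguageConfig([], "Review for correctness, clarity, and security.")

def config_for(language: str) -> LanguageConfig:
    """Fall back to baseline LLM review when a language has no custom support."""
    return LANGUAGE_CONFIGS.get(language.lower(), GENERIC)
```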
## Production Challenges and Solutions
### Balancing Thoroughness with Usability
A recurring theme is managing the tension between comprehensive AI review and practical usability. Models naturally want to flag every possible issue, but this creates noise that engineers ignore. Entelligence's solution involves multiple layers of filtering and learning from team behavior.
### Tone Management
An unexpected production requirement emerged around emotional intelligence. One engineering leader specifically requested that the system include positive feedback, not just criticism. Even automated systems benefit from making users feel good about their work before pointing out issues. The platform includes "sprint assessments" with badges and rewards highlighting each engineer's main contributions.
### Cross-Vendor Compatibility
For enterprise customers with existing documentation systems, Entelligence had to build export capabilities to multiple platforms. Rather than forcing customers to use their editor, they "hydrate" documentation wherever customers prefer to host it while maintaining the source of truth in their system.
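Conceptually this hydration is a one-to-many publish step from a canonical document to whichever platforms a customer uses. In the sketch below the exporter functions are no-op stubs standing in for the real Confluence, Notion, and Google Docs integrations.

```python
from typing import Callable

# Sketch of "hydrating" docs to customer platforms while keeping the internal
# copy as the source of truth. Exporters are stubs; real API calls are omitted.

def publish_confluence(doc_id: str, markdown: str) -> None: ...
def publish_notion(doc_id: str, markdown: str) -> None: ...
def publish_gdocs(doc_id: str, markdown: str) -> None: ...

EXPORTERS: dict[str, Callable[[str, str], None]] = {
    "confluence": publish_confluence,
    "notion": publish_notion,
    "gdocs": publish_gdocs,
}

def hydrate(doc_id: str, canonical_markdown: str, targets: list[str]) -> None:
    """Push the canonical doc to every platform the customer uses."""
    for target in targets:
        EXPORTERS[target](doc_id, canonical_markdown)
```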
## Adoption Patterns
Interestingly, adoption is not strongest among "elite" engineering organizations. Engineers who consider themselves cream-of-the-crop tend to be more resistant to AI feedback. The strongest adoption comes from engineering teams focused on delivering business value who want operational overhead removed. Enterprise sales often proceed team-by-team rather than organization-wide.
The platform serves multiple personas:
- **Engineers**: Get code reviews, documentation, and context search
- **Engineering managers**: Access team analytics and performance insights
- **PMs and designers**: Query engineering systems without interrupting engineers
- **Support teams**: Understand engineering changes and features
## Open Source and Community
Entelligence partners with open source projects, offering their suite of tools for code reviews, contributor onboarding, codebase change tracking, and contribution analytics. This serves as both community support and a pathway to adoption.
## Metrics and Evaluation
The company is moving beyond traditional engineering metrics toward impact-based measurement. Their PR review analysis generates metrics on features built, impact of features, and complexity of changes—attempting to measure meaningful contribution rather than code volume.
## Limitations Acknowledged
The interview candidly discusses current limitations:
- No production metric integration (latency, performance data from running systems)
- Custom language support limited to top 20 languages
- Full environment reproduction for testing not currently feasible
## Dogfooding
Entelligence uses their own product internally, which has helped identify issues like overly permissive initial reviews and the need for the feedback learning loop. The founder notes they've iterated since day one when the system "would not catch anything."