Cursor built a modern AI-enhanced code editor by forking VS Code and incorporating advanced LLM capabilities. Their approach focuses on creating a responsive, predictive coding environment that goes beyond simple autocompletion, using techniques such as mixture-of-experts (MoE) models, speculative decoding, and sophisticated caching strategies. The editor aims to eliminate low-entropy keystrokes and predict a developer's next action, all while maintaining high performance and low latency.
Cursor is an AI-powered code editor built as a fork of VS Code, designed to provide significantly enhanced AI-assisted programming capabilities. The founding team—Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger—recognized early on that the scaling laws emerging from OpenAI’s research in 2020 would lead to increasingly capable models, and that programming environments would need to fundamentally evolve to take advantage of these capabilities. Rather than building extensions on top of existing editors (which would limit their control over the user experience), they chose to fork VS Code and build a comprehensive AI-native editing experience.
The team’s journey began with observing GitHub Copilot’s success in 2021 as the first major LLM consumer product, but they felt frustrated that despite models getting significantly better (particularly with GPT-4 access in late 2022), the coding experience wasn’t evolving to match. This motivated them to build Cursor with a philosophy of rapid iteration and deep integration between model capabilities and user experience.
A core principle of Cursor’s approach is that they don’t rely solely on frontier models. Instead, they train and deploy an ensemble of custom models specialized for specific tasks, combined with frontier models for reasoning-intensive operations.
The Cursor Tab feature is one of their most sophisticated custom models. Unlike traditional autocomplete, which predicts the characters immediately after the cursor, Cursor Tab aims to predict the next complete edit the user will make, including:
The model is designed to eliminate “low entropy actions”—keystrokes that are highly predictable given the current context. The team describes this as making programming feel like the AI is “reading your mind” for the zero-entropy bits of your work.
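As a rough illustration of the idea (my framing, not Cursor's code): Shannon entropy over a model's next-token distribution quantifies how predictable the next keystroke is, and near-zero entropy marks the "zero-entropy bits" that are safe to propose automatically. The probability values below are invented for illustration.

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# After typing `for i in range(`, a model might put nearly all of its
# mass on a single continuation -- a low-entropy, predictable keystroke
# that an editor could type for you.
confident = entropy([0.97, 0.02, 0.01])   # roughly 0.22 bits
uncertain = entropy([0.40, 0.30, 0.30])   # roughly 1.57 bits
```

A suggestion surface could then trigger only below some entropy threshold, leaving genuinely uncertain decisions to the programmer.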
Training this model involves:
A critical discovery was that while frontier models are good at sketching out code changes and generating rough plans, they struggle with the seemingly simple task of actually applying those changes to existing code. Tasks like counting line numbers accurately in large files trip up even the best models like Sonnet and o1.
The Apply model is specifically trained to take a rough code sketch from a frontier model and accurately implement it as a diff to the existing file. This separation of concerns allows them to use fewer tokens with the most intelligent models (reducing latency and cost) while delegating the implementation details to specialized models.
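A minimal sketch of this separation of concerns, assuming the frontier model's rough output uses a placeholder convention like `# ... existing code ...` (the marker name and the merge heuristic here are illustrative, not Cursor's actual format):

```python
# Hypothetical "plan then apply" split: the frontier model emits a sketch
# with placeholder lines, and a cheap apply step expands each placeholder
# using the original file, anchored on the next verbatim sketch line.

PLACEHOLDER = "# ... existing code ..."

def apply_sketch(original: str, sketch: str) -> str:
    orig = original.splitlines()
    sk = sketch.splitlines()
    out, cursor = [], 0  # cursor tracks our position in the original file
    for i, line in enumerate(sk):
        if line.strip() != PLACEHOLDER:
            out.append(line)
            if line in orig[cursor:]:  # verbatim match: advance past it
                cursor = orig.index(line, cursor) + 1
            continue
        # placeholder: copy original lines up to the next concrete sketch line
        nxt = next((l for l in sk[i + 1:] if l.strip() != PLACEHOLDER), None)
        if nxt is not None and nxt in orig[cursor:]:
            end = orig.index(nxt, cursor)
        else:
            end = len(orig)  # trailing placeholder: copy the rest
        out.extend(orig[cursor:end])
        cursor = end
    return "\n".join(out)

ORIGINAL = "def f():\n    a = 1\n    b = 2\n    return a + b"
SKETCH = "def f():\n# ... existing code ...\n    b = 2\n    return a * b"
merged = apply_sketch(ORIGINAL, SKETCH)
```

The point is that the frontier model only spends tokens on the changed lines plus a little surrounding context; the mechanical (and, per the section above, surprisingly error-prone) line accounting is delegated to the apply step.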
Speed is considered fundamental to the product experience—“fast is fun” as the team puts it. Several sophisticated techniques are employed to achieve low latency:
The KV (Key-Value) cache is central to their inference optimization strategy. When processing prompts with transformers, reusing computed keys and values from previous tokens avoids redundant forward passes through the model. Cursor implements:
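Whatever the production details, the core prefix-reuse mechanism can be illustrated with a toy cache (hypothetical; a real KV cache stores per-layer attention tensors, not strings, and would not duplicate storage per prefix):

```python
# Toy prefix KV cache: a prompt that shares a prefix with an earlier
# prompt only pays forward passes for its new suffix tokens.

class PrefixKVCache:
    def __init__(self):
        self.cache: dict[tuple, list] = {}  # token-prefix tuple -> per-token (K, V)
        self.forward_calls = 0              # proxy for transformer compute

    def _kv_for(self, token: str, position: int):
        # Stand-in for a real forward pass producing keys/values.
        self.forward_calls += 1
        return (f"K({token}@{position})", f"V({token}@{position})")

    def encode(self, tokens: list[str]) -> list:
        # Find the longest already-cached prefix of this prompt.
        best = 0
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self.cache:
                best = n
                break
        kv = list(self.cache[tuple(tokens[:best])]) if best else []
        # Only the suffix beyond the cached prefix costs forward passes.
        for pos in range(best, len(tokens)):
            kv.append(self._kv_for(tokens[pos], pos))
            self.cache[tuple(tokens[:pos + 1])] = list(kv)
        return kv

cache = PrefixKVCache()
cache.encode(["fn", "main", "(", ")"])       # 4 forward passes
cache.encode(["fn", "main", "(", ")", "{"])  # only 1 more
```

This is why editor workloads benefit so much: as the user types, each new request is mostly a prefix the server has already processed.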
This is a variant of speculative decoding tailored for code editing. Traditional speculative decoding uses a small draft model to predict tokens that a larger model then verifies. For code edits, Cursor leverages a strong prior: most of the output will be the same as the existing code.
The technique works by:
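In outline, the loop can be sketched as follows, under the simplifying assumption that edits are in-place substitutions so draft and output positions stay aligned (`target` stands in for what the real model would generate, and each `calls` increment represents one batched verification pass):

```python
# Toy speculative-edits loop: the "draft" is simply the original file,
# since most of an edited file repeats the original. Chunks of original
# tokens are verified in one batched pass; on disagreement the model
# emits its own token and speculation resumes.

def speculative_edit(original: list[str], target: list[str], chunk: int = 8):
    out: list[str] = []
    calls = 0
    pos = 0
    while pos < len(target):
        draft = original[pos:pos + chunk]  # speculate: "unchanged code"
        calls += 1                         # one batched pass scores the chunk
        accepted = 0
        for i, tok in enumerate(draft):
            if pos + i < len(target) and tok == target[pos + i]:
                accepted += 1              # model agrees with the draft token
            else:
                break
        out.extend(draft[:accepted])
        pos += accepted
        if pos < len(target):
            out.append(target[pos])        # model's correction (or new token)
            pos += 1
    return out, calls

ORIGINAL = ["def", "f():", "return", "a", "+", "b"]
TARGET = ["def", "f():", "return", "a", "*", "b"]
out, calls = speculative_edit(ORIGINAL, TARGET, chunk=4)
```

Here a six-token file with one changed token costs 2 batched passes instead of 6 sequential decode steps; the win grows with how much of the file is unchanged.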
The team discusses various attention optimizations:
These techniques are particularly important for handling large batch sizes and long contexts without degrading generation speed, as the bottleneck shifts from compute to memory bandwidth.
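To make the memory-bandwidth point concrete, here is a back-of-envelope sizing of the KV cache that must be streamed at every decode step, using grouped-query attention (one widely used optimization of this kind) and entirely hypothetical model dimensions:

```python
# Generic transformer arithmetic, not Cursor-specific numbers: sharing
# K/V heads across query heads (grouped-query attention) shrinks the KV
# cache read from memory on every generated token.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_el: int = 2) -> int:
    """Bytes of cached K/V for one sequence batch (fp16 by default)."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_el
    #      ^ factor of 2 for keys and values

# Hypothetical 32-layer model with 128-dim heads at an 8k context:
mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=8192, batch=1)
gqa = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192, batch=1)
# 4 GiB vs 1 GiB per sequence: a 4x reduction in per-step memory traffic.
```

Since decode is typically bandwidth-bound rather than compute-bound, that 4x directly translates into larger feasible batch sizes and longer contexts at the same generation speed.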
Cursor builds a semantic index of entire codebases to enable context-aware assistance. The architecture involves:
The team notes that embedding is the cost bottleneck, not storage, which influenced their caching strategy.
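That caching strategy can be sketched as follows (an assumed pattern, with `embed` standing in for a real embedding model): keying chunks by content hash means re-indexing a codebase only pays embedding cost for chunks that actually changed.

```python
import hashlib

# Hash-keyed embedding cache: since embedding compute, not storage, is
# the bottleneck, vectors are stored under a content hash so unchanged
# chunks are never re-embedded on subsequent indexing passes.

class EmbeddingIndex:
    def __init__(self, embed):
        self.embed = embed                            # chunk text -> vector
        self.by_hash: dict[str, list[float]] = {}     # content hash -> vector
        self.embed_calls = 0                          # proxy for embedding cost

    def index(self, chunks: list[str]) -> list[list[float]]:
        vectors = []
        for chunk in chunks:
            h = hashlib.sha256(chunk.encode()).hexdigest()
            if h not in self.by_hash:                 # only new content pays
                self.embed_calls += 1
                self.by_hash[h] = self.embed(chunk)
            vectors.append(self.by_hash[h])
        return vectors

index = EmbeddingIndex(lambda text: [float(len(text))])  # fake embedder
index.index(["chunk a", "chunk b", "chunk c"])
index.index(["chunk a", "chunk b", "chunk d"])  # only "chunk d" is embedded
```

The same hashing idea extends naturally to trees of hashes over directories, so whole unchanged subtrees can be skipped without reading their files.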
Cursor developed an internal system called “Preempt” inspired by React’s declarative approach. Prompts are written using JSX-like components where:
This approach separates the raw data from prompt rendering, making it easier to debug, iterate on prompt templates, and evaluate changes across historical data.
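Preempt itself is internal, but the declarative idea can be approximated in a few lines. Everything below is an assumption for illustration: priority-tagged components greedily fitted to a token budget, with tokens crudely approximated by whitespace splitting.

```python
from dataclasses import dataclass

# Declarative prompt rendering: components declare what they are and how
# important they are; the renderer decides what fits the context window.

@dataclass
class Component:
    text: str
    priority: int  # higher = more important to keep under budget pressure

def render(components: list[Component], budget: int) -> str:
    kept, used = [], 0
    # Greedily keep the highest-priority components that fit the budget.
    for comp in sorted(components, key=lambda c: -c.priority):
        cost = len(comp.text.split())  # crude token count
        if used + cost <= budget:
            kept.append(comp)
            used += cost
    # Emit survivors in their original order so the prompt reads naturally.
    return "\n".join(c.text for c in components if c in kept)

parts = [
    Component("system: be terse", 10),
    Component("entire file contents pasted here for context", 1),
    Component("user: fix bug", 9),
]
prompt = render(parts, budget=6)  # the low-priority file dump is dropped
```

Because the components (the raw data) are separate from the rendered string, the same inputs can be re-rendered against new templates or budgets, which is what makes debugging and historical evaluation tractable.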
The team provides candid assessment of different frontier models:
They express skepticism about public benchmarks due to:
Instead, the team relies heavily on “vibe checks”—qualitative human evaluation of model outputs, which they acknowledge is imperfect but often more reliable than benchmarks.
The team has iterated extensively on how to display AI-suggested changes:
They envision needing 4-5 different diff interfaces optimized for different contexts (autocomplete vs. large block review vs. multi-file changes). Future ideas include:
As models propose larger and larger changes, human verification becomes increasingly burdensome. The team is actively researching ways to assist with this, including using AI to prioritize which parts of a diff actually need careful review.
Running Cursor at scale on AWS has presented numerous challenges:
The team emphasizes that predicting where systems will break under scale is extremely difficult—there’s always something unexpected.
The team is excited about agents but notes they’re “not yet super useful for many things.” They envision:
The team discusses the potential of test-time compute scaling:
The team outlines three categories of synthetic data:
They’re bullish on distillation as a way to create capable, fast models for specific tasks without hitting the data wall.
The team strongly believes in keeping humans in the driver’s seat rather than moving to pure chatbot-style interfaces. Their reasoning:
They envision a future where programmers can fluidly move up and down abstraction levels—editing pseudocode that gets compiled to real code, or diving into implementation details when needed—while maintaining control over all decisions.
Codeium's journey in building AI-powered development tools shows how early investment in enterprise-ready infrastructure, including containerization, security, and comprehensive deployment options, enabled them to scale from individual developers to large enterprise customers. Their "go slow to go fast" approach to building proprietary infrastructure for code completion, retrieval, and agent-based development culminated in the Windsurf IDE, demonstrating how thoughtful early architectural decisions create a more robust foundation for AI tools in production.
Cursor, founded by MIT graduates, developed an AI-powered code editor that goes beyond simple code completion to reimagine how developers interact with AI while coding. By focusing on innovative features like instructed edits and codebase indexing, along with developing custom models for specific tasks, they achieved rapid growth to $100M in revenue. Their success demonstrates how combining frontier LLMs with custom-trained models and careful UX design can transform developer productivity.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools (including databases and underwriting guidelines), and engaging in multi-turn conversations. The evaluation revealed large performance variation across frontier models, from single-digit to roughly 80% accuracy. Notable error modes included tool-use failures (in 36% of conversations) and hallucinations drawn from pretrained domain knowledge; OpenAI models in particular hallucinated non-existent insurance products 15-45% of the time.