The Browser Company: Building an AI-Native Browser with Integrated LLM Tools and Evaluation Systems

Company

The Browser Company

Title

Building an AI-Native Browser with Integrated LLM Tools and Evaluation Systems

Industry

Tech

Link

https://www.youtube.com/watch?v=o4scJaQgnFA

Year

2025

Summary (short)

The Browser Company transitioned from their Arc browser to building Dia, an AI-native browser, requiring a fundamental shift in how they approached product development and LLMOps. The company invested heavily in tooling for rapid prototyping, evaluation systems, and automated prompt optimization using techniques like Jeba (a sample-efficient prompt optimization method). They created a "model behavior" discipline to define and ship desired LLM behaviors, treating it as a craft analogous to product design. Additionally, they built security considerations into the product design from the ground up, particularly addressing prompt injection vulnerabilities through user confirmation workflows. The result was a browser that provides an AI assistant alongside users, personalizing experiences and helping with tasks, while enabling their entire company—from CEO to strategy team members—to iterate on AI features.

Tags

## Overview The Browser Company, founded in 2019 with a mission to rethink how people use the internet, provides a compelling case study in building AI-native products and the LLMOps practices required to support them. The company initially shipped Arc in 2022, a browser designed to be more personal and organized than traditional browsers. However, when they gained access to LLMs like GPT models in 2022, they began exploring AI capabilities that eventually led to a complete reimagining of their product. In early 2024, they announced "Act 2," sharing their thesis that AI would fundamentally transform how people use the internet. This led to the development and launch of Dia earlier in 2025, an AI-native browser built from the ground up with AI, speed, and security in mind. Dia provides an assistant that works alongside users, learns about them, personalizes experiences, and helps manage tabs and applications to improve productivity. The presentation by Samir, the head of AI engineering, offers valuable insights into the LLMOps challenges and solutions encountered during this transition. What makes this case study particularly interesting is that it represents not just a product evolution but a company-wide transformation in how they build, hire, train, and operate. The company recognized that building an AI-native product required fundamental changes to their development processes, team structures, and security considerations. ## Tooling for Rapid Iteration and Prototyping From the beginning, The Browser Company embraced a philosophy that they could only win by building tools, processes, platforms, and mindsets that enabled them to iterate, build, ship, and learn faster than competitors. This principle became even more critical when building AI-native products. The company made strategic investments in four key tooling areas: prototyping for AI product features, building and running evaluations, collecting data for training and evaluation, and automation for continuous improvement (what they call "hill climbing"). Their initial approach to AI feature development involved a rudimentary prompt editor that only existed in developer builds. This proved problematic in several ways. First, it limited access to only engineers, creating a bottleneck in ideation. Second, it resulted in slow iteration speeds. Third, and perhaps most critically, it didn't provide access to personal context—which is essential for evaluating whether an AI product feels right and delivers real utility. The company evolved their approach dramatically by building all their AI development tools directly into their product—specifically into the version of Dia that they use internally every day. This means that prompts, tools, context, models, and every parameter are accessible and editable within the product itself. This architectural decision had transformative effects on their development velocity. It enabled them to "10x their speed of ideating, iterating and refining" products while simultaneously widening the pool of people who could contribute to product development. Now everyone from the CEO to the newest hire can ideate on new features or refine existing ones, all with their full personal context intact. This democratization of AI product development extended to all their major product protocols. They built tools for optimizing their memory knowledge graph that all employees use. They created tools for iterating on their computer use mechanism, which allowed them to try "tens of different types of computer use strategies" before settling on one to build into the product. Samir emphasized that this approach not only accelerated development but also made it more fun, enabling creativity across roles. Product managers, designers, customer service representatives, and strategy/operations team members could all try out new ideas tailored to their specific use cases, which aligned perfectly with the company's goal of understanding diverse user needs. ## Automated Prompt Optimization with Jeba Beyond manual tooling, The Browser Company invested in automated mechanisms for continuous improvement of their AI systems. They adopted an approach called Jeba, based on a research paper from earlier in 2025, which provides a sample-efficient way to improve complex LLM systems without requiring reinforcement learning or other fine-tuning techniques. For a small company, this efficiency is crucial since they don't have the resources for extensive model training. Jeba works by seeding the system with a set of initial prompts, executing those prompts across a set of tasks, scoring the results, and then using a selection mechanism (called PA selection) to identify the best-performing prompts. The system then leverages an LLM to reflect on what worked well and what didn't, generating new prompt variations to test in the next iteration. The key innovations in this approach include the reflective prompt mutation technique, the selection process that explores a broader space of possible prompts rather than following a single optimization path, and the ability to tune text (prompts) rather than model weights. Samir provided a modest example showing how Jeba could take a simple prompt and optimize it along whatever metrics and scoring mechanisms they defined, refining it to better meet their quality standards. This automated optimization complements their manual iteration processes and helps them continuously improve their AI features after initial prototyping. ## Development Phases and Process The Browser Company has structured their AI feature development into distinct phases. The first phase focuses on prototyping and ideation, where they've intentionally widened the breadth of ideas at the top of the funnel and lowered the threshold for who can build them and how. They try numerous ideas every day and every week, contributed by people across all roles, and they dogfood these features internally. If an idea demonstrates real utility, solves a genuine problem, and shows a path toward meeting their quality thresholds, it moves to the next phase. This refinement phase involves collecting and refining evaluation datasets to clarify product requirements, then hill-climbing through code improvements, prompting adjustments, and automated techniques like Jeba. They continue to dogfood internally before ultimately shipping to users. Samir emphasized that both phases are equally important. The ideation phase is crucial because new AI advancements emerge every week, unlocking new possibilities in Dia. Their goal is to get "as many at-bats" as possible—trying and exploring numerous ideas to discover what truly works. However, they're also careful not to underestimate the work required to ship these ideas to production as high-quality experiences. This balanced approach helps them move quickly while maintaining quality standards. ## Model Behavior as a Craft and Discipline One of the most interesting organizational innovations at The Browser Company is the formalization of "model behavior" as a distinct discipline and craft. Model behavior is defined as the function that evaluates and ships the desired behaviors of models—essentially turning principles into product requirements, prompts, evaluations, and ultimately shaping the behavior and personality of their LLM products, particularly the Dia assistant. The model behavior discipline encompasses several key areas. First is behavior design, which involves defining the product experience they want to create, including the style, tone, and shape of responses. Second is collecting data for measurement and training purposes. Third is clarifying product requirements through evaluation frameworks. Fourth is model steering, which includes the actual building of the product—selecting models, crafting prompts, defining what goes into the context window, setting parameters, and much more. The process is highly iterative: they build, refine, create evaluations, ship, and then collect more feedback (both internal and external) that feeds back into the next iteration cycle. Samir drew an interesting analogy comparing the evolution of model behavior to the evolution of product design on the internet. Early websites were purely functional—they got the job done but nothing more. Over time, as people tried to achieve more online and technology advanced, product design and craft became more sophisticated and complex. Similarly, early LLM products were functional—prompts went in, outputs came out, with basic evaluations. Now the field has evolved to framing things through agent behaviors: goal-directed reasoning, shaping autonomous tasks, self-correction and learning, and shaping the personality of LLMs themselves. The Browser Company believes they're in the early days of building AI products and that model behavior will continue evolving into an increasingly specialized and prevalent function, even at product companies. Importantly, they've found that the best people for this work might surprise you. Samir shared a favorite story about how their model behavior team was actually formed. Initially, engineers were writing all the prompts. After they built prompt tools that enabled broader participation, a person on their strategy and operations team used these tools over a weekend to rewrite all their prompts. On Monday morning, he shared a Loom video explaining what he did, how, and why, along with the new prompts. Those prompts alone "unlocked a new level of capability and quality and experience" in their product and led directly to the formation of their dedicated model behavior team. This story illustrates their broader point: the people who can best shape AI products and steer models might come from any role in the company, not just engineering. ## AI Security and Prompt Injection Prevention The Browser Company takes AI security seriously, particularly prompt injection attacks, which Samir identified as a critical concern for browsers. A prompt injection is an attack in which a third party can override an LLM's instructions to cause harm, potentially leading to data exfiltration, execution of malicious commands, or bypassing safety rules. Samir provided a concrete example: a user might ask an LLM to summarize a website, but if that website contains a hidden prompt injection in its HTML, the LLM could instead be directed to open a new website with the user's personal information embedded as GET parameters in the URL, effectively exfiltrating that data. For browsers specifically, prompt injections are particularly dangerous because browsers sit at what Samir called "the middle of a lethal trifecta": they have access to private data, they're exposed to untrusted content from the internet, and they have the ability to communicate externally (opening websites, sending emails, scheduling events, etc.). This combination makes browsers an especially high-risk attack surface. The Browser Company explored several technical strategies to prevent prompt injections. One approach is wrapping untrusted context in tags and instructing the LLM to listen to instructions around those tags while ignoring content within them. However, this is "easily escapable and quite trivial" for attackers to bypass. Another approach involves separating data from instructions by assigning operating instructions to a system role and user content to a user role, with randomly generated tags wrapping the user content to ensure the LLM prioritizes instructions over content. While this helps, Samir acknowledged that there are "no guarantees and prompt injections will still happen." Given these limitations, The Browser Company adopted a philosophy of designing the product with prompt injection risks in mind from the ground up. They blend technological approaches with user experience and design to create a cohesive security story. A concrete example is their autofill tool, which leverages an LLM with context, memory, and user details to fill forms on the internet. This feature is extremely powerful but also vulnerable—a prompt injection could extract user data and place it in a form, where it would be out of the user's control. Their solution is to require user confirmation before data is written to forms, showing users the data in plain text so they can read and confirm it. This doesn't prevent prompt injections, but it gives users control, awareness, and trust in what's happening. This design principle carries throughout their product. Scheduling events in Dia includes a similar confirmation step, as does writing emails. By building these confirmation workflows into the user experience, they create a defense-in-depth approach where technical mitigations are complemented by user awareness and control. ## Critical Assessment and Balanced Perspective While The Browser Company's approach to LLMOps demonstrates several innovative practices, it's important to maintain a balanced perspective on their claims and approach. The company presents their story as a success narrative, but several aspects warrant closer examination. Their emphasis on rapid prototyping and democratizing AI development across roles is admirable, but this approach also carries risks. Allowing non-technical team members to iterate on production AI features could lead to quality control challenges, unintended behaviors, or security vulnerabilities if proper guardrails aren't in place. While they mention dogfooding and evaluation processes, the details of how they ensure quality and safety at scale aren't fully elaborated. The Jeba automated optimization technique is presented as a key innovation, but Samir acknowledges it's based on a research paper from 2025 and provides only a "modest example" of its effectiveness. Without more concrete metrics on improvement rates, computational costs, or comparison to other optimization approaches, it's difficult to assess how much value this actually provides beyond traditional prompt engineering and evaluation cycles. The model behavior discipline is an interesting organizational innovation, and the story about a strategy team member rewriting prompts is compelling. However, this also raises questions about the sustainability of having critical product behaviors depend on individuals from non-technical backgrounds without robust testing and evaluation frameworks. The company emphasizes iteration and creativity but provides limited detail on how they prevent regressions or ensure consistency across the product. On security, their approach to prompt injection is pragmatic given current LLM limitations, but the reliance on user confirmation for sensitive actions is essentially pushing security responsibility to end users. While this provides transparency, it also creates friction and depends on users being vigilant and understanding the implications of confirmation dialogs—something that doesn't always work well in practice, as research on security warnings has shown. The company's claim that they're building tools and processes to "iterate, build, ship, and learn faster than everyone else" is ambitious but difficult to verify without competitive benchmarking or specific velocity metrics. Similarly, claims about "10x" improvements in speed are impressive but would benefit from more concrete definition and measurement. That said, The Browser Company's holistic approach to LLMOps—viewing it as a company-wide transformation rather than just a technical challenge—is genuinely valuable. Their recognition that AI-native products require changes in how they hire, train, communicate, and collaborate shows organizational maturity. The integration of AI development tools directly into their product, enabling work with full personal context, is a clever solution to the cold-start problem many AI development environments face. ## Conclusion The Browser Company's journey from Arc to Dia represents a comprehensive case study in building AI-native products and the LLMOps practices required to support them. Their investments in tooling for rapid prototyping, automated optimization through techniques like Jeba, formalization of model behavior as a discipline, and security-first product design demonstrate a thoughtful approach to the unique challenges of LLM-powered products. Key takeaways include the value of democratizing AI development beyond engineering teams, the importance of working with full personal context when evaluating AI products, the potential of automated prompt optimization for resource-constrained teams, and the necessity of designing user experiences that account for the current limitations of LLM security. While some claims would benefit from more rigorous evidence and metrics, the overall approach represents a mature understanding of what it takes to build, operate, and continuously improve LLM-powered products at scale. The company's emphasis on treating the transition to AI as a company-wide transformation rather than just a product feature addition may be their most important insight for other organizations embarking on similar journeys.

Start deploying reproducible AI workflows today