Company
Val Town
Title
Evolution of Code Assistant Integration in a Cloud Development Platform
Industry
Tech
Year
2025
Summary (short)
Val Town's journey in implementing and evolving code assistance features showcases the challenges and opportunities in productionizing LLMs for code generation. Through iterative improvements and fast-following industry innovations, they progressed from basic ChatGPT integration to sophisticated features including error detection, deployment automation, and multi-file code generation, while addressing key challenges like generation speed and accuracy.
## Overview

Val Town is a code hosting service that has been on a multi-year journey to integrate cutting-edge LLM-powered code generation into their platform. This case study is particularly valuable because it offers an honest, retrospective account of their "fast-following" strategy—deliberately copying and adapting innovations from industry leaders while occasionally contributing their own improvements. The narrative provides insight into the practical challenges of building and maintaining LLM-powered development tools in a rapidly evolving landscape.

The company launched in 2022, and since then has navigated through multiple paradigm shifts in AI code assistance: from GitHub Copilot-style completions, through ChatGPT-era chat interfaces, to the current generation of agentic code assistants exemplified by Cursor, Windsurf, and Bolt. Their primary product, "Townie," has evolved through multiple versions as they adapted to these shifts.

## Technical Evolution and Architecture Decisions

### Phase 1: Autocomplete Integration

Val Town's initial foray into LLM-powered features was autocomplete functionality similar to GitHub Copilot. Their first implementation used Asad Memon's open-source `codemirror-copilot` library, which essentially prompted ChatGPT to "cosplay" as an autocomplete service. This approach had significant limitations: it was slow, occasionally the model would break character and produce unexpected outputs, and it lacked the accuracy of purpose-built completion models.

The technical insight here is important for LLMOps practitioners: using a general-purpose chat model for a specialized task like code completion introduces latency and reliability issues that purpose-trained models avoid. The solution was migrating to Codeium in April 2024, which offered a properly trained "Fill in the Middle" model with documented APIs. They open-sourced their `codemirror-codeium` integration component, demonstrating a commitment to ecosystem contribution even while primarily consuming others' innovations.

### Phase 2: Chat Interface and Tool Use Experiments

The first version of Townie was a straightforward ChatGPT-powered chat interface with a pre-filled system prompt and one-click code saving functionality. However, this proved inadequate because the feedback loop was poor—users needed iterative conversations to refine code, but the interface wasn't optimized for this workflow.

Their subsequent experiment with OpenAI's function calling (now "tool use") provides a cautionary tale. Despite investing in cleaning up their OpenAPI spec and rebuilding Townie around structured function calling, the results were disappointing. The LLM would hallucinate functions that didn't exist even with strict function definitions provided. While function calling has improved with Structured Outputs, Val Town concluded that the interface was "too generic"—capable of doing many things poorly rather than specific things well.

This is a crucial LLMOps lesson: the promise of giving an LLM your API specification and expecting intelligent orchestration often underdelivers. The magic comes from carefully constraining what actions an agent can take and how it chains them together, rather than providing maximum flexibility.
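To make the contrast concrete, the sketch below shows the kind of narrowly scoped tool definition this lesson points toward, using the OpenAI Node SDK's tool-calling interface. The `create_val` tool name, its schema, and the prompts are hypothetical illustrations, not Val Town's actual tool definitions; the point is that the application exposes one tightly specified action and refuses to act on anything else.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// One narrowly scoped action with an explicit schema, instead of exposing an
// entire OpenAPI spec. The tool name and fields are illustrative only.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "create_val",
      description: "Create a new val from a complete TypeScript source file.",
      parameters: {
        type: "object",
        properties: {
          name: { type: "string", description: "Name of the val to create" },
          code: { type: "string", description: "Full TypeScript source code" },
        },
        required: ["name", "code"],
        additionalProperties: false,
      },
    },
  },
];

const completion = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You write vals. Only use the tools provided." },
    { role: "user", content: "Make a val that returns the current time as JSON." },
  ],
  tools,
});

// Accept only the tool we defined; anything else is treated as a hallucination.
const call = completion.choices[0].message.tool_calls?.[0];
if (call?.type === "function" && call.function.name === "create_val") {
  const args = JSON.parse(call.function.arguments);
  // ...validate args against the schema, then call the real deployment API.
} else {
  // Fall back to re-prompting the model or surfacing the raw response.
}
```

The constraint, not the flexibility, is what makes the loop reliable: the model can only do one thing, and the application rejects anything it did not explicitly define.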
### Phase 3: Claude Artifacts-Inspired Architecture

The launch of Claude 3.5 Sonnet and Claude Artifacts in mid-2024 represented a turning point. Val Town observed that Claude 3.5 Sonnet was "dramatically better at generating code than anything we'd seen before" and that the Artifacts paradigm solved the tight feedback loop problem they had struggled with. After about a month of prototyping, they launched the current version of Townie in August 2024.

This version can generate fullstack applications—frontend, backend, and database—and deploy them in minutes. The architecture includes a hosted runtime, persistent data storage (via @std/sqlite), and LLM API access (@std/openai) so users can create AI-powered applications without managing their own API keys.

## Technical Innovations and Contributions

### Diff-Based Code Generation

One of Val Town's notable contributions to the space is their work on diff-based code generation, inspired by Aider. The motivation is clear: regenerating entire files for every iteration is slow and expensive. By having the LLM produce diffs instead, iteration cycles can be dramatically faster.

Their system prompt (which they keep publicly visible) includes specific instructions for handling diff versus full-code generation based on user requests. When users explicitly request diff format, Townie generates valid unified diffs based on existing code. However, this feature is currently off by default because reliability wasn't sufficient—the model would sometimes produce malformed diffs or misapply changes.

The team expresses hope that Anthropic's rumored "fast-edit mode" or OpenAI's Predicted Outputs might solve this problem more robustly. They also point to faster inference hardware (Groq, Cerebras) and more efficient models (citing DeepSeek's near-Sonnet-level model trained for only $6M) as potential paths to making the iteration speed problem less critical.

### Automatic Error Detection and Remediation

Val Town's potentially novel contribution is their automatic error detection system. The implementation has two components:

- **Server-side errors**: Townie polls the Val Town backend for 500 errors in user logs
- **Client-side errors**: Generated applications import a client-side library that pushes errors up to the parent window

When errors are detected, Townie proactively asks users if they'd like it to attempt a fix. While the team modestly notes this isn't particularly novel in concept, they suggest it may have influenced similar features in Anthropic's tools and Bolt.
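The client-side half of this loop can be sketched with standard browser APIs. The snippet below is a minimal illustration, assuming the generated app runs in an iframe inside the editor; the `townie:error` message type and field names are hypothetical, not Val Town's actual protocol.

```typescript
// --- Inside the generated app (the iframe) ---
// Forward uncaught errors and unhandled promise rejections to the parent window.
function reportError(message: string, stack?: string) {
  window.parent.postMessage(
    { type: "townie:error", message, stack, url: location.href },
    "*", // a production version would pin this to the editor's origin
  );
}

window.addEventListener("error", (event) => {
  reportError(event.message, event.error?.stack);
});

window.addEventListener("unhandledrejection", (event) => {
  reportError(String(event.reason), event.reason?.stack);
});

// --- Inside the editor (the parent window) ---
// Listen for forwarded errors and offer a repair attempt, e.g. by appending
// the error to the chat context and showing an "attempt a fix?" prompt.
window.addEventListener("message", (event) => {
  const data = event.data;
  if (data?.type === "townie:error") {
    console.log("Preview error detected:", data.message);
    // offerFix(data): hypothetical hook into the chat UI
  }
});
```

Combined with polling the backend for 500 errors, this closes the loop: the assistant learns about failures without the user having to copy error messages back into the chat.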
## Production Considerations and Future Directions

### Model Selection and Cost Trade-offs

The case study reveals ongoing tension between model capability and speed/cost. Claude 3.5 Sonnet provides the best code generation quality, but inference is slow and expensive for iterative workflows. The team has explored alternatives like Cerebras-hosted models for near-instant feedback loops, and they express excitement about DeepSeek's cost-efficient training approaches potentially enabling Sonnet-level quality at much lower costs.

### Agentic Capabilities

Looking forward, Val Town envisions more autonomous behavior inspired by tools like Windsurf and Devin. Current ideas include:

- Automatic multi-attempt error fixing without human intervention
- Parallel exploration across different solution branches
- Web browser integration for the LLM to test its own generated applications
- Automatic test generation to prevent regressions during iteration
- Long-running autonomous sessions (potentially hours) for complex projects

They also mention interest in giving Townie access to search capabilities—across public vals, npm packages, and the broader internet—to find relevant code, documentation, and resources.

### Integration Strategy

An interesting strategic tension emerges in the case study: should Val Town compete with dedicated AI editors like Cursor and Windsurf, or integrate with them? Their current approach is both—continuing to develop Townie while also improving their local development experience and API so external tools can "deploy to Val Town" similar to Netlify integrations.

## Honest Assessment and Limitations

The article is refreshingly candid about limitations and failed experiments. The tool-use version of Townie was explicitly called "a disappointment." Diff generation doesn't work reliably enough to be enabled by default. The first ChatGPT-based Townie "didn't get much use" because the feedback loop was poor.

This transparency is valuable for LLMOps practitioners because it illustrates that even simple-seeming features often have subtle reliability challenges that only emerge in production. The team's willingness to keep their system prompt open and blog about technical choices suggests a collaborative rather than secretive approach to competitive development in this space.

## Conclusions for LLMOps Practice

Val Town's experience offers several lessons for teams building LLM-powered developer tools:

- Purpose-built models (like Codeium for completion) outperform prompted general models for specialized tasks
- Generic tool use with full API access disappoints compared to carefully constrained agentic actions
- Diff-based generation is theoretically compelling but practically unreliable with current models
- Automatic error detection can close feedback loops and improve user experience
- The space moves fast enough that "fast-following" competitors is a viable strategy for smaller teams
- Transparency about system prompts and approaches fosters ecosystem collaboration

The case study demonstrates that building production LLM features is as much about iteration and learning from failures as it is about initial implementation, and that keeping pace with rapid model improvements requires continuous adaptation of product architecture and prompting strategies.
