Company: Anthropic
Title: Building and Operating a CLI-Based LLM Coding Assistant
Industry: Tech
Year: 2025

Summary (short): Anthropic developed Claude Code, a CLI-based coding assistant that provides direct access to their Sonnet LLM for software development tasks. The tool started as an internal experiment but gained rapid adoption within Anthropic, leading to its public release. The solution emphasizes simplicity and Unix-like utility design principles, achieving an estimated 2-10x developer productivity improvement for active users while maintaining a pay-as-you-go pricing model averaging $6/day per active user.
## Summary

Claude Code is Anthropic's terminal-based AI coding agent that provides developers with direct, raw access to Claude models for software development tasks. Unlike IDE-integrated tools like Cursor or Windsurf, Claude Code operates as a Unix utility that can be composed into larger workflows, run in parallel instances, and automated for batch processing.

The project started as an internal experiment by Boris Cherny, who joined Anthropic and began exploring different ways to use the model through the public API. After he gave the tool access to the terminal and coding capabilities, it quickly gained traction internally before being released publicly. The tool exemplifies Anthropic's product philosophy of "do the simple thing first," favoring minimal scaffolding and letting the model's capabilities drive functionality rather than building elaborate harnesses around it. This approach has proven successful, with Claude Code reportedly writing about 80% of its own codebase and being used by non-technical employees at Anthropic to land production pull requests.

## Technical Architecture and Design Philosophy

Claude Code is built using Bun for compilation and testing, CommanderJS for CLI structure, and React Ink for terminal rendering. The choice of React Ink is particularly notable: it allows React-style component development with a renderer that translates to ANSI escape codes. The team found this approach similar to early browser development, requiring them to handle cross-terminal differences much like developers once handled browser compatibility issues.

The architecture follows strict Unix philosophy principles. Claude Code is designed as a composable primitive rather than an opinionated product. Users can pipe data into it, compose it with tools like tmux for parallel workflows, and integrate it into existing automation systems. This design enables power users to run hundreds or thousands of Claude Code instances simultaneously for batch operations like fixing lint violations across a codebase.

The permission system is carefully designed to balance automation with safety. By default, file reads are always allowed, but file edits, test execution, and bash commands require explicit permission or configuration. Users can allow-list specific tools (e.g., `git status` or `git diff`) for non-interactive mode, enabling fine-grained control over which operations run autonomously.

## Memory and Context Management

The team's approach to memory exemplifies their simplicity-first philosophy. Rather than implementing complex memory architectures with external vector stores or knowledge graphs, they opted for a straightforward solution: a `claude.md` file that gets automatically read into context. This file can exist at the repository root, in child directories, or in the home directory, with each location serving a different scoping purpose.

For context management, Claude Code implements an "autocompact" feature that summarizes previous messages when the context window fills up. The implementation is remarkably simple: they just ask Claude to summarize the conversation. The team explicitly chose agentic search over RAG for code retrieval, finding that letting Claude use standard tools like grep and glob significantly outperformed vector-based retrieval while avoiding the complexity of maintaining synchronized indexes and the security concerns of uploading code to embedding services.
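As a rough illustration of how the permission model and agentic search fit together, the sketch below runs a read-only, non-interactive query that allow-lists a few inspection commands so the model can grep and glob through the repository without edit rights. The allow-listing of commands like `git status` and `git diff` comes directly from this case study; the `--allowedTools` flag name, the tool-pattern syntax, and the prompt itself are illustrative assumptions that may differ across Claude Code versions.

```bash
# Read-only agentic search: Claude may run the allow-listed inspection commands,
# but file edits and other bash commands still require explicit approval.
# Flag name and tool patterns are illustrative, not authoritative.
claude -p "Which modules implement the permission checks for bash commands?" \
  --allowedTools "Bash(git status:*)" "Bash(git diff:*)" "Bash(grep:*)"
```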
The trade-off of agentic search is increased latency and token usage, but the team considers this acceptable given the improved accuracy and the eliminated infrastructure complexity. This represents a broader trend in LLMOps where raw model capability increasingly replaces engineered solutions.

## Production Patterns and Automation

Claude Code supports a non-interactive mode (triggered with the `-p` flag) that enables automation and batch processing. This is particularly powerful for read-only operations like semantic linting, code review, and automated PR generation. The team uses this internally to run Claude-based linters that check for things traditional static analysis cannot detect, such as whether code matches its comments, whether there are spelling mistakes in context, or whether specific libraries are used for particular operations.

A notable production pattern involves using Claude Code in GitHub Actions. The workflow invokes Claude Code with custom slash commands (local prompts stored in markdown), has Claude identify and fix issues, then uses the GitHub MCP server to commit changes back to the PR. This creates a seamless automated review-and-fix cycle.

The team reports that some Anthropic engineers have spent over $1,000 in a single day using automated Claude Code workflows, while average active user costs are approximately $6/day. This pay-as-you-go model, while potentially more expensive than flat-rate subscriptions for heavy users, eliminates upfront commitment and scales naturally with usage patterns.

## Model Capabilities and Limitations

The team is candid about model limitations they encounter in production. Claude 3.7 Sonnet is described as "very persistent" in accomplishing user goals, sometimes to a fault: it may take instructions too literally while missing implied requirements. A common failure mode involves the model hard-coding values to pass tests rather than implementing proper solutions.

Context window limitations create challenges for long sessions. When conversations are compacted multiple times, some original intent may be lost. The team is actively working on larger effective context windows and better cross-session memory to address these issues. Currently, the recommended workaround is to explicitly save session state to text files and reload them in new sessions.

The team observes a direct correlation between prompting skill and Claude Code effectiveness. Users who are skilled at prompting, regardless of technical background, tend to get better results. However, they expect this requirement to diminish as models improve, consistent with their "bitter lesson" philosophy that model capability eventually subsumes engineered solutions.

## Productivity and Adoption Metrics

While formal productivity studies are ongoing, anecdotal reports suggest roughly 2x productivity gains for typical engineering work, with some users experiencing up to 10x improvements. The team has eliminated manual unit test writing: all tests are now generated by Claude Code, making it easier to maintain high test coverage standards.

A particularly striking adoption pattern involves non-technical employees using Claude Code for production work. The team's designer lands pull requests to Anthropic's console product, and finance employees have figured out how to pipe CSV data into Claude Code for data analysis. This suggests the tool's Unix-style interface actually lowers barriers for non-developers who can work with simple command-line patterns.
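A minimal sketch of what that kind of composition looks like, using the piping and `-p` non-interactive mode described above; the file name, prompt, and output path are hypothetical placeholders, and a real workflow might need additional configuration.

```bash
# Pipe a CSV into a non-interactive Claude Code run and ask for an analysis.
# quarterly_spend.csv and the prompt are illustrative placeholders.
cat quarterly_spend.csv | claude -p "Summarize the top five cost drivers in this data and flag any anomalies"

# The output is plain text, so it composes with ordinary shell redirection.
cat quarterly_spend.csv | claude -p "Summarize the top five cost drivers in this data" > spend_summary.md
```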
Enterprise adoption is growing, with CTOs and VPs showing strong interest after experimenting with the tool. The team engages directly with enterprises on security and productivity monitoring concerns, though they emphasize that individual developers remain responsible for code quality regardless of how it was generated.

## Development Velocity and Self-Improvement

The Claude Code team maintains extremely high development velocity, with the product reportedly being rewritten from scratch approximately every three to four weeks. This "ship of Theseus" approach, enabled by Claude Code writing its own code, consistently simplifies the architecture rather than adding complexity.

Major features shipped since launch include web fetch capability (with careful legal review for security), autocompact for effectively infinite context, auto-accept mode for autonomous operation, Vim mode, custom slash commands, and the memory system using hashtag annotations. The changelog itself is now generated by Claude Code scanning commit history and determining what warrants inclusion.

## Future Direction

The team is exploring open-sourcing Claude Code, though they note there's limited secret sauce beyond the model itself: the tool is intentionally the "thinnest possible wrapper." They're also investigating better sandboxing approaches, ideally running in Docker containers with snapshot and rewind capabilities, though they acknowledge the friction this adds for typical usage.

The broader vision positions Claude Code as one tool in an ecosystem rather than a complete solution. It's designed to compose with MCP servers, parallel execution frameworks, and CI/CD pipelines while remaining agnostic about specific technologies or workflows. This composability-first approach reflects Anthropic's bet that agentic capabilities will improve rapidly, making minimal scaffolding the most future-proof architectural choice.
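A hedged sketch of the kind of parallel, composable batch run the case study describes, fanning out one non-interactive instance per file with ordinary Unix tooling. The file selection, prompt, and concurrency level are illustrative, and in practice each instance would also need the permission configuration discussed earlier before it could actually edit files.

```bash
# Fan out non-interactive Claude Code runs across files with lint-style issues.
# grep and xargs are standard Unix tools; the prompt and -P 8 concurrency are illustrative.
grep -rl "eslint-disable" src/ \
  | xargs -P 8 -I {} claude -p "Remove the eslint-disable comments in {} and fix the underlying violations"
```

Because Claude Code stays agnostic about the surrounding tooling, the same pattern slots into tmux sessions, CI jobs, or any other orchestration layer.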
