Company
PulseMCP
Title
Building and Pricing a Commercial MCP Server for Documentation Search
Industry
Tech
Year
2025
Summary (short)
Ref, featured on PulseMCP, represents one of the first standalone paid Model Context Protocol (MCP) servers designed specifically for AI coding agents to search documentation with high precision. The company faced the unique challenge of pricing a product category that didn't previously exist in a market dominated by free alternatives. They developed a credit-based pricing model charging $0.009 per search with 200 free non-expiring credits and a $9/month subscription for 1,000 credits. The solution balances individual developers making occasional queries against autonomous agents making thousands of searches, covers both variable search costs and fixed indexing infrastructure costs, and has achieved thousands of weekly users with hundreds of paying subscribers within three months of launch.
## Overview

Ref is a premium documentation search service built as a paid Model Context Protocol (MCP) server, one of the first commercial standalone products in the emerging MCP ecosystem. Featured in this PulseMCP guest post by founder Matt Dailey, the case study offers insight into the operational and business challenges of deploying LLM-powered products in production. The company serves AI coding agents and developers who need precise, up-to-date documentation search, addressing the core problem that LLMs hallucinate outdated information. What makes the case study particularly relevant to LLMOps is that it covers the full production lifecycle: infrastructure costs, API design, usage scaling, cost modeling, and sustainable business operations for an LLM-powered service.

## Product Architecture and Technical Infrastructure

Ref operates as an MCP server with two primary API tools, `ref_search_documentation` and `ref_read_url`, each consuming one credit per call. The technical architecture involves three main infrastructure layers that drive operational costs and shape the LLMOps considerations.

The search infrastructure layer uses Turbopuffer to host search indices, incurring charges for both data storage and query volume. This is the foundation of the RAG (Retrieval-Augmented Generation) pipeline that powers Ref's core functionality. Choosing Turbopuffer as the vector database indicates a managed-service approach rather than self-hosting, with implications for both cost predictability and operational overhead.

The AI processing layer adds LLM costs within the retrieval pipeline on top of the base Turbopuffer query costs. While the case study doesn't specify which LLM providers or models are used, this layer handles the intelligent aspects of search: understanding queries, ranking results, and likely performing semantic matching through embeddings. These are variable costs that scale with query volume and introduce the typical LLMOps challenges of latency management, model selection, and token usage optimization.

The web crawling layer is the largest expense, consuming compute resources to keep documentation current. This continuous indexing process creates fixed costs that must be covered regardless of search volume. The crawler must handle varied documentation sources, including public repositories and PDFs, keep indices up to date, and manage the complexity of different documentation formats and structures. This ongoing maintenance requirement creates a fundamental tension in the pricing model: the most expensive operation (crawling and indexing) provides no immediately billable value, while users only perceive value when they execute searches.

## Core Differentiation and Market Positioning

Ref positions itself in a competitive landscape that includes both free and paid alternatives. The paid search APIs Tavily and Exa both charge approximately $0.01 per search with volume discounts and offer their own MCP servers, providing clear pricing benchmarks. On the free end, Context7 provides a free MCP server with basic RAG capabilities for documentation, primarily drawn from public GitHub repositories.

Ref's differentiation centers on two capabilities. The first is search precision optimized specifically for coding agents, a technical investment in retrieval quality that goes beyond basic RAG implementations.
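To make the product surface described above concrete, the sketch below shows how a two-tool documentation server like this could be wired up with the official MCP Python SDK. Only the tool names (`ref_search_documentation`, `ref_read_url`) and the one-credit-per-call behavior come from the case study; the `FastMCP` wiring, the `charge_one_credit` helper, and the stubbed search backend are illustrative assumptions, not Ref's implementation.

```python
"""Hypothetical sketch of a two-tool documentation MCP server.

The tool names and one-credit-per-call behavior come from the case study;
the search backend and credit metering are placeholders, not Ref's code.
"""
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-search")


def charge_one_credit() -> None:
    """Placeholder for metering: decrement one credit in a billing store."""
    pass


@mcp.tool()
def ref_search_documentation(query: str) -> str:
    """Search indexed documentation and return the most relevant passages."""
    charge_one_credit()  # each tool call consumes one credit
    # A production pipeline would query a hosted vector index and could
    # rerank candidates before returning them; stubbed out here.
    return f"Top documentation passages for: {query}"


@mcp.tool()
def ref_read_url(url: str) -> str:
    """Fetch a documentation page so the agent can read it in full."""
    charge_one_credit()  # each tool call consumes one credit
    response = httpx.get(url, timeout=10.0, follow_redirects=True)
    response.raise_for_status()
    # A real server would likely clean the HTML into markdown before returning.
    return response.text


if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio by default
```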
The second differentiator is the ability to build private indexes alongside curated public documentation, addressing enterprise and team use cases where proprietary internal documentation needs to be searchable. This hybrid public-private indexing capability implies multi-tenant isolation in the underlying infrastructure, an important consideration for production LLM systems serving multiple customers.

## User Segmentation and Usage Patterns

The alpha testing phase revealed three distinct user segments with dramatically different usage profiles, each presenting its own LLMOps challenges. Individual engineers using AI coding agents for documentation search are the most common case, typically generating tens of queries per day. Small teams sharing a single service account through API key distribution create medium-volume usage. Agent developers integrating Ref into tools used by many end users are the highest-volume scenario, potentially generating thousands of requests from a single account.

This wide variance, from tens to thousands of daily queries, creates significant challenges for infrastructure planning, cost prediction, and pricing fairness. In production LLM systems, such variance requires careful attention to rate limiting, resource allocation, quota management, and infrastructure scaling. Because all access occurs through API keys rather than traditional user accounts, authentication is simple, but robust monitoring and attribution are needed to track usage accurately.

## Cost Structure and Economic Model

Understanding the cost drivers was essential to building a sustainable pricing model. Ref faces both variable costs that scale with usage (Turbopuffer queries, LLM inference, spiky search volumes) and fixed costs that accrue regardless of usage (ongoing index maintenance, web crawling infrastructure). This dual cost structure is common in production LLM applications but creates specific pricing challenges.

The decision to charge only for searches rather than indexing reflects a value-based pricing philosophy. While it would be simpler from a cost-recovery perspective to charge for each repository or PDF indexed, the company recognized that users only perceive value when they receive search results. The indexing operation, though expensive and essential, is invisible infrastructure that enables the value-delivering search operation. This creates a key operational challenge: the pricing model must cover substantial fixed indexing costs through variable search revenue.

From an LLMOps perspective, this tension between fixed and variable costs is instructive. Many production LLM systems face similar dynamics, where model training, fine-tuning, or data preparation represents a substantial upfront or ongoing investment that must be amortized across inference requests. The solution, a minimum monthly commitment combined with usage-based charges, provides a template for addressing this common pattern.

## Pricing Strategy and Free Tier Design

The credit-based pricing system emerged from several constraints: supporting a wide range of usage, charging only for searches, covering indexing costs, offering full-feature trials, and limiting trials by usage rather than time. New users receive 200 free credits that never expire, estimated at roughly 10 weeks of casual usage for a typical developer.
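To make the fixed-versus-variable cost tension concrete, the following back-of-the-envelope model is a hypothetical illustration. Only the $9 minimum for 1,000 credits (i.e. $0.009 per search) is taken from the case study; the per-search variable cost and the monthly indexing bill are invented placeholders, since the article does not disclose Ref's actual costs.

```python
import math

# Hypothetical unit-economics sketch. The $9-per-1,000-credit pricing is from
# the case study; the cost figures below are invented placeholders.

PRICE_PER_BLOCK = 9.00             # $9 buys 1,000 credits (one credit per search)
CREDITS_PER_BLOCK = 1_000
VARIABLE_COST_PER_SEARCH = 0.003   # assumed: vector queries + LLM inference
FIXED_INDEXING_COST = 4_000.00     # assumed: monthly crawling + index upkeep


def revenue_for(searches: int) -> float:
    """A subscriber pays the $9 minimum, then $9 per additional 1,000-credit block."""
    blocks = max(1, math.ceil(searches / CREDITS_PER_BLOCK))
    return blocks * PRICE_PER_BLOCK


def monthly_margin(usage_by_subscriber: list[int]) -> float:
    """Revenue minus variable search costs minus the shared fixed indexing bill."""
    revenue = sum(revenue_for(s) for s in usage_by_subscriber)
    variable = sum(usage_by_subscriber) * VARIABLE_COST_PER_SEARCH
    return revenue - variable - FIXED_INDEXING_COST


if __name__ == "__main__":
    light_users = [300] * 500                    # casual developers under the 1,000-credit floor
    agent_heavy = [300] * 450 + [20_000] * 50    # a few accounts driving agent traffic
    print(f"all light usage:   ${monthly_margin(light_users):,.2f}")
    print(f"mixed agent usage: ${monthly_margin(agent_heavy):,.2f}")
```

Under these assumed numbers, a purely low-volume base barely covers the fixed indexing bill through minimum commitments, while high-volume agent traffic is where the per-search margin accumulates, which mirrors the tension the article describes.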
The 200-credit free tier serves as a true evaluation period in which users can test every feature, including private indexing, before committing to payment. The paid tier requires a $9 monthly minimum for 1,000 credits, with additional credit blocks available at the same $9-per-1,000 rate. This works out to $0.009 per search, deliberately aligned with competitors Tavily and Exa, who charge approximately $0.01 per search. The monthly minimum serves the critical function of covering fixed indexing costs even for low-volume users, while the ability to purchase additional credit blocks lets the pricing scale to high-volume agent deployments.

The decision to use usage-based limits rather than time-based trials reflects a clear understanding of the product's usage patterns. A traditional two-week trial would serve both individual developers (who might not search frequently enough to evaluate the product) and high-volume agent developers (who could generate thousands of expensive queries) poorly. A usage-based approach is particularly appropriate for LLM-powered APIs, where value delivery is tied directly to API calls rather than elapsed time.

## Production Deployment Results and Market Validation

Since launching with this pricing model three months before the case study's publication, Ref has reached thousands of weekly users and hundreds of paying subscribers, with continued daily growth. While these absolute numbers are modest compared to major SaaS products, they represent meaningful validation for a first mover in a new product category, and the conversion from free to paid users suggests the pricing structure balances accessibility with sustainability.

From an LLMOps perspective, scaling to thousands of weekly users with hundreds of concurrent subscribers demonstrates that the underlying infrastructure can handle production load. The web crawling, indexing, vector search, and LLM inference pipeline must all operate reliably at scale while maintaining the cost structure the pricing model depends on. Supporting both individual developers making occasional queries and autonomous agents making thousands of calls indicates flexible infrastructure that can absorb highly variable load.

## Challenges and Tradeoffs in the MCP Ecosystem

The case study acknowledges several challenges unique to building paid products within the MCP ecosystem. The free-ecosystem challenge arises because MCP is new and nearly everything else is free, requiring clear value communication to justify paid tiers. The dual-identity problem comes from serving both individual humans making occasional requests and autonomous agents making thousands of calls, which effectively requires operating as an API business and a traditional SaaS product at the same time.

The timing also positions Ref as an early pioneer in MCP monetization, potentially influencing market norms for paid services built on the protocol. This first-mover position carries risks as well: pricing decisions made now could be difficult to adjust later if they harden into expectations, and the market may evolve in unexpected directions as the protocol matures.
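One way to picture the operational side of the dual-identity problem described above is a per-API-key credit ledger that treats a casual developer and a high-volume agent identically: non-expiring trial credits are spent first, then paid monthly blocks, and calls are rejected once both run out. The sketch below reflects only the pricing rules stated in the article (200 non-expiring free credits, $9 blocks of 1,000 credits, one credit per tool call); the code itself is an assumption, not Ref's billing implementation.

```python
from dataclasses import dataclass

# Hypothetical per-API-key credit ledger based on the pricing rules in the
# case study. Illustration only, not Ref's billing code.

@dataclass
class CreditLedger:
    free_credits: int = 200      # trial credits that never expire
    paid_credits: int = 0        # topped up in 1,000-credit blocks

    def add_blocks(self, blocks: int) -> None:
        """Purchase additional $9 blocks of 1,000 credits."""
        self.paid_credits += blocks * 1_000

    def charge(self) -> bool:
        """Consume one credit per tool call; free credits are spent first."""
        if self.free_credits > 0:
            self.free_credits -= 1
            return True
        if self.paid_credits > 0:
            self.paid_credits -= 1
            return True
        return False  # out of credits: the tool call should be rejected


if __name__ == "__main__":
    # The same ledger logic serves a casual developer and a high-volume agent;
    # only the call rate differs.
    ledger = CreditLedger()
    ledger.add_blocks(1)  # the $9 monthly minimum
    served = sum(ledger.charge() for _ in range(1_500))
    print(f"calls served: {served}, paid credits left: {ledger.paid_credits}")
```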
## Critical Assessment and LLMOps Insights

While this case study provides valuable insight into pricing and business-model considerations for production LLM products, it is a company's self-reported narrative about its own product. The claimed results (thousands of weekly users, hundreds of subscribers) are not independently verified, and the article's purpose is promotional. Technical detail on the infrastructure, cost structure, and retrieval quality is limited, making it difficult to assess the true sophistication of the underlying LLM system.

From an LLMOps perspective, several important production considerations receive little attention. There is no discussion of latency requirements, quality metrics for search results, monitoring and observability, error handling, or incident response. The case study does not address how search quality is evaluated, whether A/B testing is used to improve retrieval, or how the LLM components are versioned and updated in production. Cost-optimization strategies beyond the high-level infrastructure choices, such as caching, query optimization, or techniques for reducing LLM inference costs, are not detailed.

The privacy and security implications of handling both public and private documentation indices also deserve more attention than they receive. How is customer data isolated? What security measures protect privately indexed content? How are access controls enforced? These are critical questions for any production LLM system handling proprietary information, but they are not addressed in this business-focused narrative.

Despite these limitations, the case study offers valuable lessons for LLMOps practitioners. The explicit acknowledgment of the tension between fixed and variable costs is particularly instructive, since many production LLM systems face similar economics. The decision framework that led to value-based rather than cost-based pricing demonstrates strategic thinking about how users perceive and receive value from LLM products, and the usage-based free tier provides a template for balancing user acquisition with cost control.

The broader lesson for LLMOps is that sustainable production deployment requires thinking beyond technical implementation to encompass business model, pricing, user segmentation, and cost structure. An LLM product's success in production depends not just on retrieval quality or inference latency, but on whether the economic model can sustain ongoing operational costs while remaining attractive to users. Ref's approach, aligning pricing with competitive benchmarks, covering fixed costs through minimum commitments, and scaling with usage, represents one viable pattern for addressing these challenges, though it remains to be seen whether the model will prove sustainable as the market matures and competition intensifies.
