Cato Networks implemented a natural language search interface for their SASE management console's events page using Amazon Bedrock's foundation models. They transformed free-text queries into structured GraphQL queries by employing prompt engineering and JSON schema validation, reducing query time from minutes to near-instant while making the system more accessible to new users and non-English speakers. The solution achieved high accuracy with an error rate below 0.05 while maintaining reasonable costs and latency.
Cato Networks is a leading provider of secure access service edge (SASE) solutions, offering enterprise networking and security as a unified cloud-centered service. Their platform includes SD-WAN, cloud networking, and various security service edge (SSE) functions such as firewall as a service, secure web gateway, and zero trust network access. The case study describes how they enhanced their management console’s Events page by implementing a natural language search capability powered by large language models.
The Events page in Cato’s SASE management console serves as a central hub for viewing security, connectivity, system, and management events occurring on customer accounts. With potentially millions of events over a selected time range, users needed to manually configure filters to narrow down results. This process required deep familiarity with the product glossary and was time-consuming, creating a barrier for new users who lacked product-specific knowledge.
The existing filter system used a conjunction of statements consisting of a key (field name), operator (such as “is”, “in”, “includes”, “greater than”), and value(s). Users also had to specify time ranges following the ISO 8601 standard. This structured query format, while powerful, was difficult for users unfamiliar with the specific terminology and syntax required by the product.
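To make the filter format concrete, here is a hypothetical example of such a structured query expressed as a Python dictionary. The field names, operators, and overall shape are illustrative assumptions, not Cato's actual glossary or API contract:

```python
# Hypothetical structured filter query in the style described above:
# a conjunction of (field, operator, values) statements plus an
# ISO 8601 time range. All names here are invented for illustration.
filter_query = {
    "timeFrame": {
        "start": "2024-06-01T00:00:00Z",  # ISO 8601 timestamps
        "end": "2024-06-07T23:59:59Z",
    },
    "filters": [
        {"fieldName": "event_type", "operator": "is", "values": ["Security"]},
        {"fieldName": "action", "operator": "in", "values": ["Block", "Alert"]},
    ],
}
```

A user unfamiliar with the product glossary would have to know that the field is called `event_type` (not, say, `category`) and that `in` expects a list, which is exactly the barrier the natural language interface removes.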
The goal was to enable users to perform free text searches that would be automatically converted into structured queries compatible with the product’s GraphQL API. This is a classic natural language processing (NLP) task that benefits significantly from modern foundation models.
Cato Networks built their solution using Amazon Bedrock, which provides serverless access to foundation models from various providers. The architecture consists of four main components.
The decision to use Amazon Bedrock was driven by several factors. The service simplified access to multiple state-of-the-art foundation models through a single serverless API, making it straightforward to benchmark and switch between different models without infrastructure management overhead. Additionally, some models available through Bedrock, particularly from Anthropic (Claude) and Cohere, offer native understanding of JSON schemas and structured data, which was particularly valuable for this use case.
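As a sketch of what calling a Claude model through Amazon Bedrock looks like, the snippet below builds a request body in the Bedrock Messages API format for Anthropic models. The prompt contents are placeholders; only the request shape and the `invoke_model` call pattern (shown commented out, since it requires AWS credentials) follow the real API:

```python
import json

# Build a request body for an Anthropic Claude model on Amazon Bedrock
# (Messages API). The system prompt and max_tokens value are placeholders.
def build_bedrock_request(system_prompt: str, user_query: str) -> str:
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_query}],
    }
    return json.dumps(body)

# In production this body would be sent with boto3, e.g.:
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(
#     modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
#     body=build_bedrock_request(schema_prompt, "show blocked events from last week"),
# )
```

Because every provider is reached through the same `invoke_model` surface, swapping the `modelId` string is essentially all it takes to benchmark a different model, which is the low-friction model switching the team valued.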
The team evaluated three approaches to customize foundation models for their specific task: prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning. They found that prompt engineering alone was sufficient to achieve their required results.
The prompt design follows a sophisticated structure centered on JSON schema validation. Rather than asking the model to generate GraphQL API requests directly, they implemented a two-step approach: first, instruct the model to return a response following a well-defined JSON schema, then validate the output against that schema before translating it to GraphQL.
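The second step of that approach, translating a validated JSON response into a GraphQL query, might look like the sketch below. The GraphQL operation name, field names, and JSON shape are all assumptions for illustration, not Cato's actual API:

```python
import json

# Sketch of translating a schema-validated JSON response (assumed shape:
# "timeFrame" plus a list of "filters") into a GraphQL query string.
# The "events" operation and its arguments are invented for illustration.
def to_graphql(parsed: dict) -> str:
    clauses = ", ".join(
        f'{{fieldName: "{f["fieldName"]}", operator: "{f["operator"]}", '
        f'values: {json.dumps(f["values"])}}}'
        for f in parsed["filters"]
    )
    tf = parsed["timeFrame"]
    return (
        f'query {{ events(timeFrame: {{start: "{tf["start"]}", end: "{tf["end"]}"}}, '
        f"filters: [{clauses}]) {{ records {{ fields }} }} }}"
    )
```

Keeping the model's output in an intermediate JSON representation, rather than asking it to emit GraphQL directly, means the deterministic translation layer owns the API syntax and the model only has to get the semantics right.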
The system prompt includes several critical components, most notably the JSON schema that defines the expected output format.
Each user query is structured to include the free text query itself plus contextual information such as the current datetime and day of week (necessary for handling relative time references like “last week” or “yesterday”).
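Attaching that temporal context might look like the following sketch; the exact message format is an assumption, but the idea is that the model cannot resolve "yesterday" without being told what today is:

```python
from datetime import datetime, timezone
from typing import Optional

# Prepend the current datetime and day of week to the user's free-text
# query so relative time references can be resolved. The message layout
# here is illustrative, not the actual prompt format.
def build_user_message(free_text: str, now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    return (
        f"Current datetime: {now.isoformat()}\n"
        f"Day of week: {now.strftime('%A')}\n"
        f"Query: {free_text}"
    )
```

Passing `now` explicitly (rather than reading the clock inside the function) also makes the prompt construction deterministic and easy to test, which matters once these prompts feed an evaluation benchmark.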
A key aspect of their LLMOps approach is the recognition that model behavior is inherently non-deterministic. Rather than trusting model outputs blindly, they implemented a robust validation pipeline using the same JSON schema included in the prompt.
The JSON schema serves a dual purpose: it instructs the model on the expected output format and provides a mechanism for validating responses. When validation fails, the schema can identify the exact violation, enabling nuanced error handling policies.
This approach acknowledges the reality of working with LLMs in production—outputs may not always be perfect, but with proper validation, the system can degrade gracefully rather than failing completely.
The team created a comprehensive benchmark consisting of hundreds of text queries paired with their expected JSON outputs. This allowed systematic evaluation of model performance across three outcome categories.
The release criterion was an error rate below 0.05; among the models meeting it, the team selected the one with the highest success rate. After evaluating several foundation models on Amazon Bedrock for accuracy, latency, and cost, they chose anthropic.claude-3-5-sonnet-20241022-v2:0 as their production model.
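A benchmark loop of this kind reduces to a small evaluation harness. The sketch below assumes the outcome buckets and the exact-match comparison; the source confirms the paired (query, expected JSON) structure and the 0.05 error-rate gate, but the category names are assumptions:

```python
# Evaluate a translate(query) -> parsed-JSON-or-None function against a
# benchmark of (query, expected) pairs. Bucket names are illustrative.
def evaluate(benchmark, translate):
    counts = {"success": 0, "error": 0, "no_result": 0}
    for query, expected in benchmark:
        actual = translate(query)
        if actual is None:
            counts["no_result"] += 1
        elif actual == expected:
            counts["success"] += 1
        else:
            counts["error"] += 1
    total = len(benchmark)
    return {k: v / total for k, v in counts.items()}

# Release gate from the case study: error rate must stay below 0.05.
def meets_release_criterion(rates) -> bool:
    return rates["error"] < 0.05
```

Because the harness only needs a `translate` callable, rerunning it against a newly released model is a one-line change, which is what makes the evaluation pipeline reproducible as new models appear on Bedrock.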
This benchmarking approach represents a mature LLMOps practice—establishing clear, measurable criteria for model selection and creating reproducible evaluation pipelines that can be rerun as new models become available.
The NLS service is deployed on Amazon EKS, providing container orchestration for the translation service. This architecture separates concerns effectively: the translation service handles prompt construction and response validation, while Amazon Bedrock handles model inference as a fully managed service.
The serverless nature of Amazon Bedrock was particularly valuable for this use case, as it eliminated the need to manage GPU infrastructure or handle scaling concerns. The team could focus on the application logic rather than infrastructure management.
The case study reports several positive outcomes, though these should be evaluated with the understanding that this is a vendor-published blog post.
The team acknowledges that while prompt engineering met their current needs, there are scenarios where alternative approaches might be warranted. Specifically, they note that including the entire JSON schema in the prompt can lead to high token counts per query. For users handling very complex schemas, fine-tuning a model to embed product knowledge into the weights (rather than including it in every prompt) could reduce costs.
This represents an important consideration for LLMOps practitioners: the trade-off between prompt complexity (and associated token costs) versus the investment required for fine-tuning. The Cato Networks team made a pragmatic choice to start with prompt engineering, which allowed them to iterate quickly and get to production faster, while leaving the door open for optimization as usage patterns become clearer.
This case study illustrates several LLMOps best practices: systematic benchmarking against a clear, measurable release criterion; schema-based validation of non-deterministic model outputs with graceful degradation; and a pragmatic preference for prompt engineering over fine-tuning until usage patterns justify the extra investment.
Toyota Motor North America (TMNA) and Toyota Connected built a generative AI platform to help dealership sales staff and customers access accurate vehicle information in real-time. The problem was that customers often arrived at dealerships highly informed from internet research, while sales staff lacked quick access to detailed vehicle specifications, trim options, and pricing. The solution evolved from a custom RAG-based system (v1) using Amazon Bedrock, SageMaker, and OpenSearch to retrieve information from official Toyota data sources, to a planned agentic platform (v2) using Amazon Bedrock AgentCore with Strands agents and MCP servers. The v1 system achieved over 7,000 interactions per month across Toyota's dealer network, with citation-backed responses and legal compliance built in, while v2 aims to enable more dynamic actions like checking local vehicle availability.
42Q, a cloud-based Manufacturing Execution System (MES) provider, implemented an intelligent chatbot named Arthur to address the complexity of their system and improve user experience. The solution uses RAG and Amazon Bedrock to combine documentation, training videos, and live production data, enabling users to query system functionality and real-time manufacturing data in natural language. The implementation showed significant improvements in user response times and system understanding, while maintaining data security within AWS infrastructure.
This panel discussion features three AI-native companies—Delphi (personal AI profiles), Seam AI (sales/marketing automation agents), and APIsec (API security testing)—discussing their journeys building production LLM systems over three years. The companies address infrastructure evolution from single-shot prompting to fully agentic systems, the shift toward serverless and scalable architectures, managing costs at scale (including burning through a trillion OpenAI tokens), balancing deterministic workflows with model autonomy, and measuring ROI through outcome-based metrics rather than traditional productivity gains. Key technical themes include moving away from opinionated architectures to let models reason autonomously, implementing state machines for high-confidence decisions, using tools like Pydantic AI and Logfire for instrumentation, and leveraging Pinecone for vector search at scale.