An international infrastructure company partnered with NTT Data to evaluate whether GenAI could improve their work order management system, which handles 500,000+ annual maintenance requests. The POC focused on automating classification, urgency assessment, and identification of special handling requirements. Using a privately hosted LLM with a company-specific knowledge base, the solution demonstrated improved accuracy and consistency in work order processing compared to the manual approach, while providing transparent reasoning for classifications.
This case study describes a GenAI proof of concept (POC) conducted by NTT Data for a leading international infrastructure company that manages housing complexes across the United States. The company handles over 500,000 maintenance requests annually, with approximately 70 employees manually processing around 1,500 work orders per day. The core challenge was that the manual approach to categorizing and managing work orders created significant opportunities for error and inconsistency in how each request was handled.
The POC was developed in under two weeks, demonstrating a rapid prototyping approach to GenAI adoption. It’s worth noting that this case study comes from NTT Data’s own blog, so the perspective is naturally favorable to their services and approach. However, the technical details and lessons learned provide valuable insights into real-world LLMOps considerations.
The infrastructure company wanted to understand if GenAI could improve both accuracy and efficiency in their work order management system. The specific tasks involved in work order processing include scheduling, dispatching, updating, and solving service-related problems. The POC was scoped to address three main business questions: whether GenAI could accurately classify incoming work orders, assess their urgency, and identify special handling requirements.
This scoping exercise is a critical LLMOps consideration—the team focused on specific, measurable outcomes rather than attempting to automate the entire work order process at once. This incremental approach is a best practice for GenAI adoption, as it allows for faster iteration and clearer success metrics.
A notable aspect of this implementation was the client’s requirement that the POC be built and hosted in a secure, third-party environment with a privately hosted large language model. NTT Data used their own infrastructure to host the LLM, which mitigated data privacy and security risks. This approach demonstrated that a fully functional AI tool could be built and eventually deployed within the client’s own firewall.
This security-first approach is increasingly important in enterprise LLMOps, particularly for companies handling sensitive resident information and maintenance data. The decision to use a privately hosted LLM rather than a public API service reflects growing enterprise concerns about data sovereignty, compliance, and the risks associated with sending proprietary information to third-party services.
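The case study doesn't detail how the privately hosted model was exposed to the application, but a common pattern is an OpenAI-compatible HTTP endpoint (e.g. served by vLLM) behind the firewall. The sketch below builds a request payload for such an endpoint; the URL, model name, and system prompt are illustrative assumptions, not details from the POC.

```python
# Hypothetical in-firewall endpoint and model name (assumptions for
# illustration; the case study does not name NTT Data's actual stack).
PRIVATE_LLM_URL = "https://llm.internal.example.com/v1/chat/completions"
MODEL_NAME = "private-llm"

def build_classification_request(work_order_text: str) -> dict:
    """Build a chat-completion payload to POST to the private endpoint."""
    return {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system",
             "content": "You classify residential maintenance work orders."},
            {"role": "user", "content": work_order_text},
        ],
        "temperature": 0,  # deterministic output favors consistent triage
    }

payload = build_classification_request("Water dripping from ceiling in unit 4B")
```

Because the server speaks the same wire protocol as public APIs, the application code stays portable while the resident data never leaves the client's network.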
The solution leveraged the client’s existing documentation, including well-documented policies, procedures, requirements, and a comprehensive list of over 160 work order categories. This information was used to prompt the LLM on the intricacies of accurately classifying incoming requests.
This represents a form of retrieval-augmented generation (RAG) or, at minimum, extensive prompt engineering using domain-specific knowledge. By grounding the LLM’s responses in the company’s actual policies and categorization schema, the team was able to achieve reasonably accurate output even for ambiguous cases that human operators might struggle with—for example, determining whether a reported leak is a plumbing or HVAC issue.
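A minimal sketch of this grounding approach: inject the category list and policy excerpts directly into the prompt so the model classifies against the client's own schema rather than its pretrained notions. The category names and policy text here are invented examples, not the actual 160-category schema.

```python
# Invented stand-ins for the client's 160+ categories and policy docs.
CATEGORIES = ["Plumbing - Leak", "HVAC - Condensation", "Electrical - Outlet"]

POLICY_EXCERPTS = {
    "Plumbing - Leak": "Water originating from pipes, fixtures, or drains.",
    "HVAC - Condensation": "Moisture from AC units, vents, or ductwork.",
}

def build_prompt(request_text: str) -> str:
    """Ground the classification task in the company's own schema."""
    category_block = "\n".join(f"- {c}" for c in CATEGORIES)
    policy_block = "\n".join(f"{k}: {v}" for k, v in POLICY_EXCERPTS.items())
    return (
        "Classify the maintenance request into exactly one category.\n"
        f"Categories:\n{category_block}\n\n"
        f"Policy guidance:\n{policy_block}\n\n"
        f"Request: {request_text}\n"
        "Answer with the category name and a one-sentence reason."
    )

prompt = build_prompt("Water dripping from the ceiling near the AC vent")
```

The policy excerpts are what let the model resolve the plumbing-vs-HVAC ambiguity mentioned above: the distinction lives in the client's documentation, not in the model's weights.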
The case study mentions that long-term, the client will use a custom application to adjust policies and add clarifications for the LLM, suggesting a continuous improvement model where the knowledge base and prompts can be refined over time. This iterative refinement is a key aspect of production LLM systems—the initial deployment is rarely perfect, and organizations need mechanisms to update and improve the system based on real-world feedback.
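The case study doesn't describe how that custom application works internally; one plausible design (an assumption, not the POC's actual mechanism) is a store of reviewer clarifications that gets appended to every subsequent prompt, so each corrected misclassification improves future behavior without retraining.

```python
# Assumed design for the "adjust policies and add clarifications" loop;
# not from the case study itself.
clarifications: list[str] = []

def add_clarification(note: str) -> None:
    """Record a reviewer's correction for injection into future prompts."""
    clarifications.append(note)

def clarification_block() -> str:
    """Render accumulated clarifications as a prompt section."""
    if not clarifications:
        return ""
    return "Clarifications from reviewers:\n" + "\n".join(
        f"- {n}" for n in clarifications
    )

add_clarification("Leaks at AC vents are HVAC - Condensation, not Plumbing.")
block = clarification_block()
```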
One of the distinguishing features of this GenAI solution was its ability to provide reasoning behind each classification decision. The LLM not only classifies each work order but explains why it was categorized in a particular way, making each decision auditable and giving operators a basis for reviewing and correcting the system.
Explainability is increasingly recognized as essential for enterprise AI adoption. Many organizations are reluctant to deploy “black box” systems, particularly for customer-facing or operationally critical processes. By having the LLM articulate its reasoning, the solution becomes more trustworthy and easier to debug when errors occur.
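One common way to make such reasoning machine-readable is to ask the model for a small JSON object carrying both the category and the explanation, then validate it on receipt. The field names and schema below are assumptions for illustration, not the POC's actual output format.

```python
import json

# Hypothetical response schema: category plus free-text reasoning.
RESPONSE_FORMAT_INSTRUCTION = (
    'Respond as JSON: {"category": "<name>", "reasoning": "<why>"}'
)

def parse_classification(raw: str) -> tuple[str, str]:
    """Extract category and reasoning; raises KeyError if either is absent."""
    data = json.loads(raw)
    return data["category"], data["reasoning"]

category, reasoning = parse_classification(
    '{"category": "Plumbing - Leak", '
    '"reasoning": "Water from a pipe fitting indicates a plumbing issue."}'
)
```

Keeping the reasoning as a first-class field means it can be logged alongside every classification, which is what makes post-hoc debugging of errors practical.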
According to NTT Data, the GenAI solution demonstrated the capability to classify work orders more quickly, accurately, and consistently than the current manual approach. The case study suggests that the LLM will continue to improve over time as policies are tweaked, potentially providing superior consistency compared to human operators, with little to no additional training required.
However, it’s important to note that this was a proof of concept, and specific quantitative metrics (accuracy rates, time savings, error reduction percentages) are not provided. The claim that the solution outperforms human operators should be viewed with appropriate skepticism until validated in a production environment with rigorous evaluation.
The client’s near-term plan is not to immediately replace human operators but rather to use the solution to understand how best to train and equip the current work-order processing team. This human-in-the-loop approach is a prudent strategy for initial GenAI deployments, as it allows the organization to validate the system’s performance against experienced operators before committing to broader automation.
Only once the LLM consistently demonstrates its usefulness will the company plan a full deployment. This staged rollout approach reduces risk and allows for course correction if issues arise.
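One concrete way to operationalize "consistently demonstrates its usefulness" (a sketch under assumed design, not described in the case study) is to log the model's suggestion next to the human operator's final choice and track the agreement rate over time as the go/no-go signal for fuller deployment.

```python
from dataclasses import dataclass

@dataclass
class Review:
    """One work order: what the model suggested vs. what the human chose."""
    model_category: str
    human_category: str

def agreement_rate(reviews: list[Review]) -> float:
    """Fraction of reviews where the model matched the human decision."""
    if not reviews:
        return 0.0
    agree = sum(r.model_category == r.human_category for r in reviews)
    return agree / len(reviews)

rate = agreement_rate([
    Review("Plumbing - Leak", "Plumbing - Leak"),
    Review("HVAC - Condensation", "Plumbing - Leak"),
])
```

A sustained high agreement rate (or a pattern in the disagreements) gives the staged rollout an objective basis rather than anecdote.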
The case study offers several lessons that are relevant to LLMOps practitioners:
Business Value First: The recommendation to start with business value before examining technology enablement is sound. Identifying and prioritizing the right use cases is critical for GenAI success, and this particular POC had clear potential to improve efficiency and resident satisfaction.
Speed with Patience: The advice to move fast but be patient acknowledges the experimental nature of early GenAI initiatives. POCs may not prove their value immediately, and organizations should be prepared to iterate or pivot based on learnings. The two-week development timeline demonstrates that rapid prototyping is possible with GenAI.
Partner Selection: The case study emphasizes the value of working with external partners who can bring both technical skills and strategic perspective. This is particularly relevant for organizations new to GenAI who may not have the internal expertise to evaluate the technology landscape or identify optimal use cases.
While the case study presents a positive narrative, several aspects warrant careful consideration: the account comes from the vendor’s own blog, no quantitative accuracy or efficiency metrics are reported, and the claimed advantage over human operators has yet to be validated in a production environment.
That said, the approach taken—scoped POC, security-first architecture, explainable outputs, staged deployment—reflects sensible LLMOps practices that other organizations can learn from. The emphasis on using GenAI to augment rather than immediately replace human workers is also a pragmatic strategy for building organizational buy-in and managing change.
This case study illustrates how a traditional, process-heavy organization can begin exploring GenAI capabilities through carefully scoped proofs of concept. The infrastructure company’s work order classification challenge represents a common enterprise use case: high-volume, semi-structured text classification with the need for accuracy, consistency, and auditability. The solution’s emphasis on explainability, private hosting, and human-in-the-loop deployment reflects mature thinking about enterprise AI adoption, even at the POC stage.