A procurement team developed an advanced LLM-powered system called "Smart Business Analyst" to automate competitor analysis in the medical device industry. The system addresses the challenge of gathering and analyzing competitor data across multiple dimensions, including features, pricing, and supplier relationships. Unlike general-purpose LLMs like ChatGPT, this solution provides precise numerical comparisons and leverages multiple data sources to deliver accurate, industry-specific insights, significantly reducing the time required for competitive analysis from hours to seconds.
This case study, presented at an NLP Summit by Devananda Kumar from Philips, describes a GenAI-powered application designed to automate competitor intelligence gathering and analysis within a procurement and supply chain context. The presentation focuses specifically on how LLMs can be deployed in production to support business analysts and procurement teams in the medical device industry, though the approach appears applicable across manufacturing sectors.
The core problem being addressed is the labor-intensive nature of competitive intelligence gathering in procurement. Traditional methods require business analysts to manually access multiple data sources including PDFs, Word documents, SharePoint repositories, internal databases, and external websites to compile competitive insights. For example, when comparing a medical device analyzer across 25 features against 8 competitors, analysts would need to manually fill in approximately 200 data fields—a time-consuming task where information gaps are common due to the difficulty of finding relevant data on the internet.
The presentation outlines three main categories of competitor landscape analysis where GenAI can provide value:
Market Analysis: Understanding market share, regional performance, and trend analysis. The system connects to internal sales databases and external sources to answer questions like “What is the market share of our analyzer device in North America in Q2 2024 compared to competitors?”
Reviews and Ratings: Analyzing end-user perspectives on product features and performance through sentiment analysis and feature extraction from customer feedback.
Price Analysis and Procurement: The primary focus of this case study—evaluating competitor procurement strategies, supplier networks, and cost structures to optimize procurement decisions.
The goal for procurement teams is to identify potential new suppliers, benchmark procurement costs, understand competitor supplier relationships, and find cost-saving opportunities without compromising quality.
The solution, dubbed the “Smart Business Analyst,” employs a multi-agent architecture to maximize information retrieval from limited and niche data sources. The system recognizes that competitive intelligence information is not abundantly available, so it must be thorough in its search approach.
The data flow consists of three primary agents:
Subquery Generation Agent: Since competitive intelligence queries often require information from diverse sources, the system splits a single business query into multiple sub-queries to maximize information retrieval from the internet. This approach acknowledges that niche competitive information requires casting a wide net across various data sources.
Search Agents: These agents scrape and store the top-ranked URLs from the web into a vector database. This represents a RAG (Retrieval-Augmented Generation) approach where external web content is indexed and made available for the LLM to reference when generating responses.
Report Generation Agent: This agent synthesizes the collected information and formats it into a display-ready state, providing prescriptive or forward-looking insights in a readable format.
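The three-agent flow above can be sketched as follows. This is a minimal illustration under assumed interfaces: the function names, the canned search results, and the report format are placeholders, not the system's actual implementation, which would back each agent with an LLM call.

```python
def subquery_agent(query: str) -> list[str]:
    """Split one business query into targeted sub-queries."""
    # A production agent would prompt an LLM; here we expand along the
    # dimensions the case study mentions (features, pricing, suppliers).
    dimensions = ["product features", "pricing", "supplier relationships"]
    return [f"{query}: {d}" for d in dimensions]

def search_agent(subquery: str) -> list[dict]:
    """Fetch top-ranked URLs and return scraped passages (stubbed here)."""
    # In production this step scrapes the web and indexes passages into
    # a vector database for retrieval-augmented generation.
    return [{"url": f"https://example.com/{abs(hash(subquery)) % 1000}",
             "text": f"Result text for: {subquery}"}]

def report_agent(query: str, passages: list[dict]) -> str:
    """Synthesize retrieved passages into a display-ready report."""
    sources = sorted({p["url"] for p in passages})
    body = "\n".join(f"- {p['text']}" for p in passages)
    return f"Report for: {query}\n{body}\nSources: {', '.join(sources)}"

def smart_business_analyst(query: str) -> str:
    # End-to-end flow: decompose, search, synthesize.
    passages = [p for sq in subquery_agent(query) for p in search_agent(sq)]
    return report_agent(query, passages)
```

The key design point is that decomposition happens before retrieval: each sub-query gets its own search pass, which is how the system casts a wide net over sparse competitive data.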
The system is designed to integrate with multiple data source types, spanning internal repositories (databases, SharePoint, PDFs, Word documents) and external web sources.
The presentation includes a data source mapping matrix showing which sources are relevant for different analysis types (design benchmarking, cost benchmarking, etc.), allowing the system to prioritize sources based on the query type.
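A source-mapping matrix like the one described could be represented as a simple lookup. The specific entries below are illustrative assumptions (the presentation names the analysis types but not the full matrix contents); only “procurement data aggregators” and “third-party data providers” are mentioned in the talk.

```python
# Illustrative mapping of analysis type -> prioritized data sources.
# Entries are assumed examples, not the presentation's actual matrix.
SOURCE_MATRIX = {
    "design_benchmarking": ["product spec sheets", "company annual reports"],
    "cost_benchmarking": ["procurement data aggregators", "third-party data providers"],
    "market_analysis": ["internal sales database", "industry news"],
}

def sources_for(analysis_type: str) -> list[str]:
    # Fall back to general web search when the analysis type is unmapped.
    return SOURCE_MATRIX.get(analysis_type, ["web search"])
```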
A key architectural decision highlighted is the integration of the Google Search API wrapper to address LLM knowledge cutoff limitations. Since LLM APIs are trained on historical data, the system uses the Google Search API to fetch the most current information from the internet, ensuring responses include recent developments, news, and market updates that occurred after the base model’s training cutoff.
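The search-augmentation pattern can be sketched as below. The `web_search` stub is a stand-in: in production it would wrap the actual search API (the talk mentions a Google Search API wrapper), and the prompt template is an assumption rather than the system's real prompt.

```python
def web_search(query: str, num_results: int = 3) -> list[dict]:
    # Placeholder returning canned results; a real implementation would
    # call a search API and return {"title", "link", "snippet"} records.
    return [{"title": f"Result {i}",
             "link": f"https://example.com/{i}",
             "snippet": f"Recent news about {query}"}
            for i in range(num_results)]

def build_prompt(question: str) -> str:
    """Prepend fresh search results so answers reflect post-cutoff events."""
    results = web_search(question)
    context = "\n".join(f"[{r['link']}] {r['snippet']}" for r in results)
    return ("Answer using the current web results below; cite the URLs you use.\n"
            f"Web results:\n{context}\n\n"
            f"Question: {question}")
```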
The presenter explicitly addresses how this production system differs from using ChatGPT directly:
Precision in Numerical Data: When evaluating medical device features, generic ChatGPT tends to provide qualitative comparisons (e.g., “your analyzer device is better than your competitor”). The Smart Business Analyst is trained to provide exact numerical values (e.g., “rotation degrees: 35° for your company vs. 33° for competitor”).
Domain-Specific Training: The system appears to incorporate domain-specific knowledge about medical devices, procurement terminology, and competitive intelligence frameworks that generic models may lack.
Structured Output: The system can populate structured comparison matrices automatically, which is more suitable for business workflows than free-form text responses.
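One common way to get numerically precise, matrix-ready answers is to constrain the model to a JSON schema and validate the parse. The schema and sample response below are illustrative assumptions, though the sample mirrors the rotation-degrees comparison from the talk.

```python
import json

# Assumed schema hint appended to the model prompt (illustrative only).
SCHEMA_HINT = ('Return JSON only: {"feature": str, "unit": str, '
               '"our_value": number, "competitor_value": number}')

def parse_comparison(model_response: str) -> dict:
    """Parse and validate one row of a feature-comparison matrix."""
    row = json.loads(model_response)
    for key in ("feature", "unit", "our_value", "competitor_value"):
        if key not in row:
            raise ValueError(f"missing field: {key}")
    return row

# Example response mirroring the talk's rotation-degrees comparison.
sample = ('{"feature": "rotation", "unit": "degrees", '
          '"our_value": 35, "competitor_value": 33}')
row = parse_comparison(sample)
```

Validated rows can then be written directly into the comparison matrix the analysts previously filled by hand.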
Several production-ready features are mentioned:
Visualization Charts: The system can generate visual representations of data, not just textual responses.
Conversational Memory: The application maintains context across a conversation session, allowing users to ask follow-up questions without restating the original context. This mirrors the ChatGPT experience but within a controlled, enterprise environment.
Multilingual Support: End users can pose questions in any language, expanding accessibility for global teams.
Pluggable Data Sources: The architecture allows additional data sources to be connected to the system as needed, providing flexibility for expanding coverage.
Source Attribution: The interface displays the multiple websites from which information was extracted, providing transparency and allowing users to verify sources.
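Conversational memory of the kind described is often implemented by replaying prior turns into each new prompt. A minimal sketch, with a placeholder in place of the real model call:

```python
class Session:
    """Session-scoped memory: follow-ups see earlier turns in context."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []

    def ask(self, question: str) -> str:
        # Replay prior turns, then append the new question.
        prompt = "\n".join(f"User: {q}\nAssistant: {a}"
                           for q, a in self.history)
        prompt += f"\nUser: {question}\nAssistant:"
        answer = fake_llm(prompt)  # stand-in for the real model call
        self.history.append((question, answer))
        return answer

def fake_llm(prompt: str) -> str:
    # Placeholder: reports how many prior turns it can see in the prompt.
    turns = prompt.count("User:") - 1
    return f"(answer with {turns} prior turns in context)"
```

With this structure, a follow-up like “And in Europe?” is answered with the original question still in context, without the user restating it.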
The presentation distinguishes between two types of analysis the system supports:
Quantitative Analysis relies on procurement data aggregators and third-party data providers for KPI benchmarking, such as component costs and supplier pricing.
The presenter acknowledges data limitations in quantitative analysis—competitor component costs, special discounts, and supplier quality metrics may not be readily available. In such cases, industry standards are used to derive conclusions.
Qualitative Analysis is the primary focus of the GenAI application, gathering insights from secondary research sources like news, blogs, company annual reports, and specialized publications. This is where the LLM excels at synthesizing unstructured text into actionable intelligence.
A practical example is provided: a procurement manager queries “What is the cost reduction strategy that my competitor is obtaining?” The system decomposes the question into sub-queries, retrieves relevant web sources, and synthesizes a sourced report on the competitor’s cost-reduction approach.
While the presentation demonstrates a thoughtful application of LLMs to a real business problem, several aspects warrant consideration:
Accuracy Verification: The claim that the system provides “exact” numerical values for product features needs validation. Web-scraped data can be outdated or inaccurate, and the presentation doesn’t detail how accuracy is verified.
Hallucination Mitigation: No specific mention is made of hallucination detection or mitigation strategies, which is particularly important when providing numerical data that could influence procurement decisions.
Scalability and Maintenance: The multi-agent architecture with web scraping components may face challenges with website changes, rate limiting, and data quality over time.
Competitive Data Ethics: Gathering competitor intelligence through web scraping raises questions about data usage rights and competitive ethics that aren’t addressed.
Quantified Results: While the presentation mentions time savings (“filled up within a few seconds”), no specific metrics on accuracy improvements, user adoption, or business impact are provided.
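One lightweight mitigation the hallucination concern points at is checking that every numeric claim in a generated answer also appears in the retrieved source passages, and flagging any that do not. This is a sketch of that idea, not the system's actual verification step (the presentation describes none).

```python
import re

_NUM = r"\d+(?:\.\d+)?"  # integers and decimals

def ungrounded_numbers(answer: str, sources: list[str]) -> list[str]:
    """Return numbers in `answer` that appear in no retrieved source."""
    source_nums = set(re.findall(_NUM, " ".join(sources)))
    return [n for n in re.findall(_NUM, answer) if n not in source_nums]
```

An answer with an empty flag list is not guaranteed correct, but a non-empty one is a cheap signal to withhold the figure or route it for human review before it influences a procurement decision.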
The presentation appears to be more of a concept demonstration than a fully-deployed production case study, though the architecture described is sound and represents a practical approach to applying LLMs in an enterprise procurement context. The multi-agent design, source attribution, and real-time search integration represent solid LLMOps practices for production deployments.