# Healthcare Search Discovery Using ML and Generative AI on an E-commerce Platform

- **Company:** Amazon Health Services
- **Industry:** Healthcare
- **Year:** 2025

## Summary
Amazon Health Services faced the challenge of integrating healthcare services into Amazon's e-commerce search experience, where traditional product search algorithms weren't designed to handle the complex relationships between symptoms, conditions, treatments, and healthcare services. The team developed a solution combining machine learning for query understanding, vector search for product matching, and large language models for relevance optimization. Built on AWS services including Amazon SageMaker for ML models, Amazon Bedrock for LLM capabilities, and Amazon EMR for data processing, the system implements a three-component architecture: a query understanding pipeline that classifies health searches, an LLM-enhanced product knowledge base for semantic search, and hybrid relevance optimization using both human labeling and LLM-based classification. It now serves daily health-related search queries, helping customers find everything from prescription medications to primary care services through improved discovery pathways.
## Case Study Overview

Amazon Health Services (AHS) represents Amazon's expansion beyond traditional e-commerce into comprehensive healthcare services, including Amazon Pharmacy for prescription medications, One Medical for primary care, and Health Benefits Connector for specialized care partnerships. This case study examines how AHS applied machine learning and generative AI to the distinct challenge of healthcare discovery within Amazon's existing e-commerce search infrastructure.

The fundamental problem was that traditional product search algorithms, optimized for physical goods like books and electronics, were inadequate for healthcare queries that involve complex relationships between symptoms, conditions, treatments, and services. Healthcare searches require sophisticated understanding of medical terminology and the ability to map between professional medical terms and the layperson language customers typically use.

## Technical Architecture and LLMOps Implementation

The solution is built entirely on AWS services and demonstrates production-scale deployment of large language models for healthcare search optimization. It consists of three main components that work together to deliver improved healthcare discovery.

### Query Understanding Pipeline

The query understanding component addresses the full spectrum of healthcare search queries, from specific "spearfishing queries" where customers search for exact medications like "atorvastatin 40 mg" to broad upper-funnel searches like "back pain relief" or "acne." This range required specialized ML models for different query types.

For specific medication searches, AHS analyzed anonymized customer search engagement data to train a classification model that identifies queries leading exclusively to engagement with Amazon Pharmacy products. This process used PySpark on Amazon EMR and Amazon Athena for large-scale data processing, underscoring the importance of robust data infrastructure in LLMOps deployments.

For broad health search intent, the team implemented a named entity recognition (NER) model trained on health ontology data sources to identify medical concepts including conditions, diseases, treatments, injuries, and medications. A notable LLMOps aspect is how they augmented their knowledge base with LLMs when they lacked sufficient alternate terms for a health concept: for example, they used LLMs to generate alternate terms for "acid reflux" such as "heartburn," "GERD," and "indigestion," effectively using generative AI to bootstrap their training data.
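The case study doesn't publish the prompts involved, but the term-expansion step can be sketched as follows. This is a minimal illustration assuming the Amazon Bedrock Converse API and an illustrative Claude model ID; the prompt wording and JSON parsing are assumptions, not AHS's production code.

```python
import json

import boto3

# Hypothetical sketch of LLM-based term expansion for a health ontology.
# The model ID, prompt wording, and response parsing are illustrative
# assumptions, not AHS's production configuration.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_alternate_terms(concept: str) -> list[str]:
    """Ask an LLM for layperson synonyms of a medical concept."""
    prompt = (
        f"List common alternate and layperson terms for the health concept "
        f"'{concept}'. Respond with a JSON array of strings only."
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    text = response["output"]["message"]["content"][0]["text"]
    return json.loads(text)

# e.g. generate_alternate_terms("acid reflux")
# -> ["heartburn", "GERD", "indigestion", ...]
```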
### Product Knowledge Base Enhancement

The product knowledge base component applies LLMs to content augmentation in production. Starting from existing product metadata and catalog information, AHS used large language models through Amazon Bedrock's batch inference capability to layer additional relevant health conditions, symptoms, and treatment-related keywords onto each product or service. This demonstrates a key LLMOps pattern in which LLMs are used not for direct customer interaction but for data enrichment at scale: batch inference let the team process the entire product catalog efficiently, significantly expanding the product knowledge with medically relevant information.

The enhanced knowledge base was then converted into embeddings using Facebook AI Similarity Search (FAISS) and indexed for efficient similarity searches. Amazon S3 stores both the knowledge base and the embedding files, and large-scale embedding jobs run as scheduled SageMaker Notebook Jobs. This shows how modern LLMOps implementations leverage cloud-native services for scalable ML operations while maintaining careful mappings from embeddings back to the original knowledge base items for accurate reverse lookups.
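As an illustration of this indexing pattern, the sketch below builds a FAISS inner-product index over an enriched knowledge base and maps matches back to catalog items. The embedding function is a stub, and the dimensionality, threshold, and sample entries are invented for the example; the case study doesn't specify these details.

```python
import faiss  # pip install faiss-cpu
import numpy as np

DIM = 384  # embedding dimensionality; depends on the embedding model

def embed(texts: list[str]) -> np.ndarray:
    """Stub embedding function. A real system would call an embedding
    model; this returns deterministic random unit vectors so the
    example runs, so the scores are not semantically meaningful."""
    rng = np.random.default_rng(0)
    vecs = rng.standard_normal((len(texts), DIM)).astype("float32")
    faiss.normalize_L2(vecs)  # unit vectors: inner product == cosine
    return vecs

# Toy enriched knowledge base; in production this is the LLM-augmented
# catalog described above, stored alongside its embedding files.
knowledge_base = [
    {"item": "omeprazole 20 mg", "enrichment": "acid reflux heartburn GERD"},
    {"item": "atorvastatin 40 mg", "enrichment": "high cholesterol statin"},
]

index = faiss.IndexFlatIP(DIM)  # exact inner-product index
index.add(embed([kb["enrichment"] for kb in knowledge_base]))

def search(query: str, k: int = 5, threshold: float = 0.4):
    """Return entries above a similarity threshold, mapping FAISS row
    ids back to the original knowledge base items (reverse lookup)."""
    scores, ids = index.search(embed([query]), k)
    return [
        (knowledge_base[i], float(s))
        for i, s in zip(ids[0], scores[0])
        if i != -1 and s >= threshold
    ]
```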
### Retrieval Augmented Generation (RAG) Implementation

The solution implements a comprehensive RAG pattern. Incoming customer search queries are converted to embeddings and used as search keys against the product knowledge index, with FAISS similarity score thresholds filtering weak matches.

The relevance optimization component demonstrates a hybrid approach to quality assurance. AHS implemented a two-pronged relevance labeling system using the established Exact, Substitute, Complement, Irrelevant (ESCI) framework from academic research. First, human labeling teams established ground truth on substantial sample sizes, creating reliable benchmarks for system performance. Second, and more relevant to LLMOps, the team implemented LLM-based labeling using Amazon Bedrock batch jobs: after similarity matches were identified, the system retrieved the top products and used them as prompt context for generative models, with few-shot examples of ESCI guidance included in the prompt. This allowed large-scale inference across top health searches, connecting them to relevant offerings through semantic understanding.
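A hedged sketch of what one such labeling record could look like is shown below. The ESCI guidance is paraphrased from the published framework, and the JSONL layout follows Amazon Bedrock's batch inference convention of a recordId plus a modelInput body, but the exact prompt, few-shot examples, and model are assumptions.

```python
import json

# Illustrative ESCI labeling prompt with few-shot guidance. The wording
# and examples are assumptions; AHS's actual prompt is not published.
ESCI_PROMPT = """Label the product's relevance to the query using ESCI:
E = Exact match for the query intent
S = Substitute that serves the same need
C = Complement used alongside what the query asks for
I = Irrelevant

Query: "atorvastatin 40 mg" | Product: "Atorvastatin 40 mg tablets" -> E
Query: "back pain relief" | Product: "Ibuprofen 200 mg tablets" -> S
Query: "blood pressure monitor" | Product: "AA batteries" -> C
Query: "acne treatment" | Product: "Garden hose" -> I

Query: "{query}" | Product: "{product}" ->"""

def batch_record(record_id: str, query: str, product: str) -> str:
    """One JSONL line for a Bedrock batch inference input file
    (recordId + modelInput, here in Anthropic's message format)."""
    return json.dumps({
        "recordId": record_id,
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4,
            "messages": [{
                "role": "user",
                "content": ESCI_PROMPT.format(query=query, product=product),
            }],
        },
    })
```

A batch job then reads such records from Amazon S3 and writes model outputs back to S3, which is what makes this labeling approach economical to run across top health searches rather than one request at a time.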
## Production Deployment and Scalability Considerations

The production deployment aspects of this case study reveal important LLMOps lessons about scaling generative AI applications beyond simple chatbots. The system handles daily health-related search queries for Amazon's massive customer base, requiring robust infrastructure and careful optimization. Cost optimization strategies include running batch processing jobs on Amazon EMR with EC2 Spot Instances, caching frequently searched queries, and choosing appropriate instance types for different workloads. Scalability measures include designing the vector search infrastructure to handle peak traffic, auto-scaling inference endpoints, and comprehensive monitoring and alerting. Ongoing maintenance covers regular updates to the health ontology datasets, continuous monitoring of model performance with retraining as needed, and keeping the product knowledge base current with new healthcare offerings and medical information.

## LLMOps Innovation and Best Practices

This case study demonstrates several LLMOps practices that extend beyond conversational AI. Using LLMs for knowledge base augmentation shows how generative AI can be applied to content creation and data enrichment at scale, and the batch inference approach through Amazon Bedrock represents efficient resource utilization for large-scale processing that doesn't require real-time interaction.

The hybrid human-LLM labeling approach for relevance assessment demonstrates mature quality assurance for production LLM systems. By combining human expertise for establishing ground truth with LLM capabilities for large-scale inference, AHS created a scalable way to maintain high-quality search results while managing the cost and complexity of purely human labeling. Likewise, the NER models enhanced with LLM-generated training data show how traditional ML techniques can be augmented with generative AI, leveraging the strengths of both established ML methods and newer generative technologies.

## Security, Compliance, and Ethical Considerations

The healthcare domain introduces additional complexity around privacy and compliance, which this solution handles through careful architectural choices. It explicitly avoids using individual customer data, relying instead on aggregated search patterns and publicly available health ontology information, showing how LLMOps implementations can meet stringent privacy requirements while still delivering relevant experiences. Adherence to healthcare data privacy regulations such as HIPAA underlines the importance of compliance in LLMOps deployments, particularly in regulated industries, and the use of semantic understanding rather than individual customer profiles is a privacy-preserving design pattern other healthcare AI implementations could follow.

## Results and Business Impact

Customers searching for healthcare solutions now see medically vetted, relevant offerings alongside other products, creating additional discovery pathways for health services. The system helps customers quickly find doctors, prescription medications, and healthcare services through familiar e-commerce interfaces. Its semantic understanding connects customers with appropriate healthcare solutions at the right time, whether they're dealing with acute conditions like strep throat and fever or chronic conditions such as arthritis and diabetes, a significant improvement over keyword-based search that often missed relevant healthcare services.

## Technical Lessons and Broader Implications

This case study offers several lessons for LLMOps practitioners. The emphasis on domain-specific ontology development shows the value of leveraging established knowledge frameworks when applying LLMs to specialized domains, and combining existing product knowledge with LLM-augmented real-world knowledge demonstrates how to build a comprehensive data foundation without starting from scratch. The use of generative AI for tasks beyond chatbots (batch inference, knowledge augmentation, relevance labeling) illustrates how LLMs can be integrated into complex data processing pipelines to enhance traditional ML capabilities, and the architecture shows how modern cloud platforms can support such implementations from data processing through model deployment and ongoing maintenance. Finally, the careful balance between automation and human oversight in the relevance labeling system provides a template for maintaining quality in large-scale LLM applications while managing operational costs.
