Company
London Stock Exchange Group
Title
AI-Powered Market Surveillance System for Financial Compliance
Industry
Finance
Year
2025
Summary (short)
London Stock Exchange Group (LSEG) developed an AI-powered Surveillance Guide using Amazon Bedrock and Anthropic's Claude 3.5 Sonnet to automate market abuse detection by analyzing news articles for price sensitivity. The system addresses the challenge of manual and time-consuming surveillance processes in which analysts must review thousands of trading alerts and determine whether suspicious activity correlates with price-sensitive news events. The solution achieved 100% precision in identifying non-sensitive news and 100% recall in detecting price-sensitive content, significantly reducing analyst workload while maintaining comprehensive market oversight and regulatory compliance.
## Overview

London Stock Exchange Group (LSEG) is a global financial markets infrastructure provider that operates the London Stock Exchange and manages international equity, fixed income, and derivative markets. The company facilitates the trading and reporting of over £1 trillion of securities by 400 members annually, creating a massive surveillance challenge for detecting market abuse across all MiFID asset classes, markets, and jurisdictions.

The case study describes LSEG's implementation of an AI-powered market surveillance system called "Surveillance Guide" that leverages Amazon Bedrock and Anthropic's Claude 3.5 Sonnet model to automate the analysis of news articles for price sensitivity. This represents a significant LLMOps deployment in the highly regulated financial services sector, where accuracy, explainability, and compliance are paramount.

## Problem Context and Business Challenge

LSEG's existing surveillance monitoring systems generated automated alerts to flag suspicious trading activity to its Market Supervision team. The subsequent analysis, however, was largely manual and resource-intensive. Analysts had to conduct initial triage assessments to determine whether flagged activity warranted further investigation, which involved manually collating evidence including regulation analysis, news sentiment evaluation, and trading activity correlation.

A critical bottleneck occurred during insider dealing investigations, where analysts needed to assess whether statistically significant price movements correlated with price-sensitive news during the observation period. This initial triaging step was time-consuming and still often necessitated full investigations even when the activity was ultimately deemed non-suspicious. The dynamic nature of financial markets and increasingly sophisticated bad actors further complicated the challenge, leading to high false positive rates that adversely impacted analyst efficiency and could result in operational delays.
## Technical Architecture and LLMOps Implementation

The solution architecture demonstrates several key LLMOps principles and practices. The system consists of three main components that form a complete production pipeline:

**Data Ingestion and Preprocessing Pipeline**: The system processes approximately 250,000 RNS (Regulatory News Service) articles spanning six consecutive months of trading activity. Raw HTML documents are ingested and preprocessed within the AWS environment, with extraneous HTML elements removed to extract clean textual content. This preprocessing stage is a critical data engineering component of the LLMOps pipeline, ensuring consistent input quality for the downstream model.

**Amazon Bedrock Integration**: The core of the system uses Amazon Bedrock's managed service architecture to access Anthropic's Claude 3.5 Sonnet model. This choice reflects practical LLMOps considerations around model management, scalability, and operational complexity. Amazon Bedrock's serverless architecture enables dynamic scaling of model inference capacity based on news volume while maintaining consistent performance during market-critical periods. The service also offers built-in monitoring and governance features that support the audit trails required for regulatory compliance.

**Inference Application**: A Streamlit-based visualization interface presents results and predictions to analysts. This component demonstrates the importance of user experience in production LLM systems, providing not just classifications but also detailed justifications that support analyst decision-making and regulatory requirements.

## Model Development and Evaluation Methodology

The team employed a rigorous methodology reflecting mature LLMOps practices. They conducted comprehensive exploratory data analysis across three dimensions: news categories, financial instruments referenced, and article length distribution.
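The HTML cleanup step described above can be sketched with only Python's standard library. This is a minimal illustration, not LSEG's actual pipeline; the choice of tags to skip and the whitespace normalization are assumptions.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML document, skipping script/style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a script/style element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse the runs of whitespace left behind by removed markup.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def clean_rns_article(raw_html: str) -> str:
    """Strip HTML from a raw article, returning plain text for the model."""
    parser = TextExtractor()
    parser.feed(raw_html)
    return parser.text()
```

In a real pipeline this function would run over each of the ~250,000 raw RNS documents before any model inference, so that prompt inputs are consistent plain text.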
This analysis informed the sampling strategy for creating a representative evaluation dataset.

A critical aspect of the approach was the creation of a human-annotated ground truth dataset. A set of 110 articles was selected to cover the major news categories and presented to market surveillance analysts who, as domain experts, evaluated each article's price sensitivity on a nine-point scale. This scale was then consolidated into three categories: PRICE_NOT_SENSITIVE (1-3), HARD_TO_DETERMINE (4-6), and PRICE_SENSITIVE (7-9).

The technical implementation used Amazon SageMaker with Jupyter notebooks as the development environment, demonstrating best practices for ML experimentation infrastructure. The team used the Instructor library to integrate with Claude 3.5 Sonnet through Amazon Bedrock, built custom Python data processing pipelines, and systematically experimented with a range of algorithmic approaches, including traditional supervised learning methods and prompt engineering techniques.

## Two-Stage Classification Architecture

The system employs a two-step classification process that reflects deliberate prompt engineering and task decomposition. Step 1 classifies news articles as potentially price sensitive or other, while Step 2 classifies articles as potentially price not sensitive or other. This multi-stage architecture improves classification accuracy by allowing the model to focus on a specific aspect of price sensitivity at each stage. The merging rules for consolidating results from both steps show careful handling of edge cases and ambiguous classifications: when Step 1 identifies content as sensitive and Step 2 classifies it as non-sensitive, the system flags the article as ambiguous and routes it for manual review, an appropriate treatment of model uncertainty.

## Prompt Engineering and Model Optimization

The prompt engineering approach showcases advanced techniques for production LLM systems.
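The scale consolidation and the two-step merging rules can be sketched as plain Python. The exact label strings and the fallback for the case where neither step is confident are assumptions based on the description above, not LSEG's published logic.

```python
def consolidate(score: int) -> str:
    """Map an analyst's nine-point price-sensitivity score to three categories."""
    if score <= 3:
        return "PRICE_NOT_SENSITIVE"
    if score <= 6:
        return "HARD_TO_DETERMINE"
    return "PRICE_SENSITIVE"


def merge_two_step(step1: str, step2: str) -> str:
    """Combine the two independent classifications.

    step1 labels an article PRICE_SENSITIVE or OTHER;
    step2 labels it NOT_PRICE_SENSITIVE or OTHER.
    """
    sensitive = step1 == "PRICE_SENSITIVE"
    not_sensitive = step2 == "NOT_PRICE_SENSITIVE"
    if sensitive and not_sensitive:
        # The two steps disagree: flag for manual analyst review.
        return "AMBIGUOUS_MANUAL_REVIEW"
    if sensitive:
        return "PRICE_SENSITIVE"
    if not_sensitive:
        return "NOT_PRICE_SENSITIVE"
    # Neither step was confident: treat as hard to determine.
    return "HARD_TO_DETERMINE"
```

The design choice worth noting is that disagreement is surfaced rather than resolved automatically, which keeps a human in the loop exactly where the model is least reliable.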
The system prompt establishes the model as "an expert financial analyst with deep knowledge of market dynamics, investor psychology, and the intricate relationships between news events and asset prices." The prompt enumerates detailed expertise areas covering market dynamics, investor psychology, news analysis, pattern recognition, sector-specific knowledge, regulatory insight, macroeconomic perspective, and quantitative skills.

The prompts are designed to elicit three key components: a concise summary of the news article, a price sensitivity classification, and a chain-of-thought explanation justifying the classification decision. This approach ensures explainability, which is crucial for regulatory compliance and analyst trust in the system.

A notable aspect is the conservative approach built into the prompt: "If there's any reasonable doubt about whether news could be price-sensitive, you should classify it as 'OTHER' rather than 'NOT_PRICE_SENSITIVE'." This reflects a mature understanding of the risks associated with false negatives in financial surveillance applications.

## Performance Metrics and Results

The evaluation framework established specific technical success metrics, including data pipeline implementation, metric definition (precision, recall, and F1), and workflow completion. The system was optimized to maximize precision for the NOT SENSITIVE class and recall for the PRICE SENSITIVE class, a deliberate strategy that enables high confidence in non-sensitive classifications to reduce unnecessary escalations to human analysts.

The reported results are impressive: 100% precision in identifying non-sensitive news across 6 articles, and 100% recall in detecting price-sensitive content across 64 articles (36 hard to determine and 28 price sensitive). However, these metrics should be interpreted cautiously given the relatively small evaluation dataset size and the potential for overfitting to the specific test cases.
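The class-specific optimization targets can be made concrete with a small metrics helper. The label strings and the toy data in the test are illustrative assumptions, not LSEG's evaluation set.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 computed for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1
```

Optimizing precision on the non-sensitive class means an article labeled non-sensitive is almost certainly safe to de-prioritize, while optimizing recall on the sensitive class means genuinely sensitive articles are almost never missed; the cost of this trade-off is absorbed as extra manual reviews in the middle.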
## Production Deployment and Operational Considerations

The system demonstrates several mature LLMOps practices for production deployment. Amazon Bedrock's serverless architecture enables automatic scaling based on news volume, which is critical for handling periods of market volatility when news flow increases dramatically. The built-in monitoring and governance features support the audit trails required for regulatory compliance, addressing a key operational requirement in financial services.

The solution provides automated analysis of complex financial news with detailed justifications for classification decisions, enabling effective triaging of results by sensitivity level. This approach transforms market surveillance operations by reducing manual review time, improving consistency in price-sensitivity assessment, and enabling faster response to potential market abuse cases, all while scaling surveillance capabilities without proportional resource increases.

## Regulatory and Compliance Considerations

The case study highlights several important considerations for LLMOps in regulated industries. The system maintains comprehensive audit trails through automated justifications, supports regulatory compliance through detailed documentation of decision-making processes, and provides explainable AI outputs that can be reviewed by regulators and internal compliance teams.

The conservative classification approach, in which uncertain cases are flagged for manual review rather than automatically classified, demonstrates appropriate risk management for regulatory applications. The system's ability to provide detailed reasoning for classifications supports the regulatory requirement for explainable decision-making in market surveillance.

## Technical Limitations and Future Enhancements

While the results appear promising, several limitations should be noted. The evaluation dataset of 110 articles, while carefully curated, is relatively small for robust generalization claims.
The perfect precision and recall scores, while impressive, may reflect overfitting to the specific evaluation set rather than true generalization capability.

LSEG plans several enhancements that reflect mature LLMOps thinking: integrating additional data sources including company financials and market data, implementing few-shot prompting and fine-tuning capabilities, expanding the evaluation dataset for continued accuracy improvements, deploying in live environments alongside manual processes for validation, and adapting to additional market abuse typologies.

## Broader LLMOps Implications

This case study demonstrates several important principles for successful LLMOps implementations in regulated industries. The careful balance between automation and human oversight, the emphasis on explainability and audit trails, the conservative treatment of uncertain classifications, and the integration with existing workflows all represent best practices for production LLM systems in high-stakes environments.

The choice of Amazon Bedrock as the foundational service reflects practical trade-offs between managed services and self-hosted solutions, highlighting how cloud-based ML platforms can accelerate LLMOps adoption while maintaining enterprise-grade security and compliance requirements. The system's architecture demonstrates how to effectively combine data preprocessing, model inference, and user interface components into a cohesive production system that serves real business needs while maintaining regulatory compliance.
