Company
Bloomberg
Title
AI-Powered Developer Productivity Platform with MCP Servers and Agent-Based Automation
Industry
Finance
Year
2025
Summary (short)
Bloomberg's Technology Infrastructure team, led by Lei, implemented an enterprise-wide AI coding platform to enhance developer productivity across 9,000+ engineers working with one of the world's largest JavaScript codebases. Starting approximately two years before this presentation, the team moved beyond initial experimentation with various AI coding tools to focus on strategic use cases: automated code uplift agents for patching and refactoring, and incident response agents for troubleshooting. To avoid organizational chaos, they built a platform-as-a-service (PaaS) approach featuring a unified AI gateway for model selection, an MCP (Model Context Protocol) directory/hub for tool discovery, and standardized tool creation/deployment infrastructure. The solution was supported by integration into onboarding training programs and cross-organizational communities. Results included improved adoption, reduced duplication of efforts, faster proof-of-concepts, and notably, a fundamental shift in the cost function of software engineering that enabled teams to reconsider trade-offs in their development practices.
## Company and Use Case Overview

Bloomberg, a financial services and data company, operates at massive scale: over 9,000 engineers (predominantly software engineers), more than 600 billion market data ticks handled, and 500+ people focused on AI research and engineering. Their flagship product, the Bloomberg Terminal, supports thousands of different applications ("functions" in their terminology) ranging from email and news to fixed income calculations and trading workflows. The company maintains one of the largest private networks globally and one of the world's largest JavaScript codebases.

Lei, who leads the Technology Infrastructure department, presented this case study on their journey to operationalize AI for coding at enterprise scale. The Technology Infrastructure team focuses on global infrastructure including data centers, connectivity, developer productivity, SRE tooling, and reliability solutions encompassing telemetry and incident response. Notably, Bloomberg actively contributes to open source, having helped create the CNCF Envoy AI Gateway among other projects. This context matters because it shows an organization managing enormous complexity across diverse technology stacks and domains, making their LLMOps challenges particularly instructive for other large-scale deployments.

## Initial Exploration and Reality Check

Bloomberg began their AI coding journey approximately two years prior to this presentation (around 2023). Like many organizations, they initially surveyed the landscape of available AI coding tools and found the options overwhelming. The speaker acknowledged that they didn't initially know which AI solutions would best boost productivity and stability, but recognized that deployment and experimentation were necessary prerequisites to discovering value.

The team quickly formed, released initial capabilities, and began measuring impact. As a data-driven company, they focused heavily on metrics, tracking typical developer productivity indicators and running surveys. Early results were encouraging for certain use cases: developers reported much quicker proof-of-concept development, faster test creation, and efficient generation of one-time scripts. However, the speaker noted candidly that "measurements dropped actually pretty quickly when you go beyond all the green field type of thing." This honest assessment reflects a common pattern in LLMOps adoption: initial excitement around greenfield code generation doesn't translate cleanly to more complex, legacy-heavy environments.

This realization prompted strategic thinking about where AI coding tools could genuinely "make a dent" rather than just provide surface-level productivity gains. The team also became thoughtful about the implications of unleashing powerful tools on a codebase containing hundreds of millions of lines of code, where system complexity grows at least polynomially (if not exponentially) with live code. This awareness of risk is a critical LLMOps consideration: the same speed that makes AI tools valuable also makes them potentially dangerous at scale.

## Strategic Focus: Beyond Simple Code Generation

Rather than viewing AI coding narrowly, Bloomberg adopted a broader perspective encompassing software engineering workflows. They specifically targeted work that developers don't prefer: maintenance work, migration work, and what the speaker termed "mundane work." This strategic focus on automation of toil is a mature approach to LLMOps that recognizes AI's comparative advantage in repetitive, well-defined tasks.

### Use Case 1: Uplift Agents for Code Evolution

The first major implementation was "uplift agents" designed to automatically handle code patching and refactoring. The vision was compelling: receive a ticket indicating that software needs patching, and simultaneously receive a pull request with the fix, the patch applied, and documentation explaining the rationale. These agents broadly scan the codebase to identify where patches are applicable and automatically apply them. Bloomberg had previously used regex-based refactoring tools that worked "to some extent but with limitations." The LLM-powered approach demonstrated "very much better results." This is an important technical point: while traditional static analysis and pattern matching have their place, LLMs provide semantic understanding that enables more sophisticated code transformations.

However, the speaker openly discussed challenges encountered:

- **Verification remains difficult**: Like any AI/ML system, having deterministic verification capabilities would be ideal, but this isn't always easy to achieve. The quality of the uplift depends heavily on test coverage, linter quality, and other verification mechanisms. Without these, patches can be difficult to apply confidently.
- **Pull request management overhead**: A significant operational challenge emerged: the average number of open pull requests increased, and time-to-merge also increased. This is a crucial LLMOps insight. AI tools can generate code faster than humans can review it, creating a bottleneck shift rather than a pure productivity gain, which speaks to the need for complementary automation in code review and merge processes.
- **Conceptual shift**: The nature of work shifts from "how we want to achieve" to "what we want to achieve." This philosophical point reflects how LLMs change the engineering workflow: specification and validation become more important than implementation mechanics.
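The presentation does not describe the agents' implementation, but a minimal sketch helps make the workflow and the verification concern concrete. Everything in the sketch below is an assumption for illustration: the gateway-backed `complete()` helper, the JavaScript repository layout, and the npm-based verification gate are hypothetical stand-ins, not details from the talk.

```python
"""Schematic uplift-agent loop: scan, rewrite, verify, revert on failure."""
import subprocess
from pathlib import Path


def complete(prompt: str) -> str:
    """Placeholder for an LLM call routed through the internal AI gateway (hypothetical)."""
    raise NotImplementedError


def find_candidates(repo: Path, deprecated_api: str) -> list[Path]:
    """Cheap pre-filter: only send files that mention the deprecated API to the model."""
    return [p for p in repo.rglob("*.js") if deprecated_api in p.read_text(errors="ignore")]


def verification_gate(repo: Path) -> bool:
    """Deterministic checks; their quality bounds how confidently patches can land."""
    checks = (["npm", "test"], ["npm", "run", "lint"])
    return all(subprocess.run(cmd, cwd=repo).returncode == 0 for cmd in checks)


def uplift(repo: Path, advisory: str, deprecated_api: str) -> list[Path]:
    """Rewrite each candidate file, keeping the change only if the gate passes."""
    patched = []
    for path in find_candidates(repo, deprecated_api):
        original = path.read_text()
        updated = complete(
            f"Advisory:\n{advisory}\n\nRewrite this file to apply the fix, "
            f"preserving behaviour otherwise:\n\n{original}"
        )
        path.write_text(updated)
        if verification_gate(repo):
            patched.append(path)       # will go into the PR branch with a rationale
        else:
            path.write_text(original)  # revert: cannot verify, so do not propose
    return patched
```

The design point the sketch illustrates mirrors the speaker's observation: the agent is only as trustworthy as the deterministic gate behind it, and the pull requests it opens still have to be reviewed and merged by humans.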
### Use Case 2: Incident Response Agents

The second major use case involved deploying incident response agents. The speaker highlighted several key advantages of AI in this domain:

- **Speed and lack of bias**: Gen AI tools can rapidly traverse codebases, telemetry systems, feature flags, call traces, and other diagnostic information without the cognitive biases that humans bring to troubleshooting. Engineers often have preconceived notions ("it must be this") that prove incorrect, whereas AI agents can systematically evaluate evidence.
- **Comprehensive context gathering**: The agents can quickly correlate information across multiple systems: metrics dashboards, logs, network and service dependency topology, alarms, triggers, and SLOs.

The technical architecture for incident response agents relies heavily on MCP (Model Context Protocol) servers that connect to various observability and operational systems. This represents a sophisticated LLMOps pattern where the LLM serves as an orchestration and reasoning layer over specialized data access tools.
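To illustrate the pattern, here is a minimal observability MCP server sketched with the open-source MCP Python SDK's `FastMCP` helper. The tool names and the stubbed backends are hypothetical stand-ins for internal metrics, log, and topology systems; the presentation did not describe Bloomberg's actual servers at this level of detail.

```python
"""Minimal observability MCP server sketch (requires the `mcp` Python SDK)."""
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("observability")


@mcp.tool()
def recent_error_logs(service: str, minutes: int = 15) -> str:
    """Return recent error-level log lines for a service (stubbed)."""
    # A real server would query the log store here.
    return f"stub: would fetch the last {minutes} minutes of errors for {service}"


@mcp.tool()
def service_dependencies(service: str) -> list[str]:
    """Return downstream dependencies of a service from the topology system (stubbed)."""
    return []


@mcp.tool()
def active_alarms(service: str) -> list[str]:
    """Return currently firing alarms and breached SLOs for a service (stubbed)."""
    return []


if __name__ == "__main__":
    # An incident-response agent connects to this server and decides which
    # tools to call while reasoning over an incident.
    mcp.run(transport="stdio")
```

The agent itself stays thin: the model chooses which tools to call and correlates their outputs, while each MCP server owns the access control and query logic for one operational system.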
## Platform Engineering Approach: The "Paved Path"

A critical inflection point in Bloomberg's journey was recognizing that uncoordinated adoption would create chaos. With 9,000 engineers, the speaker noted they had "10 teams who wants to build a pull request review bots" and "too many teams who wants to build incident response agents," leading to duplication and lack of standardization.

This organizational challenge prompted the creation of a platform-as-a-service (PaaS) approach, built in partnership with Bloomberg's AI organization. The speaker articulated Bloomberg's platform philosophy: they sit in the middle of the spectrum between complete team autonomy (where business units build whatever infrastructure they want) and very tight abstraction (where teams must use prescribed platforms). Bloomberg follows a "golden path" philosophy with enabling teams:

> "We want to make easy things extremely easy to do. Uh sorry, the right thing is extremely easy to do and we want to make sure the wrong thing is ridiculous hard to do."

This is a mature principle for LLMOps platform engineering: using guardrails and defaults to guide behavior while maintaining some flexibility.

### Platform Components

The paved path platform includes several key components:

**AI Gateway**: A centralized gateway enables teams to experiment with different models and quickly determine which works best for their use case (a client-side sketch appears after this list of components). This provides several LLMOps benefits:

- Experimentation support for model selection
- Visibility into model usage patterns across the organization
- Guidance from the platform team on which models fit which problems
- Presumably also authentication, rate limiting, and cost management (though not explicitly stated)

**MCP Directory/Hub**: A tool discovery mechanism where teams can find existing MCP servers. The scenario described is instructive: Team A wants to build something and checks the hub to discover Team B is already building a relevant MCP server, enabling collaboration rather than duplication. This is critical for organizational scaling of LLMOps, preventing fragmentation of effort.

**Tool Creation and Deployment Platform**: A standardized PaaS for creating and deploying MCP servers that handles:

- Complete software development lifecycle (SDLC) support
- Runtime environment provisioning
- Authentication and authorization ("taking care of all the auth side of things")
- Reduced friction for deployment

**Tiered Access Control**: An interesting governance model distinguishes between proof-of-concept work and production deployment:

- **Demo/POC access**: Made deliberately easy to encourage creativity and ideation. The platform team believes "creativity come from some freedom of try different new things."
- **Production access**: Requires quality controls because "stability and system reliability is at the core of our business."

This tiered approach is a sophisticated LLMOps pattern that balances innovation with safety, a common challenge in regulated industries like finance.
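The presentation does not detail the gateway's API. A common pattern for AI gateways, including the CNCF Envoy AI Gateway that Bloomberg helped create, is to expose a unified OpenAI-compatible endpoint so that switching models is a one-parameter change. The sketch below assumes such an endpoint; the URL, token, and model names are hypothetical.

```python
"""Client-side view of an AI gateway, assuming an OpenAI-compatible endpoint."""
from openai import OpenAI

# All traffic goes through the internal gateway, which can centralize provider
# credentials, routing, and usage accounting.
client = OpenAI(
    base_url="https://ai-gateway.internal.example/v1",  # hypothetical internal URL
    api_key="internal-token",                            # hypothetical credential
)

PROMPT = "Summarize the likely failure modes in this stack trace: ..."

# Experimenting across models is just a parameter change, which is what lets the
# platform team observe usage patterns and advise on model/problem fit.
for model in ("model-a-large", "model-b-code", "model-c-small"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(model, "->", response.choices[0].message.content)
```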
## Organizational Change Management

Beyond technical infrastructure, the speaker emphasized organizational strategies that proved effective for adoption.

### Training Integration

Bloomberg has a well-established training program (20+ years old) with onboarding tailored to entry-level and senior-level engineers. They integrated AI coding directly into this onboarding program, showing new hires how to use AI tools with Bloomberg's principles and technologies. The speaker identified this as "extremely effective" for several reasons:

- New employees learn the "new way" from day one
- When they join their teams, they challenge existing practices ("hey why don't we do that")
- This creates grassroots pressure on senior engineers to adopt new approaches
- It serves as an effective change agent for any initiative the organization wants to push

This is a clever LLMOps adoption strategy: using natural organizational turnover and onboarding to shift culture rather than fighting against established practices.

### Community-Based Enablement

Bloomberg leverages existing organizational structures called "champ programs" and "guild programs," cross-organizational tech communities where people with similar interests collaborate. These have existed for over 10 years. Two years ago the team bootstrapped an "Engineer AI Productivity Community" by leveraging these existing structures. Results included:

- Organic deduplication of efforts through community awareness
- Shared learning across teams
- Boosted inner-source contributions
- Support for "visiting engineer" programs where engineers temporarily join other teams to get specific work done (solving prioritization mismatches)

This community-driven approach to LLMOps adoption is particularly relevant for large organizations where top-down mandates often fail.

### Leadership Enablement

An interesting data point emerged: individual contributors had "much better stronger adoption than our leadership team." The speaker's analysis is insightful: technical leaders and managers in the age of AI lack sufficient experience to guide teams effectively. Their previous experience remains valuable but has gaps for AI-driven development. Bloomberg's response was rolling out leadership workshops to ensure leaders have the knowledge needed to drive technical innovation. This addresses a common LLMOps challenge: if leadership doesn't understand the tools, they can't make informed decisions about investment, prioritization, or risk management.

## Critical Assessment and Balanced Perspective

Several aspects of this case study warrant critical examination:

**Claims vs. Evidence**: While the speaker described various initiatives and approaches, concrete quantitative results were largely absent. We don't have specific metrics on uplift agent success rates, incident response time improvements, or cost savings. The most specific claims were qualitative: that measurements "dropped pretty quickly" beyond greenfield use cases, and that open pull requests and time-to-merge increased. The presentation is more about strategy and approach than proven outcomes.

**Complexity Challenges**: The speaker acknowledged but didn't fully address the verification and testing challenges. For code uplift agents to be truly valuable at scale, they need high confidence in correctness. The admission that patches "can sometimes be difficult to apply" without good tests and linters suggests this remains a work in progress.

**Adoption Bottlenecks**: The increased time-to-merge is a significant concern. If AI generates code faster than teams can review and merge it, overall system throughput may not improve as much as expected. The case study doesn't describe how they're addressing this bottleneck: are they building AI-powered code review tools? Changing review processes?

**Platform Adoption**: While the platform approach is architecturally sound, the presentation doesn't provide adoption metrics. How many teams are using the AI gateway?
How many MCP servers have been deployed? Without this, it's hard to assess whether the platform actually achieved its anti-fragmentation goals.

**Cost Function Claims**: The closing remarks about AI "changing the cost function of software engineering" are intriguing but speculative. The speaker suggested some work becomes cheaper and some more expensive, enabling different trade-off decisions, but didn't provide concrete examples of decisions that changed as a result.

## Technical Depth and LLMOps Maturity

Despite these limitations, the case study demonstrates several markers of LLMOps maturity:

**Infrastructure-as-Enabler**: Rather than just deploying tools, Bloomberg built infrastructure to enable safe, scalable deployment. The AI gateway, MCP hub, and deployment platform represent investment in sustainable LLMOps.

**Separation of POC and Production**: The tiered access model shows understanding that experimentation and production have different requirements, a lesson many organizations learn painfully.

**Tool Integration via MCP**: The use of Model Context Protocol servers to connect LLMs to existing systems (metrics, logs, topology, etc.) represents a modern architectural pattern for agentic systems. This is more sophisticated than simple prompt engineering.

**Organizational Design**: Recognizing that technology alone doesn't drive adoption, and investing in training, community, and leadership enablement, shows systems thinking about LLMOps.

**Focus on Verification**: While they haven't solved it, explicitly calling out the verification challenge shows awareness of a critical LLMOps concern. Many organizations ignore this until production incidents occur.

**Honest Assessment of Limitations**: The speaker's willingness to discuss where AI tools didn't deliver value (beyond greenfield scenarios) and created new problems (PR review bottlenecks) suggests a realistic rather than hyperbolic assessment.

## Model and Technology Choices

The presentation is notably light on specific model choices, which is both a strength and a weakness. The AI gateway approach suggests they're model-agnostic and possibly using multiple models for different tasks, but we don't know:

- Which models are used for code uplift vs. incident response
- Whether they're using proprietary models (OpenAI, Anthropic) or open-source alternatives
- How they're handling model evaluation and selection
- What their approach to prompt engineering is
- Whether they're fine-tuning models on internal codebases

This abstraction may be deliberate, focusing on platform patterns rather than specific model choices that will quickly become outdated. However, it limits the technical transferability of the case study.

## Implications for the Field

The most valuable contribution of this case study may be its focus on organizational patterns for LLMOps at scale rather than specific technical solutions:

**Platform thinking is essential**: In large organizations, uncoordinated adoption creates waste. Investment in shared infrastructure pays dividends.

**The problem shifts**: As the speaker noted, AI doesn't just make everything faster; it changes which activities are expensive vs. cheap, creating a need to reconsider fundamental trade-offs.

**Verification remains hard**: The industry hasn't solved automated verification of AI-generated code, making this a key area for continued investment.

**Human bottlenecks matter**: If AI can generate faster than humans can evaluate, the bottleneck shifts but overall velocity may not improve proportionally.
**Cultural change is as important as technology**: Bloomberg's investments in training, communities, and leadership enablement may be as critical as their technical platform.

The speaker's closing invitation to "get back to some of the basic principles" and ask "what is high quality software engineering and how can we use AI tool for that purpose" is perhaps the most important takeaway. Rather than letting tools dictate practices, Bloomberg is attempting to start from engineering principles and apply AI strategically. Whether they've succeeded remains to be fully demonstrated, but the approach is sound.

## Conclusion

This case study presents a thoughtful, strategic approach to enterprise LLMOps focused on platform engineering, organizational enablement, and targeted use cases rather than broad adoption of code generation tools. Bloomberg's focus on automating maintenance work and incident response, together with their investment in shared infrastructure, represents mature thinking about AI in production. However, the case study would be strengthened by more concrete results data, deeper technical detail on implementation, and an honest assessment of ROI. The organizational patterns described, particularly the paved path philosophy, tiered access, and integration with training, are likely to be more broadly applicable than specific technical choices.
