## Overview
This case study presents LinkedIn's journey in building and scaling production AI agents, presented at what appears to be a conference (likely LangChain's Interrupt conference based on context). The speaker discusses two dimensions of scale: the traditional engineering scale concerning performance and data processing, and organizational scale regarding how to democratize and accelerate agentic AI adoption across a large enterprise. LinkedIn's flagship production agent, the Hiring Assistant, serves as the primary example, designed to help recruiters automate parts of their workflow and spend more time on meaningful candidate conversations.
## The Hiring Assistant: LinkedIn's First Production Agent
The Hiring Assistant exemplifies what the speaker calls the "ambient agent pattern." In the demonstrated workflow, a recruiter describes a job they want to fill (e.g., an experienced growth marketer) and attaches relevant documents. The agent then automatically generates qualifications based on the input and supplementary materials. Critically, the agent operates asynchronously—it informs the recruiter that work is being done in the background and notifies them when results are ready. Candidates can then be reviewed in a detailed list view. This asynchronous, background-processing approach is central to LinkedIn's agent architecture philosophy.
Under the hood, the Hiring Assistant uses a supervisor multi-agent architecture where a supervisor agent coordinates between different sub-agents. Each sub-agent can interact with existing LinkedIn services and systems via what LinkedIn calls "skills"—an extended form of tool calling that supports both synchronous and asynchronous execution.
## The Strategic Pivot to Python
LinkedIn's technology stack historically relied heavily on Java for business logic, with Python relegated to internal tooling and big data applications (like PySpark jobs). When generative AI emerged in late 2022, the initial approach was to continue using Java for GenAI applications. This worked for basic applications with simple prompts and minimal complexity.
However, this approach encountered significant friction as teams wanted to experiment with Python for prompt engineering and evaluations but were forced to build production services in Java. The fundamental problem was the language gap between LinkedIn's production stack and the Python-dominated open-source GenAI ecosystem. This made it difficult to innovate and iterate with the latest techniques, models, and libraries—which were being released at a rapid pace.
LinkedIn made several key observations that drove their strategic decision:
- Undeniable interest in generative AI existed across diverse teams and verticals
- The Java-Python setup was creating friction for GenAI development specifically
- Python was unavoidable for staying current with industry trends and open-source innovations
The bold decision was to adopt Python for business logic, engineering, evaluation, and essentially everything needed for production GenAI applications. Beyond just adopting Python, LinkedIn built a framework to make it the default, removing guesswork about best practices.
## The Application Framework
LinkedIn's service framework combines Python with gRPC for service communication, and LangChain and LangGraph for modeling core business logic. The choice of gRPC was driven by several factors: built-in streaming support (essential for LLM applications), binary serialization for performance, and native cross-language features (important given LinkedIn's polyglot environment).
The framework provides standard utilities for:
- Tool calling with standardized interfaces
- Large language model inference using LinkedIn's internal inferencing stack
- Conversational memory and checkpointing
LangChain and LangGraph form the core of each GenAI application at LinkedIn. The speaker emphasized several reasons for choosing these libraries:
The frameworks are notably easy to use—even Java engineers could pick them up quickly due to intuitive syntax. Through community integrations like the community MCP implementation and pre-built react agents from the LangGraph repository, teams could build non-trivial applications in days rather than weeks. Across LinkedIn's many GenAI teams, this represents significant time savings.
Perhaps more importantly, LangChain and LangGraph provide sensible interfaces that allow LinkedIn to model their internal infrastructure. For example, the chat model interface allows teams to switch between Azure OpenAI and on-premises large language models with just a few lines of code—a critical capability for a company that uses multiple model providers.
## Agent Platform Infrastructure
While the application framework handles building individual agents, LinkedIn needed additional infrastructure to coordinate multiple agents working together. Two core problems were identified:
- Agents can take significant time to process data, requiring support for long-running asynchronous flows (the ambient agent pattern)
- Agents may execute in parallel with dependencies between outputs, requiring proper ordering and coordination
### Messaging System for Agent Communication
LinkedIn modeled long-running asynchronous flows as a messaging problem, extending their existing robust messaging infrastructure (which serves millions of members daily) to support agentic communication. This includes agent-to-agent messaging and user-to-agent messaging. They built nearline flows for automatic retry of failed messages through a queuing system.
### Layered and Scoped Memory
LinkedIn developed agentic memory that is both scoped and layered, consisting of:
- **Working memory**: For immediate, new interactions
- **Long-term memory**: Populated over time as agents have more interactions with users
- **Collective memory**: Shared knowledge across agents
This hierarchical approach allows agents to access different types of memory depending on their needs, with memory being populated progressively based on interaction history.
### Skills: Extended Function Calling
LinkedIn developed the concept of "skills" which extends traditional function calling in important ways. Unlike local function calls, skills can be RPC calls, database queries, prompts, or even other agents. Critically, agents can invoke skills both synchronously and asynchronously—the asynchronous capability being essential for the ambient agent pattern.
The speaker noted that this design predates and is conceptually similar to the Model Context Protocol (MCP), suggesting LinkedIn was ahead of industry trends in this area.
### Centralized Skills Registry
To enable discovery and sharing of capabilities across teams, LinkedIn implemented a centralized skills registry. Team services can expose and register skills in this central location, allowing agents to discover and access skills developed by other teams.
A sample flow works as follows: a supervisor agent tells a sourcing agent it needs help searching for a mid-level engineer. The sourcing agent contacts the skill registry, which responds with an appropriate skill. The sourcing agent then executes this skill. This centralized approach facilitates cross-team collaboration and capability sharing.
### Custom Observability
Though the speaker could not go into detail due to time constraints, LinkedIn built custom observability solutions specifically for agentic execution patterns. Traditional observability approaches are insufficient for the non-deterministic, distributed nature of agent workflows.
## Results and Adoption
The framework and infrastructure have achieved notable adoption:
- Over 20 teams use the framework
- Over 30 services have been created to support GenAI product experiences
- The speaker suggests these numbers may be conservative estimates
The Hiring Assistant represents LinkedIn's first production agent, with the infrastructure designed to support many more.
## Key Lessons and Best Practices
The speaker emphasized two main takeaways:
**Invest in Developer Productivity**: Given the rapid pace of change in the GenAI space, organizations need to make it easy for developers to build and adapt. This means standardizing patterns and lowering barriers to contribution. LinkedIn's framework approach exemplifies this—by providing sensible defaults and standard utilities, teams can focus on their specific use cases rather than infrastructure concerns.
**Maintain Production Software Discipline**: Despite the novel nature of agentic systems, standard software engineering practices remain essential. Availability, reliability, and especially observability are paramount. Robust evaluations are critical for dealing with non-deterministic workloads inherent to LLM-based systems. The speaker emphasized that "you can't fix what you can't observe."
## Critical Assessment
This presentation provides valuable insights into enterprise-scale LLM operations, though several aspects warrant balanced consideration. The speaker is clearly advocating for LinkedIn's approach and the tools they selected, so some claims should be viewed through that lens. The success metrics (20+ teams, 30+ services) are impressive but lack detail on outcomes, reliability metrics, or user satisfaction.
The decision to standardize on Python, while strategically sound for GenAI specifically, represents a significant architectural shift for a company with deep Java roots. The long-term maintenance and integration implications of running a parallel Python stack alongside existing Java infrastructure are not discussed.
The skills registry and messaging infrastructure appear well-designed, though the operational complexity of managing distributed, asynchronous agent systems at scale likely presents challenges not covered in this presentation. The mention of custom observability without detail suggests this is indeed a complex area requiring significant investment.
Overall, this case study demonstrates a thoughtful, engineering-driven approach to scaling LLM-based agents in a large enterprise, with practical solutions to real infrastructure challenges.