## Overview
LinkedIn's Hiring Assistant represents a comprehensive case study in building enterprise-grade AI agents for production use at massive scale. Published in October 2025, this blog post details the engineering journey of taking LinkedIn's first agentic product from charter phase to global availability. The system is designed to transform recruiting by handling the repetitive, high-volume aspects of candidate sourcing, evaluation, and engagement across LinkedIn's network of 1.2+ billion professional profiles, thereby freeing recruiters to focus on relationship-building and strategic decision-making.
The case study is particularly valuable from an LLMOps perspective because it tackles the full lifecycle of production LLM deployment: from architecture design and model customization to real-time inference, asynchronous execution at scale, continuous learning, and robust quality assurance. LinkedIn explicitly positions this not just as building an AI system, but as building a product—one that must deliver consistent value, maintain trust, adapt over time, and operate reliably in enterprise environments.
## Core Problem and Business Context
Recruiters face a fundamental challenge: their work combines high-value decision-making with large volumes of repetitive pattern-recognition tasks. Specifically, candidate sourcing and evaluation are the most resource-intensive parts of the recruiting workflow. These activities require reviewing vast numbers of profiles, assessing qualifications, crafting outreach messages, and managing candidate pipelines—all tasks that consume time that could be better spent on strategic hiring decisions and human connection.
LinkedIn identified three core capabilities needed for an effective recruiting agent: value delivery at scale (sourcing and evaluating candidates across billions of profiles with enterprise-grade throughput and reliability), interactive communication (understanding recruiter intent through natural dialogue and adapting behavior in real time), and continuous learning (improving over time by incorporating recruiter feedback and behavioral signals). These requirements drove the technical architecture and LLMOps approach.
## Architecture: Plan-and-Execute Agent Design
A critical architectural decision was choosing between a ReAct-style architecture and a Plan-and-Execute architecture. While ReAct is simpler and more flexible for tasks requiring dynamic adaptation, LinkedIn found it insufficient for enterprise-grade agent deployment. The key challenges with pure ReAct included unreliable instruction-following by LLMs, hallucinations that could damage trust, and difficult tradeoffs between intelligence (test-time compute) and latency.
LinkedIn adopted a Plan-and-Execute architecture that separates agent reasoning into two cycles within a larger loop. The Planner performs high-level reasoning to produce a structured, task-specific plan (the "divide" phase), while the Executor runs the plan step-by-step using a ReAct-style loop for tool use and local reasoning (the "conquer" phase). This approach delivers several advantages for production LLM systems: tasks become well-scoped and less error-prone, execution is more efficient, cost and latency can be optimized independently, and task completion rates remain high even as problem complexity grows. Plan-and-Execute is better suited when the problem space can be divided into tasks with clear instructions requiring reliable handling, whereas ReAct excels when agents must adapt to truly unexpected situations with complex interplay between reasoning and actions.
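The divide/conquer split can be sketched in a few lines. This is a minimal illustration, not LinkedIn's implementation: the `plan` and tool functions below are hypothetical stand-ins for what would be LLM calls and headless recruiting tools in the real system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool: str          # name of the headless tool to invoke
    instruction: str   # narrowly scoped instruction for this step

def plan(goal: str) -> list[Step]:
    """'Divide' phase: one high-level reasoning call yields a structured plan.

    A real planner would prompt an LLM to emit a validated, task-specific plan;
    this hard-coded version only shows the shape of the output."""
    return [
        Step("search", f"Generate candidate queries for: {goal}"),
        Step("evaluate", "Score retrieved profiles against the qualifications"),
    ]

def execute(steps: list[Step], tools: dict[str, Callable[[str], str]]) -> list[str]:
    """'Conquer' phase: run each step with a small ReAct-style tool loop."""
    results = []
    for step in steps:
        # Each step is well-scoped, so the local reasoning loop stays short and
        # can be optimized for cost/latency independently of planning.
        results.append(tools[step.tool](step.instruction))
    return results

tools = {
    "search": lambda ins: f"[profiles matching] {ins}",
    "evaluate": lambda ins: f"[scored] {ins}",
}
print(execute(plan("Senior backend engineer, Berlin"), tools))
```

The key property the sketch preserves is that planning happens once per task while execution iterates step-by-step, which is what lets the two cycles be tuned separately.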
## Agent Infrastructure and Platform
Hiring Assistant is built as a plan-and-execute agent on top of a message-driven agent lifecycle platform. Unlike a single shared agent serving many users, the system models separate agent instances for each recruiter. Every Hiring Assistant instance has its own identity and mailbox, with lifecycle orchestration handled entirely through asynchronous message passing. This design enables massive scale while maintaining personalization and isolation.
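The per-recruiter identity-plus-mailbox model can be illustrated with a toy in-process sketch. The real platform is a distributed messaging system; the class and field names below are hypothetical.

```python
import queue
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    body: str

@dataclass
class AgentInstance:
    agent_id: str                                   # unique identity per recruiter
    mailbox: queue.Queue = field(default_factory=queue.Queue)

    def send(self, msg: Message) -> None:
        self.mailbox.put(msg)                       # lifecycle events arrive asynchronously

    def drain(self) -> list[Message]:
        out = []
        while not self.mailbox.empty():
            out.append(self.mailbox.get())
        return out

# One isolated instance per recruiter, never shared across users.
registry = {rid: AgentInstance(f"hiring-assistant:{rid}") for rid in ("r1", "r2")}
registry["r1"].send(Message("scheduler", "refresh candidate pipeline"))
assert registry["r2"].drain() == []                 # isolation: r2's mailbox is untouched
```

The point of the sketch is the isolation guarantee: messages addressed to one recruiter's instance are invisible to every other instance.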
The system depends on a suite of headless tools that mirror what human recruiters use—Recruiter Search, project management, candidate/job management—but exposed in forms that agents can programmatically interact with. Some capabilities are further decomposed into sub-agents for modularity and abstraction, though these are implemented as tools rather than separate agent identities for efficiency.
At the center is the supervisor agent, acting as the "central nervous system" and serving as both planner and orchestrator. It interprets user input, evaluates context, and delegates tasks to the right tools or sub-agents. This plan-and-execute loop allows the system to triage recruiter requests, manage workflows, and deliver results reliably at scale. The supervisor is responsible for hiring workflow management, message orchestration, task prioritization, agent coordination, environmental observation (monitoring changes like new candidate activity), and human-in-the-loop management.
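A simplified view of the supervisor's triage-and-delegate role is sketched below. The routing table and keyword matching are placeholders; the actual supervisor uses LLM reasoning to interpret intent and choose tools or sub-agents.

```python
from typing import Callable

# Hypothetical sub-agent registry; names are illustrative only.
SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "intake": lambda req: f"intake: refined requirements for '{req}'",
    "sourcing": lambda req: f"sourcing: queries generated for '{req}'",
    "evaluation": lambda req: f"evaluation: candidates scored for '{req}'",
}

def supervise(request: str) -> str:
    """Interpret the request, pick a sub-agent, and delegate the task."""
    for key, handler in SUB_AGENTS.items():
        if key in request.lower():
            return handler(request)
    # No confident route: fall back to the human in the loop.
    return "clarify: asking the recruiter for more detail"

print(supervise("Run sourcing for a staff ML engineer"))
```

Note the fallback branch: when the supervisor cannot confidently route a request, the sketch returns control to the recruiter, mirroring the human-in-the-loop management the section describes.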
## Dual UX Model: Interactive and Asynchronous
LinkedIn designed Hiring Assistant with two complementary user experience modes that address different aspects of the LLMOps challenge. The interactive UX allows recruiters to converse directly with the agent, helping clarify hiring requirements, align on expectations, and adjust behavior before large-scale execution begins. This mode provides transparency into what the agent is doing and how it reasons, creating fast feedback loops that help avoid wasted effort. It feels responsive and collaborative, like working with a trusted teammate.
The asynchronous UX enables massive background execution once alignment is reached. This is where the agent delivers scalable outcomes: running sourcing and evaluation jobs across millions of profiles, continuously refreshing candidate pipelines, and evaluating new applicants without human micromanagement. Importantly, humans remain in the loop for all decisions they are accountable for, while the agent removes much of the toil. This "source while you sleep" capability frees recruiters for high-value work.
From an LLMOps perspective, this dual-mode design addresses a fundamental challenge in production LLM systems: balancing the need for trust and alignment (achieved through synchronous interaction) with the imperative for scale and efficiency (delivered through autonomous background processing). The architecture enables both modes to coexist, each serving distinct but complementary purposes in the overall workflow.
## Real-Time Integration and Event-Driven Architecture
Traditional request-response models are insufficient for an agent that manages both conversational interactions and long-running workflows. LinkedIn implemented a push-based, event-driven architecture using a publish-subscribe model. UI updates, notifications, and task changes are delivered asynchronously via session-level real-time channels. The SDK subscribes to agent-specific topics and updates the UI automatically.
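The publish-subscribe shape of this push-based model can be shown with a minimal in-process event bus. This is a sketch under stated assumptions: the topic naming scheme and `Bus` class are invented for illustration, and the production system would be a distributed, session-aware channel rather than in-memory callbacks.

```python
from collections import defaultdict
from typing import Any, Callable

class Bus:
    """Tiny in-process publish-subscribe bus."""
    def __init__(self) -> None:
        self.subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self.subs[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        for handler in self.subs[topic]:
            handler(event)          # every subscribed session gets the event

bus = Bus()
updates: list[dict] = []
bus.subscribe("agent:r1:tasks", updates.append)   # SDK subscribes to an agent topic
bus.publish("agent:r1:tasks", {"task": "sourcing", "status": "complete"})
print(updates)
```

Because the SDK subscribes rather than polls, UI updates arrive as soon as the agent publishes them, which is the property the section attributes to the push-based design.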
Long-running LLM tasks use streaming and partial responses, allowing recruiters to observe reasoning, track task progress, and preview results as soon as they become available. This addresses a common LLMOps challenge: managing user expectations and maintaining engagement during operations that may take seconds or minutes. Cross-session synchronization ensures that updates propagate consistently across devices and sessions, so actions like task completion are reflected everywhere.
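Streaming partial responses reduces to yielding output incrementally instead of returning it whole. The generator below is an illustrative stand-in for server-pushed token chunks, not LinkedIn's actual transport.

```python
from typing import Iterator

def stream_tokens(full_response: str, chunk: int = 8) -> Iterator[str]:
    """Yield a response in small chunks, simulating streamed LLM output."""
    for i in range(0, len(full_response), chunk):
        yield full_response[i:i + chunk]

rendered = ""
for partial in stream_tokens("Evaluating 120 profiles against 6 qualifications..."):
    rendered += partial        # the UI re-renders on every partial, so the
                               # recruiter sees progress within seconds
print(rendered)
```

The user-experience payoff is that even a minutes-long task shows visible reasoning and progress almost immediately, which is the expectation-management point the paragraph makes.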
The system delivers strongly typed, decorated view models containing the data and metadata needed to render UI components dynamically. This enables modular components that can render from their inputs or lazy-load related entities without additional round trips. The approach aligns with server-driven UI (SDUI) and agent-driven UI (ADUI) patterns, creating scalable, adaptive experiences across product surfaces.
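A decorated view model is easiest to see as a typed payload that bundles data with rendering metadata. The field names and component identifier below are hypothetical, chosen only to show the shape.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateCardViewModel:
    """Illustrative decorated view model: data plus rendering metadata."""
    candidate_id: str
    headline: str
    match_summary: str
    component: str = "candidate_card"   # tells the SDK which component to render
    lazy_refs: tuple[str, ...] = ()     # related entities fetched on demand

card = CandidateCardViewModel(
    candidate_id="urn:li:candidate:123",
    headline="Staff ML Engineer",
    match_summary="Meets 5 of 6 qualifications",
    lazy_refs=("urn:li:project:9",),
)
print(card)
```

Because the payload names its own component and carries everything needed to draw it, the server (or agent) can introduce new card types without a client release, which is the core of the SDUI/ADUI argument.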
## Specialized Sub-Agents
Hiring Assistant's capabilities are powered by specialized sub-agents, each handling a specific part of the recruiting workflow:
The **Intake Agent** gathers and refines hiring requirements and role details from recruiters. It confirms key attributes like job title, location, and seniority, inferring missing information when needed. It generates role-specific qualifications from detailed inputs, past successful projects, LinkedIn Economic Graph insights, or world knowledge, ensuring quality and alignment with best practices for downstream sourcing and evaluation. This upfront investment in understanding requirements is critical for all subsequent agent operations.
The **Sourcing Agent** generates multiple search queries based on hiring requirements and evaluation criteria. It leverages and enhances LinkedIn Recruiter Search and Recommended Matches tools, running queries at scale, storing potential candidate profiles, and iteratively refining queries based on performance. This adaptation to changing talent supply and demand represents a key learning capability. The agent draws on LinkedIn's Economic Graph—a digital representation of the global economy—which provides deep understanding of talent supply, demand, and movement. This enables advanced strategies like identifying top locations for talent, spotting active or recently hired candidates, uncovering talent flows where the company is winning or losing, surfacing candidates from fast-growing companies or those with layoffs, and highlighting opportunities in top schools or companies with open positions. In advanced scenarios, sourcing pairs with candidate evaluation to create a closed feedback loop where evaluation signals refine sourcing strategies, using LLM reasoning to balance precision and liquidity.
The **Evaluation Agent** assesses candidates by synthesizing information from multiple sources including profiles, resumes, and historical engagement data. It applies hiring requirements and evaluation rubrics to produce structured recommendations with reasoning and evidence. Recruiters remain in the loop to review insights and make final decisions on advancing candidates. LinkedIn had to solve several key LLMOps challenges for evaluation: alignment (ensuring recruiters review qualifications before evaluation and running safety checks for Responsible AI compliance), explanation (surfacing supporting evidence so recruiters can make informed decisions), accuracy (developing quality benchmarks to test across evaluation scenarios), scalability (developing custom LLMs optimized for qualification evaluation), and latency (using techniques like speculative decoding to evaluate candidates in seconds for responsive, conversational experiences). The custom fine-tuned models deliver a combination of accuracy and scale not achievable with off-the-shelf models.
The **Candidate Outreach Agent** handles communication with candidates, generating and sending initial outreach and follow-up messages across multiple channels. It replies to candidate questions based on hiring requirements and FAQs defined during intake, and can schedule phone screens directly through messaging.
The **Candidate Screening Agent** supports the screening process by preparing tailored screening questions based on hiring requirements and candidate profiles. It can observe, transcribe, and summarize conversations while capturing candidate insights and notes. Critically, candidates can connect with human recruiters directly at any time, and recruiters can take over or guide the screening process whenever needed, maintaining human control over sensitive interactions.
The **Learning Agent** continuously refines hiring requirement personalization by analyzing recruiter actions such as adding candidates to pipelines or sending InMails. It updates qualifications and candidate recommendations dynamically by integrating both explicit feedback and implicit signals from recruiter behavior. Importantly, any recommendations around changes to qualifications are surfaced asynchronously to recruiters and applied only after their review and approval. This ensures the agent adapts over time while keeping recruiters in control, improving alignment with preferences and optimizing candidate sourcing efficiency. From an LLMOps perspective, this represents a production implementation of learning from human feedback at scale.
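The Learning Agent's pattern of aggregating implicit signals into proposals that only take effect after recruiter approval can be sketched as follows. The action names, threshold, and event schema are assumptions for illustration.

```python
# Recruiter actions treated as positive implicit signals (illustrative set).
POSITIVE_ACTIONS = {"add_to_pipeline", "send_inmail"}

def propose_updates(events: list[dict]) -> list[str]:
    """Aggregate behavioral signals into proposed qualification emphases."""
    counts: dict[str, int] = {}
    for e in events:
        if e["action"] in POSITIVE_ACTIONS:
            for skill in e["skills"]:
                counts[skill] = counts.get(skill, 0) + 1
    # Only strong patterns become *proposals* -- never silent changes.
    return [skill for skill, n in counts.items() if n >= 2]

def apply_if_approved(proposals: list[str], approved: set[str]) -> list[str]:
    """Human-in-the-loop gate: changes apply only after recruiter review."""
    return [p for p in proposals if p in approved]

events = [
    {"action": "add_to_pipeline", "skills": ["Kafka", "Go"]},
    {"action": "send_inmail", "skills": ["Kafka"]},
    {"action": "view_profile", "skills": ["PHP"]},
]
print(apply_if_approved(propose_updates(events), approved={"Kafka"}))  # → ['Kafka']
```

The two-stage structure is the point: signal aggregation runs asynchronously at scale, but the approval gate keeps the recruiter in control of what the agent actually changes.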
The **Cognitive Memory Agent** provides persistent, context-aware memory to support personalized interactions, enabling the system to adapt recommendations and workflows over time in a scalable way. All memory data remains scoped to the recruiter's environment and is never used for training LLMs. Customers retain control over their stored memory with robust privacy and management options. This addresses critical enterprise concerns about data governance and model training in production LLM systems.
## Custom LLM Development and Optimization
A particularly significant LLMOps aspect of Hiring Assistant is LinkedIn's development of custom LLMs optimized specifically for candidate evaluation tasks. This represents a strategic decision to move beyond off-the-shelf foundation models when task-specific requirements around accuracy, scale, and latency cannot be met by general-purpose systems.
LinkedIn developed quality benchmarks to test the evaluation agent across a range of different scenarios, allowing them to evaluate new versions even before reaching users. The custom models are fine-tuned specifically for the task of evaluating qualifications against candidate profiles. To achieve acceptable latency for conversational experiences, LinkedIn employs speculative decoding within their custom LLM serving infrastructure, enabling the fine-tuned models to evaluate candidates in seconds rather than minutes.
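The latency win from speculative decoding comes from letting a cheap draft model propose several tokens that the expensive target model then verifies together. The sketch below is highly simplified: both "models" are rule-based stand-ins, and real implementations verify drafts in a single batched forward pass with a probabilistic acceptance rule rather than one call per token.

```python
def draft_model(prefix: list[str], k: int) -> list[str]:
    """Fast, approximate model: cheaply proposes k candidate tokens."""
    return ["qualified"] * k

def target_model(prefix: list[str]) -> str:
    """Slow, accurate model: the ground-truth next token (toy rule)."""
    return "qualified" if len(prefix) < 5 else "stop"

def speculative_decode(prefix: list[str], k: int = 4, max_len: int = 8) -> list[str]:
    out = list(prefix)
    while len(out) < max_len and (not out or out[-1] != "stop"):
        proposals = draft_model(out, k)
        for tok in proposals:             # verify draft tokens against the target
            expected = target_model(out)
            out.append(expected)
            if expected != tok:           # first mismatch: keep the correction,
                break                     # discard the remaining draft tokens
        if out[-1] == "stop":
            break
    return out

print(speculative_decode(["candidate", "is"]))
```

When the draft model agrees with the target often, many tokens are accepted per expensive verification round, which is how the technique turns minutes of sequential decoding into seconds.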
This approach illustrates a mature LLMOps strategy: start with general-purpose models for prototyping and early validation, but invest in custom model development when production requirements (particularly around the combination of accuracy, scale, and latency) cannot be met otherwise. The case study demonstrates that for enterprise-grade agent systems handling high-volume, specialized tasks, custom model development can be justified by the operational and user experience benefits.
## Continuous Learning and Personalization
Learning is positioned as the foundation of intelligence in Hiring Assistant. The system goes beyond simply following recruiter instructions to actually learning from both what recruiters say and what they do. Every click, shortlist, or outreach becomes a signal that sharpens the agent's understanding of role-specific nuances, preferences, and recruiter style.
Initially, the agent relies on explicit guidance and feedback. But with every project, it observes and adapts, gradually picking up the recruiter's taste, judgment, and decision-making patterns. For existing Recruiter users, the agent can learn from past activities ahead of time, so it's not a stranger even on first interaction.
This is enabled by machine learning tools that analyze recruiter activity data both asynchronously and in real time. These tools separate signal from noise, uncover meaningful patterns, and adapt recommendations accordingly. The cognitive memory agent recalls relevant past interactions or previously learned patterns, ensuring decisions are grounded in accumulated knowledge.
The result is that each recruiter gets a personalized Hiring Assistant that develops a shared rhythm over time: the agent anticipates preferences, adapts to feedback, and evolves into a trusted collaborator. This represents a production implementation of preference learning and behavioral adaptation at scale, critical capabilities for enterprise LLM systems that must work across diverse users and use cases.
## Holistic Quality Framework
LinkedIn developed a holistic quality framework built on two pillars: product policy and human alignment. This framework recognizes that in the AI era, quality extends beyond traditional system reliability to include the quality of content an agent produces, how well it aligns with user goals, and whether it behaves responsibly.
**Product policy** provides the rails—setting boundaries for safety, compliance, legal standards, and expected agent behavior. These policies establish the minimum quality bar and ensure the agent stays on track while leaving room for autonomy and innovation. LinkedIn uses these policies to drive LLM-powered judges that evaluate quality, sometimes without references (for coherence) and sometimes against grounded data (for factual accuracy).
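The two judging modes the paragraph names, reference-free (coherence) and grounded (factual accuracy), share a simple interface. The checks below are deliberately naive rule-based stand-ins for what would be LLM-powered judges; their function names and heuristics are invented for illustration.

```python
def judge_coherence(output: str) -> bool:
    """Reference-free check: no ground truth needed, only the output itself."""
    # Toy heuristic standing in for an LLM judge scoring fluency/coherence.
    return len(output.split()) > 3 and output.strip().endswith(".")

def judge_grounded(output: str, evidence: dict[str, str]) -> bool:
    """Grounded check: every claimed attribute must appear in the source data."""
    return all(value.lower() in output.lower() for value in evidence.values())

evidence = {"title": "Data Engineer", "location": "Dublin"}
output = "Strong match: a Data Engineer based in Dublin with Spark experience."
assert judge_coherence(output) and judge_grounded(output, evidence)
```

The distinction matters operationally: reference-free judges can run on any output cheaply, while grounded judges need the evidence plumbed through to the evaluation step, a data-pipeline cost the policy framework has to budget for.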
**Human alignment** provides the compass—ensuring the agent is always oriented toward real hiring outcomes. The engineering process is grounded in human-validated data, including both annotated datasets and real recruiter activity. For example, if a recruiter engages with a candidate, this is treated as a strong positive signal. Over time, the agent learns to recommend more candidates matching these recruiter-validated patterns. Human alignment not only improves outcomes but also validates product policy, keeping the framework tied to real-world effectiveness.
This two-pronged approach to quality—rails (policy) and compass (alignment)—ensures the agent doesn't just work reliably but continuously evolves into a more intelligent, trustworthy, and valuable recruiting partner. From an LLMOps perspective, this represents a mature approach to production LLM quality assurance that balances guardrails with outcome optimization.
## Data Governance and Responsible AI
Throughout the case study, LinkedIn emphasizes several critical aspects of responsible AI and data governance in production LLM systems. The agent never uses customer data across customer boundaries, maintaining strict enterprise-grade data isolation. All memory data remains scoped to the recruiter's environment and is never used for training LLMs. Customers retain control over stored memory with robust privacy and management options.
Safety checks are run against each qualification during the intake process to ensure compliance with Responsible AI policies. Humans remain in the loop for all decisions they are accountable for, while the agent removes toil. The system is designed with transparency, showing what the agent is doing and how it reasons about tasks.
These design choices reflect a mature understanding that production LLM systems, particularly in sensitive domains like hiring, must address concerns around privacy, fairness, transparency, and human control. The architecture and operational choices explicitly support these requirements rather than treating them as afterthoughts.
## Technical Infrastructure Considerations
While the case study focuses primarily on agent architecture and capabilities, several infrastructure elements critical to production LLMOps emerge. The system uses GraphQL APIs to connect client SDKs to backend services, providing decorated view models that enable dynamic rendering aligned with server-driven/agent-driven UI architectures. The real-time integration layer handles streaming LLM responses, cross-session synchronization, and asynchronous notification delivery at scale.
The message-driven agent lifecycle platform enables massive-scale asynchronous execution while maintaining per-recruiter isolation and personalization. The custom LLM serving infrastructure supports speculative decoding for latency optimization. Quality benchmarking systems enable continuous evaluation of agent versions before production deployment.
These infrastructure investments reflect the reality that production LLM systems at scale require substantial platform engineering beyond the models themselves. The case study implicitly demonstrates the importance of viewing LLMOps as a full-stack discipline encompassing model development, serving infrastructure, orchestration platforms, real-time systems, data pipelines, quality assurance, and user experience.
## Lessons and Tradeoffs
LinkedIn is transparent about architectural tradeoffs and lessons learned. The decision to use Plan-and-Execute over pure ReAct was driven by reliability concerns with LLMs following instructions consistently, the need to manage hallucination risks in enterprise contexts, and the requirement to optimize cost, latency, and task completion independently. The Plan-and-Execute architecture trades some flexibility for reliability and efficiency, which is appropriate for recruiting workflows where tasks can be well-scoped.
The dual UX model trades implementation complexity for the ability to serve both trust-building (interactive) and scale (asynchronous) requirements. Investing in custom LLMs for evaluation trades development effort for the specific combination of accuracy, scale, and latency required for conversational candidate assessment. Building specialized sub-agents trades architectural complexity for modularity, clear separation of concerns, and the ability to optimize each agent independently.
The case study notes that this is "just the beginning" of the journey into building enterprise-grade agents, suggesting ongoing learning and iteration. LinkedIn positions their sharing of experiences as contributing to the broader practitioner community working at the intersection of AI, systems, user experience, and trust, acknowledging that the field is still maturing.
## Production Maturity and Scale
The case study represents a production system that has reached global availability, indicating significant maturity. The system operates across 1.2+ billion professional profiles, demonstrating genuine enterprise scale. Multiple specialized sub-agents coordinate through a supervisor, handling diverse workflows from intake through screening. The agent learns continuously from recruiter behavior across potentially millions of interactions. Real-time streaming and asynchronous execution coexist in a unified architecture.
This level of maturity and scale distinguishes the case study from many LLMOps discussions focused on prototypes or early-stage deployments. LinkedIn has addressed the full range of challenges that emerge when moving from proof-of-concept to production: scale, reliability, latency, cost, quality assurance, data governance, user experience, continuous learning, and organizational alignment.
## Strategic Context and Vision
LinkedIn positions Hiring Assistant as continuing their vision and mission of creating economic opportunity for every member of the global workforce by connecting professionals to make them more productive and successful. The system represents a foundation for the next generation of recruiting—one that empowers recruiters to focus on relationship building and making great hires, while making it easier for great candidates to be discovered for roles they might not have found otherwise.
The case study emphasizes that success required tight codesign across product, models, infrastructure, and user experience. This cross-functional collaboration enabled delivery of an agent that is both technically robust and truly valuable to customers. The emphasis on codesign reflects a mature understanding that effective LLMOps requires breaking down silos between data science, engineering, product, and design.
Overall, LinkedIn's Hiring Assistant case study provides a comprehensive view of what it takes to build, deploy, and operate an enterprise-grade AI agent at massive scale. It demonstrates sophisticated approaches to agent architecture, model customization, quality assurance, continuous learning, and responsible AI—all while maintaining focus on delivering genuine business value to users. The transparent discussion of tradeoffs, challenges, and ongoing learning makes this a particularly valuable contribution to the LLMOps practitioner community.