ZenML

Policy Search and Response System Using LLMs in Higher Education

NDUS 2024
The North Dakota University System (NDUS) implemented a generative AI solution to tackle the challenge of searching through thousands of policy documents, state laws, and regulations. Using Databricks' Data Intelligence Platform on Azure, they developed a "Policy Assistant" that leverages LLMs (specifically Llama 2) to provide instant, accurate policy search results with proper references. This transformation reduced their time-to-market from one year to six months and made policy searches 10-20x faster, while maintaining proper governance and security controls.

Industry

Education

Overview

The North Dakota University System (NDUS) is a public sector higher education organization comprising 11 institutions including five community colleges, four regional universities, and two research universities. The system serves approximately 80,000 students, faculty, and staff, all governed by the State Board of Higher Education (SBHE). This case study demonstrates how a government-adjacent education organization successfully deployed generative AI capabilities for policy compliance and document search, representing a practical LLMOps implementation in a risk-averse public sector environment.

Business Problem and Context

NDUS faced a critical operational challenge centered on regulatory compliance and policy management. The organization maintains thousands of internal policies, state laws, contracts, procedures, and codes that must be regularly referenced across all 11 institutions. Before implementing their AI solution, staff members spent considerable time—often hours per search—manually wading through pages, references, codes, and contracts to ensure compliance with regulations.

The core issue was slow, fragmented search across a sprawling document corpus.

As Ryan Jockers, Assistant Director of Reporting and Analytics, described: “Finding what you need among all those texts can take hours, and users constantly need to start fresh searches to know what we can and can’t do.” This inefficiency was compounded by the need to search across five different sites just to locate a single document.

Technical Architecture and LLMOps Implementation

NDUS leveraged its existing Azure cloud environment and Databricks relationship to minimize procurement overhead and accelerate deployment. This is a noteworthy approach for organizations in regulated or public sector environments where new vendor relationships can be prohibitively slow to establish.

LLM Selection and Evaluation

The team took a methodical approach to LLM selection, testing multiple open source models on the Databricks Platform against a common set of evaluation criteria.

They ultimately selected Llama 2 for their production deployment, though the case study notes they are considering consolidating further with DBRX (Databricks’ own foundation model). This demonstrates a pragmatic approach to model selection—starting with proven open source options while keeping the door open for platform-native alternatives.
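A weighted scorecard is one simple way to make this kind of model comparison repeatable. The sketch below is illustrative only: the case study does not list NDUS's actual criteria, weights, or candidate models, so every name and number here is an assumption.

```python
# Hypothetical weighted scorecard for comparing candidate open source LLMs.
# Criteria names, weights, ratings, and model names are all illustrative.

def score_model(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion ratings (each on a 0-10 scale)."""
    total_weight = sum(weights.values())
    return sum(ratings[c] * w for c, w in weights.items()) / total_weight

weights = {"answer_quality": 0.4, "latency": 0.3, "license": 0.3}

candidates = {
    "llama-2-70b": {"answer_quality": 8, "latency": 6, "license": 9},
    "mpt-30b": {"answer_quality": 6, "latency": 7, "license": 8},
}

# Rank candidates by weighted score, best first.
ranked = sorted(candidates, key=lambda m: score_model(candidates[m], weights),
                reverse=True)
```

Making the weights explicit forces the team to agree on priorities up front, and re-running the scorecard is cheap when a new candidate model (such as DBRX) appears.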

Foundation Model APIs

Rather than undertaking complex custom model deployment and hosting, NDUS utilized Databricks Foundation Model APIs to quickly build applications that leverage generative AI. This serverless approach reduced operational complexity and allowed the small team to focus on application development rather than infrastructure management—a key LLMOps consideration for resource-constrained organizations.
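Foundation Model APIs of this kind are typically called through an OpenAI-compatible chat-completions interface. The sketch below assembles such a request payload; the endpoint name and URL shape are assumptions based on Databricks' serving conventions, not details confirmed by the case study.

```python
import json

# Sketch of a chat request to a serverless foundation-model endpoint.
# The model/endpoint name ("databricks-llama-2-70b-chat") and the URL shape
# in the comment below are assumptions, not confirmed by the case study.

def build_chat_request(question: str,
                       model: str = "databricks-llama-2-70b-chat") -> dict:
    """Assemble an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer policy questions and cite your sources."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.1,   # low temperature suits compliance answers
        "max_tokens": 512,
    }

payload = build_chat_request("Can a campus accept this vendor contract?")
body = json.dumps(payload)

# In production this payload would be POSTed to the workspace's serving
# endpoint, roughly:
#   requests.post(f"{host}/serving-endpoints/{payload['model']}/invocations",
#                 headers={"Authorization": f"Bearer {token}"}, data=body)
```

Because the interface is serverless, the team never provisions GPUs or manages model weights; the application code stays this small.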

Vector Search and RAG Implementation

The Policy Assistant application appears to implement a retrieval-augmented generation (RAG) pattern, though the case study doesn't use that specific terminology: policy documents are indexed for semantic search, relevant passages are retrieved at query time, and the LLM generates answers grounded in those passages.

The application enables users to query the system using natural language via an API, receiving responses that include accurate results along with references, page numbers, and direct links to source documents. This citation capability is critical for compliance use cases where users need to verify and audit AI-generated responses.
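The retrieve-then-cite flow described above can be sketched in a few lines. This is a minimal toy, not the NDUS implementation: the embeddings are hand-written vectors standing in for a real vector index, and the policy names and page numbers are invented.

```python
import math

# Minimal RAG retrieval sketch with citation metadata. Toy vectors stand in
# for real embeddings; document titles and page numbers are invented.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

chunks = [
    {"text": "Procurement over $50k requires SBHE approval.",
     "source": "Policy 803.1", "page": 4, "vec": [0.9, 0.1, 0.0]},
    {"text": "Annual leave accrual rates for staff.",
     "source": "Policy 607.2", "page": 12, "vec": [0.0, 0.2, 0.9]},
]

def retrieve(query_vec, k=1):
    """Return the top-k chunks, keeping their citation metadata."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    """Ground the LLM prompt in retrieved passages, with inline citations."""
    hits = retrieve(query_vec)
    context = "\n".join(f"[{h['source']}, p.{h['page']}] {h['text']}"
                        for h in hits)
    return (f"Context:\n{context}\n\n"
            f"Question: {question}\nCite sources in your answer.")

prompt = build_prompt("Who approves large purchases?", [1.0, 0.0, 0.1])
```

Carrying the source title and page number through retrieval is what lets the final answer include verifiable references, which is the property compliance users actually need.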

Governance and Access Control

For an organization handling internal policies and potentially sensitive regulatory information, data governance was a paramount concern. NDUS implemented Unity Catalog to centralize governance of its data and AI assets, controlling which users and applications can access which documents and models.

This governance layer is essential for production LLM deployments, particularly in regulated environments where auditability and access control are non-negotiable requirements.
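One governance pattern worth noting is filtering retrieval results by the caller's permissions before anything reaches the LLM, so the model can never leak a document the user couldn't open directly. The sketch below illustrates that idea in plain Python; the roles, ACLs, and document names are invented, and in the actual deployment Unity Catalog enforces access at the platform layer rather than in application code.

```python
# Illustrative access filter applied to retrieval results before prompt
# assembly. Roles, ACL entries, and document names are invented.

DOCUMENT_ACL = {
    "Policy 803.1": {"staff", "admin"},
    "Legal Memo 2023-14": {"admin"},   # restricted document
}

def authorized(doc_id: str, user_roles: set[str]) -> bool:
    """A document is visible if the user holds any role on its ACL."""
    return bool(DOCUMENT_ACL.get(doc_id, set()) & user_roles)

def filter_results(results: list[str], user_roles: set[str]) -> list[str]:
    """Drop documents the caller's roles cannot see."""
    return [doc for doc in results if authorized(doc, user_roles)]

visible = filter_results(["Policy 803.1", "Legal Memo 2023-14"], {"staff"})
```

Enforcing the filter upstream of the prompt means audit logs can show exactly which documents each answer was allowed to draw on.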

ML Operations and Testing

NDUS uses MLflow for managing their ML and GenAI applications. The case study mentions they perform local tests and have established a simple method for running applications—suggesting a workflow that includes local development and testing before production deployment. While details are sparse, the mention of MLflow indicates they are tracking experiments, managing model versions, and likely logging model artifacts in a structured manner.
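The kind of local pre-deployment test mentioned above often takes the form of a golden set: question/expected-citation pairs checked on every change. The sketch below shows the shape of such a check; the questions, the expected citations, and the stub assistant are all invented, with the stub standing in for a real endpoint call.

```python
# Sketch of a local regression check against a golden set of
# question/expected-citation pairs. All content here is illustrative.

GOLDEN_SET = [
    {"question": "Who approves purchases over $50k?",
     "must_cite": "Policy 803.1"},
    {"question": "What is the staff leave accrual rate?",
     "must_cite": "Policy 607.2"},
]

def fake_assistant(question: str) -> str:
    """Stand-in for the real Policy Assistant endpoint during offline tests."""
    answers = {
        "Who approves purchases over $50k?":
            "SBHE approval is required [Policy 803.1].",
        "What is the staff leave accrual rate?":
            "See the accrual table [Policy 607.2].",
    }
    return answers[question]

def citation_pass_rate(cases) -> float:
    """Fraction of golden cases whose answer contains the required citation."""
    hits = sum(1 for c in cases
               if c["must_cite"] in fake_assistant(c["question"]))
    return hits / len(cases)

rate = citation_pass_rate(GOLDEN_SET)
```

A pass-rate metric like this is also a natural thing to log to MLflow per run, turning an ad hoc smoke test into a tracked quality signal over time.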

Production Deployment and Results

The Policy Assistant was deployed as a low-risk initial use case, representing a sensible LLMOps strategy of starting with high-value but lower-stakes applications before expanding to more critical systems. The development timeline of six months from concept to production is notable, representing a 2x improvement over their previous one-year timeline for new data products.

Quantified Outcomes

The case study reports several metrics, though these should be interpreted with appropriate skepticism as they come from a vendor customer story: the headline claims are 10-20x faster policy searches and a halving of time-to-market from one year to six months.

The productivity gains from eliminating multi-site document searches appear genuine, though the precise quantification may be optimistic. The infrastructure savings claim is reasonable given their existing vendor relationships.

Expansion and Future Plans

Building on the success of Policy Assistant, NDUS is expanding their GenAI capabilities beyond the initial policy search use case.

The organization has also invested in organizational change management, conducting regular educational events to help stakeholders understand how to effectively use AI tools. This human-centered approach to AI adoption is often overlooked in technical implementations but is crucial for realizing value from LLMOps investments.

Critical Assessment

This case study represents a relatively straightforward but practical LLMOps implementation. Several factors lend it credibility: the deliberately modest scope, the reuse of existing Azure and Databricks relationships rather than new procurement, and the honest framing of Policy Assistant as a low-risk first use case.

However, some claims warrant scrutiny. The “10-20x faster” metric is a wide range that may reflect best-case scenarios rather than typical usage. The case study is also light on details about evaluation frameworks, hallucination mitigation, and ongoing monitoring—all critical LLMOps concerns that may simply not have been included in the marketing-focused write-up.

Overall, this represents a solid example of LLMOps in the education/public sector space, demonstrating that organizations with limited resources can successfully deploy production AI systems by leveraging managed platforms and starting with well-scoped use cases.
