Merantix has implemented AI systems built around human-AI collaboration in several domains, notably pharmaceutical research and document processing. Their approach emphasizes progressive automation: the AI system learns from human input and gradually takes over more of the task while accuracy stays high. In pharmaceutical applications, they developed a system for analyzing rodent behavior videos; in document processing, they created solutions for legal and compliance cases where error tolerance is minimal. Together, these systems mark a shift from using AI as a mere tool toward collaborative human-AI workflows that improve efficiency without sacrificing accuracy.
This case study is drawn from a CTO event presentation featuring speakers from Merantix Momentum (the professional services arm of the Merantix AI ecosystem) and Red Tech Cortex. The presentation focuses on AI applications in pharmaceutical and healthcare contexts, with particular emphasis on human-AI collaborative systems that progressively automate complex annotation and review tasks.
Merantix Momentum describes itself as having completed over 150 AI projects spanning strategy consulting, corporate research, and custom solution development. The healthcare and pharmaceutical industry is highlighted as particularly interesting because it allows for the application of a full toolbox of AI methods across the entire value chain, from drug development to precision medicine to diagnostics.
The primary technical example presented is a system built for Boehringer Ingelheim, a major pharmaceutical company. The business problem centers on drug safety assessment, which requires analysts to review large volumes of video footage showing rodents to identify potential toxicity signals that might indicate whether a substance should proceed to human trials.
The challenge is described as a “find Waldo” problem at scale: analysts must detect very rare behaviors across extensive video content. This is traditionally a sequential, labor-intensive task requiring sustained human attention.
The solution implements a human-in-the-loop active learning system with the following workflow:
This represents a shift from tools where “humans use AI” to systems where “humans co-create with AI.” The architecture is designed to learn from the interaction patterns and feedback loops, not just the final labels.
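As a rough illustration of the active learning idea, the sketch below picks the video clips a model is least sure about and sends only those to the analyst. It is a minimal example under stated assumptions, not the system built for Boehringer Ingelheim: the function names, the clip IDs, and the use of "score closest to 0.5" as the uncertainty measure are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List


def select_for_review(scores: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` clip IDs whose scores are closest to 0.5.

    `scores` maps a clip ID to the model's estimated probability that the
    clip shows the rare behavior (hypothetical model output).
    Ties are broken by clip ID so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda clip_id: (abs(scores[clip_id] - 0.5), clip_id))
    return ranked[:budget]


def review_round(
    scores: Dict[str, float],
    labels: Dict[str, bool],
    ask_analyst: Callable[[str], bool],
    budget: int,
) -> Dict[str, bool]:
    """Run one review round: query the analyst on the most uncertain unlabeled clips."""
    unlabeled = {cid: s for cid, s in scores.items() if cid not in labels}
    for clip_id in select_for_review(unlabeled, budget):
        labels[clip_id] = ask_analyst(clip_id)  # the human answer becomes a training label
    return labels


if __name__ == "__main__":
    # Hypothetical scores for four clips; "b" and "d" are the most ambiguous.
    scores = {"a": 0.02, "b": 0.48, "c": 0.91, "d": 0.55}
    labels = review_round(scores, {}, ask_analyst=lambda cid: cid == "b", budget=2)
    print(labels)
```

In a full loop, the model would be retrained on the collected labels after each round and the scores recomputed before the next selection; that step is omitted here.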
The presentation emphasizes that these interactive learning systems are not unique to single use cases. Merantix is exploring how to build foundation models that capture patterns across multiple experiments and datasets, enabling faster fine-tuning for new intents. This is presented across three modalities:
A second major application area discussed is document review in legal and compliance contexts. The key constraint here is that errors are not allowed, making full automation inappropriate. The same progressive automation mechanism used for video data is applied to documents:
This represents a pragmatic approach to LLMOps where the production system is designed from the start to handle uncertainty and maintain human oversight rather than attempting full automation prematurely.
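One way to picture progressive automation under a near-zero error tolerance is confidence-based routing: a prediction is accepted automatically only above a threshold that past human review has shown to be safe, and everything else goes to a reviewer. The sketch below is an assumption-laden illustration rather than Merantix's mechanism; `Prediction`, `safe_threshold`, `route`, and `automation_rate` are hypothetical names, and the threshold rule is a deliberately conservative choice.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Prediction:
    doc_id: str
    label: str
    confidence: float  # model confidence in [0, 1] (assumed to be available)


def safe_threshold(history: List[Tuple[float, bool]]) -> float:
    """Lowest confidence such that every human-verified prediction at or above it was correct.

    `history` holds (confidence, was_correct) pairs from past human review.
    Returns infinity when even the most confident prediction was wrong,
    which means nothing is automated.
    """
    threshold = float("inf")
    # Highest confidence first; on ties, incorrect entries come first so they block the tie.
    for confidence, was_correct in sorted(history, key=lambda h: (-h[0], h[1])):
        if not was_correct:
            break
        threshold = confidence
    return threshold


def route(pred: Prediction, threshold: float) -> str:
    """Send a prediction to automatic acceptance or to human review."""
    return "auto" if pred.confidence >= threshold else "human"


def automation_rate(preds: List[Prediction], threshold: float) -> float:
    """Share of predictions that would be handled without human review."""
    if not preds:
        return 0.0
    return sum(route(p, threshold) == "auto" for p in preds) / len(preds)


if __name__ == "__main__":
    history = [(0.99, True), (0.95, True), (0.90, False), (0.97, True), (0.60, True)]
    threshold = safe_threshold(history)
    batch = [Prediction("d1", "relevant", 0.96), Prediction("d2", "relevant", 0.80)]
    print(threshold, [route(p, threshold) for p in batch], automation_rate(batch, threshold))
```

As more reviewed examples accumulate and the model improves, the safe threshold can drop and the automated share grows, which is the sense in which the automation is progressive.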
The presentation introduces a shift toward “prescriptive AI” in pharmaceutical contexts. An example given is a social media listening tool built for pharmacovigilance, where pharmaceutical companies must monitor social media for side effect reports when drugs are released to market.
However, the more interesting trend described is moving beyond trend detection and forecasting toward scenario-based decision planning. This involves:
The speaker notes this is transitioning from theoretical academic approaches to real-world production applications.
The second speaker presents Red Tech Cortex, described as one of the leading European AI platforms with approximately 2 million users. They were ranked 14th in best term software companies and 10th in best AI software globally in 2025.
The platform enables users to create AI agents for knowledge work. The core use case is connecting to enterprise data sources (Google Drive, OneDrive, Slack, and other databases) and enabling semantic search and question-answering across this content. Key technical features include:
The platform also supports web search capabilities, with use cases in legal research for searching case law across different sources.
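To make the search step concrete, the sketch below ranks documents from several sources against a query by cosine similarity. It is a toy illustration only: the presentation does not say how the platform implements retrieval, the word-count vectors here stand in for the learned embeddings a real semantic search would use, and the document names and the `embed`, `cosine`, and `search` helpers are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, documents: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return the names of up to `top_k` documents with non-zero similarity, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), name) for name, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by name
    return [name for score, name in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Hypothetical documents pulled from different connected sources.
    documents = {
        "drive/policy.txt": "travel expense policy for employees",
        "slack/thread": "lunch plans for friday",
        "onedrive/contract.txt": "supplier contract renewal terms",
    }
    print(search("expense policy", documents, top_k=1))
```

In a question-answering setup, the top-ranked passages would then be passed to a language model as context; that step is outside the scope of this sketch.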
The speaker highlights several production-relevant features:
Notable customers mentioned include Miele (described as one of the biggest producers in Germany) and various consulting companies.
The presentation provides useful examples of human-AI collaborative systems in production, but several caveats should be noted:
That said, the architectural patterns described (progressive automation, human-in-the-loop active learning, semi-automated systems for high-stakes domains) represent pragmatic approaches to deploying AI in production environments where reliability and human oversight remain critical. The emphasis on not aiming for full automation in contexts where errors are not allowed reflects mature thinking about appropriate AI deployment patterns.
The presentation surfaces several themes relevant to production AI systems:
The overall message is that many production AI systems exist beyond the typical “agents and MCP” paradigm that receives most of the attention, and that careful human-AI interaction design is essential for critical applications.