Multimodal Art Collection Search Using Vector Databases and LLMs

Actum Digital

An art institution implemented a multimodal search system for its collection of roughly 40 million art assets using vector databases and LLMs. The system combines text- and image-based search, allowing users to find artworks by attributes such as style, content, and visual similarity. The solution evolved from basic cloud services to a more cost-effective and flexible approach, reducing infrastructure costs to approximately $1,000 per month per region while maintaining high search accuracy.

Industry

Media & Entertainment

Technologies

Overview

This case study comes from a presentation by a team member at Actum Digital on their implementation of a multimodal search system for an art collection. The transcript quality is poor (likely due to automatic transcription), but key technical details can be extracted about how they built a production AI search system for a catalog of approximately 40 million assets.

The system appears to serve art collection managers, auctioneers, and potentially end users who need to discover artworks through various modalities, including text search and image-similarity search. The use case is particularly interesting because art assets have unique characteristics: multilingual metadata (Spanish, French, etc.), generic or repeated titles (many self-portraits, for example, share near-identical names), varied materials and techniques, and visual similarities that traditional keyword search cannot adequately capture.

Problem Statement

The team faced several challenges when building a search system for a massive art collection: multilingual and often generic metadata, the scale of embedding roughly 40 million assets, and the need to combine text and image retrieval in a single, cost-effective system.

Technical Architecture

Infrastructure Components

The solution is built on AWS infrastructure.

Multimodal Search Capabilities

The system supports multiple search modalities, text-based and image-based, that can be used independently or in combination.

The presentation demonstrates searching for paintings with specific visual characteristics like “the blaze in the middle” and finding artworks with similar visual elements from different angles or styles. This multimodal capability appears to be a core differentiator of their approach.
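The transcript does not name the embedding model or vector database, but image-similarity search of this kind typically reduces to nearest-neighbor lookup over precomputed embeddings. A minimal sketch, assuming a CLIP-style multimodal model produced the vectors (the model, dimensions, and data below are illustrative only, not the team's actual stack):

```python
import numpy as np

def cosine_top_k(query_vec, index_vecs, k=3):
    """Return (index, score) pairs for the k most cosine-similar vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = m @ q
    top = np.argsort(-scores)[:k]
    return list(zip(top.tolist(), scores[top].tolist()))

# Toy "index" of 4 artwork embeddings; in practice these would come from
# a multimodal model and live in a vector database serving 40M assets.
rng = np.random.default_rng(0)
index = rng.normal(size=(4, 8))
query = index[2] + 0.01 * rng.normal(size=8)  # near-duplicate of item 2

print(cosine_top_k(query, index, k=2))  # item 2 should rank first
```

The same machinery serves both modalities: a text query is embedded with the text encoder, an image query with the image encoder, and both land in the same vector space.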

Cost Considerations and Optimization

Cost optimization was a major focus of this implementation, which is a critical LLMOps concern for production systems:

Initial Cost Analysis

The presenter breaks the cost structure into two main components: the one-off cost of embedding and indexing the collection, and the ongoing infrastructure cost of serving it.

With 40 million assets requiring embedding, the initial indexing cost was substantial - mentioned as “thousands of dollars” for full reindexing. The ongoing infrastructure cost is approximately $1,000 per month per region.
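The "thousands of dollars" figure for a full reindex is consistent with a simple back-of-envelope calculation. This sketch uses a purely hypothetical per-item embedding price; the actual model and pricing are not given in the source:

```python
# Back-of-envelope for a one-off embedding pass over the full collection.
# PRICE_PER_1K_ITEMS is an assumed figure, not the team's actual rate.
ASSETS = 40_000_000
PRICE_PER_1K_ITEMS = 0.10  # assumed $ per 1,000 embedded items

full_reindex_cost = ASSETS / 1_000 * PRICE_PER_1K_ITEMS
print(f"${full_reindex_cost:,.0f}")  # $4,000 under these assumptions
```

At this scale, even small per-item price differences move the reindexing bill by thousands of dollars, which is why embedding-cost optimization mattered to the team.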

Optimization Strategies

Several optimization strategies were discussed.

Score Normalization Challenges

A technical challenge highlighted in the presentation involves score normalization when combining results from different search modalities. When mixing text-based search scores with image similarity scores, the raw scores may be on different scales.

The presenter mentions normalizing scores to a range of 0 to 1, and references an article explaining the details of how to approach this. This is a common challenge in hybrid search systems where different retrieval methods produce scores that aren’t directly comparable.

Different normalization techniques suit different use cases: the presentation suggests the choice depends on whether the scores feed document ranking and recommendation or, in the presenter's words, "more classical problems like financial" applications.
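The presentation does not spell out the formula, but a common approach matching the described 0-to-1 range is min-max normalization per modality followed by weighted fusion. A sketch (the function names and the fixed-weight scheme are assumptions, not the team's actual code):

```python
def min_max_normalize(scores):
    """Scale raw scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse(text_scores, image_scores, alpha=0.5):
    """Weighted fusion of two score lists over the same document order."""
    t = min_max_normalize(text_scores)
    i = min_max_normalize(image_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(t, i)]

# BM25-style text scores are unbounded; cosine scores live in [-1, 1].
# After normalization both contribute on the same 0-to-1 scale.
print(fuse([12.3, 4.1, 7.8], [0.91, 0.40, 0.65]))
```

Min-max is sensitive to outliers in the candidate set, which is one reason the choice of technique depends on the use case, as the presenter notes.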

Infrastructure Reliability

An interesting operational concern raised in the Q&A portion relates to infrastructure reliability. The presenter mentions that they run infrastructure in multiple regions for redundancy, noting that when their “first infrastructure went down” they had a backup. This highlights the importance of high availability in production AI systems, especially for business-critical search functionality.

The presenter expresses some frustration that even services marketed as “very reliable” can have issues, emphasizing the need for redundancy and not relying on a single infrastructure deployment for critical business operations.
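The source does not describe the failover mechanism, but the multi-region setup implies something like trying a backup region when the primary fails. A toy sketch; the region names, backends, and simulated outage are invented for illustration:

```python
def search_with_failover(query, backends):
    """Try each (name, callable) regional backend in order; return the
    first successful result along with the region that served it."""
    last_err = None
    for name, backend in backends:
        try:
            return name, backend(query)
        except Exception as exc:  # in production, catch specific errors
            last_err = exc
    raise RuntimeError("all regions failed") from last_err

def primary(q):  # simulated outage in the primary region
    raise ConnectionError("primary region down")

def secondary(q):  # healthy backup region
    return [f"result for {q!r}"]

region, hits = search_with_failover(
    "self-portrait", [("eu-west", primary), ("us-east", secondary)]
)
print(region, hits)
```

Running two full regional deployments roughly doubles the quoted $1,000-per-month infrastructure cost, a trade-off the team evidently accepted after their first deployment went down.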

Lessons Learned and Practical Insights

Several practical insights emerge from this case study, particularly around cost optimization, score normalization, and infrastructure redundancy.

Technical Demonstration

The presentation included a live demonstration of the search system.

The demo illustrated searching for artworks with specific visual characteristics like hand positions and finding similar items across the collection with different angles but similar compositional elements.

Caveats and Limitations

It’s important to note some limitations of this case study: the source is a single conference presentation with a poor-quality transcript, so many implementation specifics cannot be independently verified.

Despite these limitations, the case study provides valuable insights into the practical considerations of building production multimodal search systems at scale, particularly around cost optimization and infrastructure reliability.

More Like This

Scaling AI Product Development with Rigorous Evaluation and Observability

Notion 2025

Notion AI, serving over 100 million users with multiple AI features including meeting notes, enterprise search, and deep research tools, demonstrates how rigorous evaluation and observability practices are essential for scaling AI product development. The company uses Braintrust as their evaluation platform to manage the complexity of supporting multilingual workspaces, rapid model switching, and maintaining product polish while building at the speed of AI industry innovation. Their approach emphasizes that 90% of AI development time should be spent on evaluation and observability rather than prompting, with specialized data specialists creating targeted datasets and custom LLM-as-a-judge scoring functions to ensure consistent quality across their diverse AI product suite.


Multi-Agent AI System for Investment Thesis Validation Using Devil's Advocate

Linqalpha 2026

LinqAlpha, a Boston-based AI platform serving over 170 institutional investors, developed Devil's Advocate, an AI agent that systematically pressure-tests investment theses by identifying blind spots and generating evidence-based counterarguments. The system addresses the challenge of confirmation bias in investment research by automating the manual process of challenging investment ideas, which traditionally required time-consuming cross-referencing of expert calls, broker reports, and filings. Using a multi-agent architecture powered by Claude Sonnet 3.7 and 4.0 on Amazon Bedrock, integrated with Amazon Textract, Amazon OpenSearch Service, Amazon RDS, and Amazon S3, the solution decomposes investment theses into assumptions, retrieves counterevidence from uploaded documents, and generates structured, citation-linked rebuttals. The system enables investors to conduct rigorous due diligence at 5-10 times the speed of traditional reviews while maintaining auditability and compliance requirements critical to institutional finance.


Agentic AI Copilot for Insurance Underwriting with Multi-Tool Integration

Snorkel 2025

Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
