ZenML

Read-Only Personal AI for Cognitive Exhaust Analysis

Waypoint AI 2026

Shimon built a personal AI system called Fulan that analyzes digital activity across six read-only data sources to provide insights about productivity patterns, relationship management, and reading habits without taking autonomous actions. The system deliberately maintains read-only access to prevent AI-generated contamination of behavioral data and to eliminate the risk of catastrophic write errors. Using Claude and custom Python scripts, it cross-references data from email, journals, task managers, browsers, CRM systems, and other sources to generate weekly reflections and actionable recommendations. The result demonstrates that observer-type AI systems serve fundamentally different needs than autonomous agents while maintaining better security and analytical integrity.

Industry

Tech

Overview

Shimon presents Fulan, a personal AI system whose architecture departs fundamentally from the dominant agent-based paradigm in personal AI. Rather than building an autonomous system that acts on the user’s behalf, this system functions as a read-only observer that analyzes what Shimon calls “cognitive exhaust fumes”—the digital byproducts of thinking and working across multiple platforms. The core thesis is that read-only access isn’t a limitation to be overcome but rather a deliberate design choice that enables better analysis while eliminating catastrophic risk.

The system ingests data from six sources with strictly read-only permissions, performs cross-source analysis using Claude, and outputs insights to a separate workspace where the user reviews and acts on recommendations. This architecture addresses three primary use cases: identifying intention-action gaps where stated goals diverge from actual behavior, detecting attention drift across projects and priorities, and recognizing relationship decay through communication pattern analysis.

Technical Architecture and Implementation

The system architecture consists of three distinct zones that maintain strict data flow boundaries. The sources zone contains six read-only data feeds that the AI never writes back to. The workspace zone is where Claude performs the analysis and reasoning. The outputs zone is a separate Obsidian vault where synthesized insights are deposited for human review, though Shimon notes this could be any separate system like Notion or plain text files.
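The zone boundaries can be made explicit in code. The sketch below is hypothetical (the article does not show Shimon's implementation, and the paths are invented), but it illustrates how a simple guard can enforce that nothing is ever written back into the sources zone:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Zone:
    name: str
    path: Path
    writable: bool

# Illustrative paths only -- not Shimon's actual layout.
ZONES = [
    Zone("sources", Path("~/fulan/sources").expanduser(), writable=False),
    Zone("workspace", Path("~/fulan/workspace").expanduser(), writable=True),
    Zone("outputs", Path("~/fulan/outputs").expanduser(), writable=True),
]

def assert_no_writes_to_sources(target: Path) -> None:
    """Guard: refuse any write whose target falls inside a read-only zone."""
    resolved = target.expanduser()
    for zone in ZONES:
        if not zone.writable and resolved.is_relative_to(zone.path):
            raise PermissionError(
                f"write blocked: {target} is inside read-only zone '{zone.name}'"
            )
```

Routing every file write through a guard like this makes the read-only constraint architectural rather than a matter of discipline.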

This three-zone architecture is critical to the system’s value proposition. By maintaining read-only access to sources and segregating outputs, the system prevents what Shimon calls “cognitive pollution”—the contamination of behavioral data by AI-generated content. Once the AI starts writing to your data sources, you can no longer cleanly observe your own patterns because the exhaust reflects a human-AI hybrid system rather than pure human cognition.

The implementation runs primarily through Claude, specifically leveraging custom skills and commands. The weekly reflection functionality demonstrates the operational pattern. Shimon has created a custom Claude skill that launches Python scripts to retrieve data from all six read-only sources. These scripts query various databases and APIs, collecting information about emails, calendar events, journal entries, browser activity, task management systems, and CRM data.

The Python scripts make structured API calls to Anthropic’s Claude, passing in the collected data along with carefully crafted prompts that specify the desired output format. The system generates markdown documents containing weekly overviews, identified themes and tensions, commitment tracking, relationship analysis, notable moments, and reflection questions. Shimon emphasizes this isn’t a productivity report but rather a reflection on thinking patterns assembled entirely from exhaust data.
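A minimal version of that pipeline step might look like the following. This is a sketch under assumptions: the prompt text, section list, and model id are illustrative rather than taken from Shimon's actual scripts; only the shape of the Anthropic SDK's `messages.create` call reflects the real API.

```python
import json

REFLECTION_SYSTEM_PROMPT = (
    "You are a reflective analyst. From the exhaust data below, write a markdown "
    "weekly reflection with sections for overview, themes and tensions, commitments, "
    "relationships, notable moments, and reflection questions. This is not a "
    "productivity report."
)

def build_reflection_prompt(sources: dict[str, list[dict]]) -> str:
    """Serialize the collected read-only data into one structured user message."""
    blocks = []
    for name, records in sources.items():
        blocks.append(f"Source: {name}\n{json.dumps(records, indent=2)}")
    return "\n\n".join(blocks)

def run_weekly_reflection(sources: dict[str, list[dict]]) -> str:
    """Call Claude with the assembled exhaust data (requires the anthropic SDK)."""
    import anthropic  # imported lazily so the prompt helpers work offline
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=4096,
        system=REFLECTION_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": build_reflection_prompt(sources)}],
    )
    return response.content[0].text
```

Keeping prompt assembly separate from the API call also makes the deterministic half of the pipeline testable without network access.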

Cross-Source Intelligence and Use Cases

The compelling value proposition emerges from cross-source correlation that no single tool can provide. Email clients don’t know what you journaled, task managers don’t know what you’re browsing, and CRMs don’t know what you’re reading. The cross-source signal is explicitly positioned as the product itself.

A concrete example involves identifying reading discussion partners. Shimon wanted to discuss what he was reading but felt he was over-messaging the same three people. He asked the AI to identify who in his network would be appropriate discussion partners given his recent reading activity. This query combined four data sources: browser history from Vivaldi’s SQLite database showing most-read articles and open tabs, CRM data from Clay identifying people interested in AI, European tech, and education, correlation logic to match reading topics with contact interests, and relationship recency data to avoid over-contacting the same people.
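The recency-and-interest filtering in that query needs no model at all; a hedged sketch (field names like `topics`, `interests`, and `last_contacted` are assumptions, not Clay's or Vivaldi's real schemas) shows the shape of the correlation logic:

```python
from datetime import date

def suggest_discussion_partners(articles, contacts, today, cooldown_days=14):
    """Match article topics to contact interests, skipping recently contacted people."""
    suggestions = []
    for contact in contacts:
        # Relationship-recency filter: avoid over-messaging the same people.
        if (today - contact["last_contacted"]).days < cooldown_days:
            continue
        for article in articles:
            shared = set(article["topics"]) & set(contact["interests"])
            if shared:
                suggestions.append((contact["name"], article["title"], sorted(shared)))
    return suggestions
```

In a system like this, the model's role would be to rank and explain candidate pairings rather than enumerate them.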

The system uses Claude’s Model Context Protocol capabilities to execute cross-source queries. Shimon notes that the Clay MCP integration is particularly slow, searching through his friend relationship system to find people interested in topics matching his reading patterns. The analysis consumes significant context window tokens, which Shimon acknowledges is acceptable given Claude 4.6’s million-token context, though he suggests running such queries in clean sessions to avoid contaminating ongoing conversations.

The output maps specific contacts to specific articles, in one case even identifying the article’s author within Shimon’s own network. This demonstrates intelligence that emerges purely from data correlation across systems that were never designed to communicate with each other, all without sending messages, scheduling meetings, or taking any autonomous actions.

LLMOps Considerations and Production Realities

From an LLMOps perspective, this system reveals several important operational considerations for personal AI deployments. The read-only constraint fundamentally changes the risk profile. Write errors have unbounded downside potential in high-stakes personal domains like relationships, career management, and reputation. A misfired automated email could cause significant damage. Read-only errors have zero downside—you simply ignore incorrect suggestions.

The system relies heavily on structured prompting and output formatting. The Python scripts that feed data to Claude include carefully specified prompts that define the structure and tone of reflections. This represents a classic LLMOps pattern of treating prompts as critical infrastructure that requires versioning and maintenance.
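The article does not show Shimon's prompt-management scheme, but the versioning discipline it points at can be as simple as storing prompts as files and recording a content hash with every run. A hypothetical sketch:

```python
import hashlib
from pathlib import Path

def load_prompt(path: Path) -> tuple[str, str]:
    """Load a prompt file and return (text, content_hash) so each analysis run
    can record exactly which prompt version produced its output."""
    text = path.read_text(encoding="utf-8")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return text, digest
```

Writing the hash into each generated reflection's frontmatter would make regressions traceable when a prompt edit changes output quality.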

Token management emerges as a practical constraint. Cross-source queries that pull in extensive browser history, CRM data, and relationship information consume substantial context windows. Shimon explicitly acknowledges this limitation and recommends running such queries in fresh Claude sessions to preserve context for other interactions. This reflects real-world LLMOps tradeoffs between analytical depth and resource efficiency.

The system architecture demonstrates interesting patterns around data persistence and state management. Rather than maintaining a centralized database, the system queries source databases on-demand, performs ephemeral analysis in Claude, and persists only the synthesized outputs. This stateless approach simplifies the architecture but means each analysis starts fresh, potentially missing longitudinal patterns unless explicitly captured in output documents.

Integration complexity appears significant though not quantified in detail. The system pulls from Vivaldi browser SQLite databases, Clay CRM via MCP, email systems, calendar applications, journaling tools, and task managers. Each integration point represents potential brittleness—schema changes, API updates, or permission changes could break the data pipeline. Shimon doesn’t discuss error handling, monitoring, or integration maintenance, which likely represents ongoing operational overhead.

Security and Privacy Architecture

Shimon dedicates substantial attention to security considerations, demonstrating thoughtful threat modeling that goes beyond typical AI product discussions. He introduces the mosaic effect, where individually innocuous pieces of information combine to create revealing pictures. The same cross-referencing capability that makes the system valuable makes it a devastating target if compromised.

He references Simon Willison’s “lethal trifecta” security model, which combines three risk factors: private data, untrusted content, and external communication. Shimon initially hoped his read-only architecture would break this trifecta but acknowledges it doesn’t fully succeed. While it removes natural exfiltration channels, the system still has shell access and the ability to communicate externally, leaving the third leg of the trifecta intact.

The system sends data to Anthropic’s API over networks that Shimon describes as “mostly open” with more information lying around than strictly required. He explicitly doesn’t claim the system is secure, instead claiming he’s thought about where it isn’t secure and consciously decided which risks to accept. This represents mature security thinking—understanding that perfect security is impossible and instead focusing on informed risk acceptance versus ignorant exposure.

The read-only architecture does provide meaningful security benefits beyond just preventing write errors. It reduces the attack surface by eliminating write permissions that could be exploited. It minimizes exfiltration pathways since the AI never sends emails, makes API calls to external services, or takes actions that could leak data. The separation of outputs into a distinct review workspace creates an air gap where humans inspect suggestions before execution.

However, the system still transmits substantial personal data to Anthropic’s API. All six data sources are sent to a third-party service, representing significant privacy exposure. Shimon doesn’t discuss data retention policies, whether API calls use encryption, or whether Anthropic has committed to not training on this data. For organizations or individuals with strict data governance requirements, this external API dependency would likely be unacceptable.

Philosophical Positioning and Market Analysis

Shimon makes a strong philosophical argument that observer systems and agent systems are fundamentally different product categories, not points on a maturity spectrum. The industry generally frames read-only as a limitation you graduate from on the path to “real” autonomous agents. Shimon argues this framing is fundamentally wrong.

Observer systems produce more value per interaction by revealing patterns invisible to single-source tools and to the user themselves. An agent might save 30 seconds on a weather check, but an observer shows you’ve been avoiding your most important project for two weeks. These are qualitatively different value propositions serving different needs.

The agent versus observer framing raises interesting questions about AI product development and LLMOps priorities. Much of the LLMOps tooling ecosystem focuses on enabling autonomous agents—function calling, tool use, multi-step reasoning, error recovery, and action confirmation patterns. Observer systems require different infrastructure: cross-source data integration, pattern recognition across longitudinal data, explanation generation, and insight summarization.

Shimon’s claim that observers aren’t “broken butlers” but rather distinct products challenges the prevailing agent-centric narrative. From an LLMOps perspective, this suggests organizations should evaluate whether they’re building agents because autonomous action is truly needed or because that’s the dominant paradigm. Read-only observer systems might address many use cases with dramatically lower risk profiles and different operational requirements.

Practical Implementation Details

The system uses Cursor, an AI-powered code editor, as part of the workflow, though the exact role isn’t fully specified. Shimon converts output into preview mode in Cursor to review the generated reflections, suggesting Cursor serves as both development environment and review interface.

The Claude skills functionality represents a key integration point. Shimon has created custom skills that encapsulate the logic for weekly reflections and cross-source queries. These skills combine Python scripting with Claude’s natural language interface, allowing plain language queries that automatically activate appropriate analytical workflows. This pattern demonstrates how custom skills can bridge structured scripting with conversational AI interfaces.

The system uses bash scripting for certain operations, which Shimon refers to as “bash sorcery.” He mentions that running Claude with auto mode or dangerous disk permissions can simplify some of this scripting, though this raises security questions. Granting broad disk permissions to an AI system accessing sensitive personal data represents exactly the kind of risk-convenience tradeoff that characterizes LLMOps decision-making.

Database access appears to happen through direct SQLite queries for browser data, MCP protocols for CRM access, and presumably API or file system access for other sources. This heterogeneous integration approach is pragmatic but creates maintenance complexity. Each data source likely requires custom extraction logic, error handling for schema changes, and authentication management.
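For the browser source specifically, Chromium-family browsers such as Vivaldi keep history in a SQLite file whose `urls` table carries `url`, `title`, and `visit_count` columns (verify the schema against your own profile before relying on it). The sketch below also uses SQLite's URI mode, which enforces read-only access at the connection level:

```python
import sqlite3

def most_read_urls(history_path: str, limit: int = 10) -> list[tuple[str, str, int]]:
    """Query browser history strictly read-only via SQLite's URI mode."""
    conn = sqlite3.connect(f"file:{history_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT url, title, visit_count FROM urls "
            "ORDER BY visit_count DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Opening the database with `mode=ro` means even a buggy query cannot mutate the source, matching the system's read-only guarantee at the lowest layer.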

Limitations and Critical Assessment

While Shimon presents a compelling vision, several limitations and questions emerge from a critical LLMOps perspective. The system appears to be a personal prototype built for individual use rather than a production-grade system designed for reliability, scalability, or multi-user deployment. Error handling, monitoring, logging, and recovery mechanisms aren’t discussed. There’s no mention of testing strategies, validation of insights, or mechanisms to detect when the AI produces incorrect or misleading analysis.

The Clay MCP integration is explicitly described as slow, taking “forever to run.” This suggests performance issues that would be problematic for production systems expecting responsive interactions. Token consumption for cross-source queries is high enough that Shimon recommends running them in clean sessions, indicating resource constraints that might limit analytical depth or query frequency.

The value proposition relies heavily on subjective claims about insight quality. Shimon describes weekly reflections as “occasionally brutal” and shows examples that appear meaningful, but there’s no systematic evaluation of whether the insights are accurate, actionable, or superior to what a human could derive from the same data. The examples shown may represent cherry-picked successes rather than typical output quality.

The security analysis, while thoughtful, acknowledges significant gaps. Data goes to Anthropic’s API, shell access provides exfiltration pathways, and the system isn’t claimed to be secure. For enterprise deployment, these security characteristics would likely be disqualifying. Even for personal use, the privacy implications of sending six comprehensive data sources to a third-party API are substantial.

The read-only constraint, while philosophically interesting, does limit functionality. Users must manually act on all suggestions, which reduces efficiency gains compared to autonomous systems. For routine, low-stakes tasks, the overhead of human review might outweigh the risk reduction benefits. The optimal boundary between read-only observation and autonomous action likely varies by use case, user preference, and risk tolerance.

Broader LLMOps Implications

This case study illuminates several themes relevant to production LLM deployments beyond personal AI. The tension between capability and safety appears throughout. Read-only access dramatically constrains what the system can do but equally dramatically reduces catastrophic failure modes. Organizations deploying LLMs in high-stakes domains face similar tradeoffs between automation benefits and error risks.

The concept of cognitive pollution—where AI-generated content contaminates the data being analyzed—applies broadly. Training data contamination, feedback loops where model outputs become training inputs, and difficulty distinguishing human from AI contributions in collaborative systems all reflect this fundamental challenge. Maintaining clean analytical baselines requires architectural thinking about data flow and contamination vectors.

Cross-source intelligence represents both opportunity and complexity. The most valuable insights often emerge from correlating data across systems never designed to interoperate. But each integration point adds brittleness, each data schema adds transformation logic, and each API adds latency and failure modes. The LLMOps infrastructure required to reliably correlate heterogeneous data sources at scale is substantial.

The observer versus agent framing challenges assumptions about LLM value creation. Much LLMOps tooling assumes autonomous action is the goal. But observation, analysis, and insight generation without autonomous execution might address many use cases with simpler infrastructure, clearer accountability, and lower risk. Organizations should consider whether their LLM deployments truly require autonomy or whether augmented human decision-making suffices.

Prompt engineering emerges as critical infrastructure. The weekly reflection prompts, cross-source query logic, and output formatting specifications represent code that must be developed, tested, versioned, and maintained. Treating prompts as software artifacts requiring engineering discipline is fundamental to reliable LLM operations, yet many organizations still handle prompts as ad-hoc text strings.

Technical Maturity and Production Readiness

Assessing this system’s technical maturity requires distinguishing between the conceptual architecture and the implementation. The three-zone read-only architecture is intellectually coherent and could scale to production systems with appropriate engineering. The actual implementation appears to be a personal prototype with limited reliability engineering, unclear error handling, acknowledged performance issues, and security characteristics unsuitable for sensitive enterprise deployments.

For organizations considering similar approaches, the architectural patterns are more valuable than the specific implementation. Separating read-only data ingestion from analysis workspaces from output repositories creates clear data flow boundaries that support both security and analytical integrity. Treating observer systems as distinct from agent systems rather than intermediate stages enables different optimization criteria and risk profiles.

The reliance on Claude’s API represents both strength and limitation. It enables rapid prototyping without managing LLM infrastructure, but creates vendor dependency, ongoing API costs, and data privacy exposure. Enterprise deployments might prefer self-hosted models despite higher infrastructure costs to maintain data sovereignty.

The system demonstrates that valuable personal AI applications can be built with relatively straightforward integrations—Python scripts, API calls, database queries, and markdown outputs. The complexity lies in crafting effective prompts, designing useful analytical frameworks, and building integrations to diverse data sources. These challenges are primarily about product design and data engineering rather than novel AI capabilities.

Overall, this case study presents an intellectually rigorous alternative to agent-centric personal AI, demonstrating that read-only observer systems can deliver distinct value with fundamentally different risk profiles. While the implementation is clearly a personal prototype rather than production-grade software, the architectural principles and philosophical positioning offer valuable perspectives for organizations deploying LLMs in contexts where observation and analysis are more appropriate than autonomous action.
