Parcha is developing AI agents to automate operations and compliance workflows in enterprises, particularly focusing on fintech operations. They tackled the challenge of moving from simple demos to production-grade systems by breaking down complex workflows into smaller, manageable agent components supervised by a master agent. Their approach combines existing company procedures with LLM capabilities, achieving 90% accuracy in testing before deployment while maintaining strict compliance requirements.
Parcha is an early-stage startup building AI agents that autonomously complete operations and compliance tasks for fintech companies. Founded by AJ Asver, a repeat founder who previously sold Scooper (a real-time search engine) to Google, and his co-founder Miguel (formerly of Twitter and Brex data science teams), the company emerged from their experience at Brex where they led the platform team responsible for automation. They identified the “last mile of automation”—tasks requiring human expertise and judgment—as an opportunity now solvable with modern LLM capabilities.
The company started in March 2023 and raised pre-seed and seed funding, with Initialized Capital as one of their seed investors. At the time of this interview, they were very early stage with a small team including a design lead who joined from another YC company.
Behind every fintech application lies an army of operations personnel handling manual workflows: customer onboarding, KYC (Know Your Customer) verification, risk checks on transactions, credit card underwriting, and compliance reviews. These tasks require following detailed Standard Operating Procedures (SOPs) that exist in Google Docs, Wikis, or Confluence pages—sometimes involving 10-30 procedural steps per workflow. Each step requires human judgment to interpret edge cases, verify documents, and make compliance decisions.
The scale challenge is significant: these manual reviews take hours per case, yet the volume fluctuates dramatically. Traditional automation couldn’t handle the judgment-intensive aspects, creating a bottleneck that limited growth and increased costs for fintech companies.
Parcha’s approach differs fundamentally from general-purpose AI assistants. Rather than building agents that generate their own plans (like a personal assistant that might plan a party), they leverage existing human-written SOPs as the plan source. This makes agents “much more controllable and steerable” because they execute on pre-defined procedures rather than inventing approaches.
The key insight is that companies already have documented procedures for training human operators. These same documents—describing step-by-step processes for onboarding customers, verifying identities, checking state registrations—become the foundation for agent behavior. By skipping the plan-generation step, they reduce the failure modes associated with open-ended agent planning.
The team discovered that single-agent architectures struggled with complex SOPs. When a single prompt grew to 2,000-3,000 tokens describing every possible scenario, agents became confused and prone to hallucination—described as “failing open for a high-risk compliance scenario,” which is unacceptable.
Their solution was a hierarchical multi-agent system: a supervisor agent holds the overall plan and delegates individual checks to specialized worker agents, each operating with a small context window and a narrow set of tools.
The supervisor doesn’t need to know it’s managing agents—from its perspective, it simply has access to tools that happen to be complex agents with their own tools. As AJ describes it: “before you know it it’s agents all the way down.”
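The “agents as tools” idea can be sketched in a few lines of plain Python. Everything here is illustrative (the class, agent names, and keyword-dispatch stand in for real LLM-driven tool selection), not Parcha’s actual implementation; the point is only that a worker agent exposed behind an ordinary tool interface composes with the supervisor the same way any other tool does.

```python
# Hypothetical sketch of the "agents as tools" pattern: the supervisor
# sees each worker agent as just another callable, so nesting can go
# arbitrarily deep ("agents all the way down").
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    name: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def as_tool(self) -> Callable[[str], str]:
        # Expose the whole agent behind a plain tool interface.
        return self.run

    def run(self, task: str) -> str:
        # A real agent would ask an LLM which tool to call; here we
        # simply dispatch to the first tool whose name appears in the task.
        for name, tool in self.tools.items():
            if name in task:
                return tool(task)
        return f"{self.name}: no tool matched '{task}'"

# Worker agents own narrow, well-scoped tools.
kyc_worker = Agent("kyc_worker", {"verify_id": lambda t: "id verified"})
sanctions_worker = Agent("sanctions_worker", {"screen": lambda t: "no hits"})

# The supervisor doesn't know (or care) that its "tools" are agents.
supervisor = Agent("supervisor", {
    "kyc": kyc_worker.as_tool(),
    "sanctions": sanctions_worker.as_tool(),
})

print(supervisor.run("kyc step: verify_id for ACME Corp"))    # id verified
print(supervisor.run("sanctions step: screen ACME Corp"))     # no hits
```

Because the supervisor’s interface never changes, adding a new compliance check means registering one more worker, not restructuring the hierarchy.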
This architecture enables scaling to arbitrary complexity and was described as something “at the bleeding edge”—when discussing with framework developers at LangChain and LlamaIndex, they hadn’t encountered others doing similar implementations.
The team chose Claude (Anthropic) over OpenAI’s models, primarily for steerability: they found GPT-3.5 fast but not steerable, and GPT-4 more steerable but with other tradeoffs. Claude provided the best balance for their agent workloads.
Like many AI startups, Parcha initially built on LangChain for rapid prototyping. The framework enabled quick demo development but introduced significant complexity once the system moved toward production.
The solution was aggressive simplification. Co-founder Miguel rewrote the entire agent codebase in a weekend, reducing it to approximately 500 lines of code. The smaller codebase made the agent loop far easier to understand, debug, and iterate on.
They retained LangChain for LLM interface interoperability (easily swapping GPT-4 for Claude) and tooling infrastructure, but the core agent logic became custom code.
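A minimal sketch of why that interface boundary matters. The classes below are illustrative stand-ins, not LangChain’s real API: the agent logic depends only on a shared protocol, so swapping one model vendor for another is a one-line change at the call site.

```python
# Hypothetical sketch of vendor-agnostic agent code: both "models"
# satisfy one Protocol, so the agent never imports a vendor SDK directly.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class FakeGPT4:
    def complete(self, prompt: str) -> str:
        return f"gpt-4: {prompt}"

class FakeClaude:
    def complete(self, prompt: str) -> str:
        return f"claude: {prompt}"

def run_agent_step(model: ChatModel, step: str) -> str:
    # Agent logic depends only on the interface, never the vendor.
    return model.complete(f"Execute step: {step}")

print(run_agent_step(FakeGPT4(), "verify state registration"))
print(run_agent_step(FakeClaude(), "verify state registration"))
```

This is the property Parcha kept LangChain around for: the custom agent loop talks to a uniform chat-model interface, and the provider behind it can change without touching agent code.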
Several prompt engineering insights emerged from their production experience:
Following techniques discussed by Andrej Karpathy, they set context at the beginning of prompts: “You are an expert at onboarding B2B customers on a fintech platform. You are really good at making compliance related decisions…” This statistically increases the likelihood of quality outputs—a counterintuitive finding given that we don’t preface interactions with smart humans by telling them how smart they are.
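The technique amounts to prepending an expert persona before the task. A sketch (the helper name is illustrative; the persona wording is adapted from the quote above):

```python
# Role-setting sketch: fix the expert persona at the start of every
# prompt, then append the concrete task.
def build_prompt(task: str) -> str:
    persona = (
        "You are an expert at onboarding B2B customers on a fintech "
        "platform. You are really good at making compliance-related "
        "decisions."
    )
    return f"{persona}\n\nTask: {task}"

prompt = build_prompt("Check the applicant's state registration.")
print(prompt)
```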
Early implementations used comma-separated lists for agent plans. They discovered JSON formatting with explicit step tracking performed significantly better, giving agents more context about their current position in the execution flow.
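A plausible version of such a plan (the exact schema is an assumption, not from the source): explicit step ids and a status field tell the agent exactly where it sits in the execution flow, which a flat comma-separated list cannot.

```python
# Illustrative JSON plan with explicit step tracking, serialized into
# the prompt in place of the earlier comma-separated list.
import json

plan = {
    "goal": "Onboard B2B customer",
    "current_step": 2,
    "steps": [
        {"id": 1, "action": "collect_business_documents", "status": "done"},
        {"id": 2, "action": "verify_state_registration", "status": "in_progress"},
        {"id": 3, "action": "run_sanctions_screening", "status": "pending"},
    ],
}

plan_for_prompt = json.dumps(plan, indent=2)
current = next(s for s in plan["steps"] if s["id"] == plan["current_step"])
print(current["action"])  # verify_state_registration
```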
A strategic product decision: rather than Parcha achieving 99% accuracy internally, they aim for 90% accuracy then put editing tools in customers’ hands. Operations staff who previously trained human teams on procedures become prompt engineers for agents. This mirrors their existing workflow—creating documents, training humans, iterating on procedures—but with faster feedback loops.
The initial product is a Chrome extension showing every step the agent takes with full reasoning. This transparency builds customer trust before autonomous operation. Interestingly, customers immediately ask for API endpoints for autonomous batch processing—surprising the team with their willingness to embrace the technology.
Before production deployment, agents are tested against historical data. The target is 90% accuracy on back-testing before customer exposure. After that milestone, customers test in sandbox environments, find additional edge cases, and the team iterates.
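The gate can be sketched as a simple replay harness. The data, decisions, and the deliberately imperfect stand-in agent below are all illustrative; the shape of the check (agreement with historical human decisions against a 90% threshold) is what the source describes.

```python
# Back-testing sketch: replay the agent over historical cases and gate
# customer exposure on >= 90% agreement with past human decisions.
historical_cases = [
    {"case": "acme", "human_decision": "approve"},
    {"case": "globex", "human_decision": "reject"},
    {"case": "initech", "human_decision": "approve"},
    {"case": "umbrella", "human_decision": "approve"},
    {"case": "hooli", "human_decision": "reject"},
]

def agent_decide(case: dict) -> str:
    # Placeholder for a real agent run; mirrors the historical record
    # except for one deliberate miss.
    return "reject" if case["case"] == "umbrella" else case["human_decision"]

matches = sum(agent_decide(c) == c["human_decision"] for c in historical_cases)
accuracy = matches / len(historical_cases)
print(f"back-test accuracy: {accuracy:.0%}")   # 80%
ready_for_sandbox = accuracy >= 0.90           # gate before customer exposure
```

Only once this gate passes do customers move to sandbox testing, where newly surfaced edge cases feed the next iteration.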
Early customers receive intensive support—what AJ jokingly calls “agents handbuilt in California.” This hands-on approach is intentional for bleeding-edge technology: deep customer understanding, shadowing operators doing manual reviews, iterating rapidly on discovered issues.
The first demo showed an 8-step onboarding process with one or two integrations. Real customer SOPs proved far more complex with numerous edge cases and nuances. Bridging this gap required fundamental architectural changes (multi-agent systems) rather than incremental improvements.
Debugging agent behavior resembles whack-a-mole: solving one issue may surface another. The team frames this as learning—each fix adds to collective understanding of LLM manipulation. High iteration velocity matters more than deep academic background; the co-founders learned applied LLM engineering through experimentation rather than prior AI research experience.
In compliance contexts, hallucination equals “failing open”—the worst possible outcome. This drove architectural decisions (simplified agents, smaller context windows, specialized worker agents) and deployment practices (extensive testing, sandbox environments, human verification).
Beyond the implementation details, the transcript reveals broader perspectives on building with LLMs and where the market is heading.
Parcha envisions a progression from technical implementations by engineers to sales-led prompt engineering deployments, eventually reaching self-serve agent creation. The broader thesis is “hybrid workforces” where humans collaborate with “digital employees” handling operational drudgery, enabling thousand-fold more companies to exist with smaller teams serving larger customer bases.
The team acknowledges that general-purpose agents remain far harder than verticalized agents due to well-constrained problem spaces. Parcha’s focus on fintech compliance represents a strategic choice of tractable complexity where current LLM capabilities can deliver production value.
Lendi, an Australian FinTech company, developed Guardian, an agentic AI application to transform the home loan refinancing experience. The company identified that homeowners lacked visibility into their mortgage positions and faced cumbersome refinancing processes, while brokers spent excessive time on administrative tasks. Using Amazon Bedrock's foundation models, Lendi built a multi-agent system deployed on Amazon EKS that monitors loan competitiveness, tracks equity positions in real-time, and streamlines refinancing through conversational AI. The solution was developed in 16 weeks and has already settled millions in home loans with significantly reduced refinance cycle times, enabling customers to complete refinancing in as little as 10 minutes through the Rate Radar feature.
Digits, a company providing automated accounting services for startups and small businesses, implemented production-scale LLM agents to handle complex workflows including vendor hydration, client onboarding, and natural language queries about financial books. The company evolved from a simple 200-line agent implementation to a sophisticated production system incorporating LLM proxies, memory services, guardrails, observability tooling (Phoenix from Arize), and API-based tool integration using Kotlin and Golang backends. Their agents achieve a 96% acceptance rate on classification tasks with only 3% requiring human review, handling approximately 90% of requests asynchronously and 10% synchronously through a chat interface.
Iberdrola, a global utility company, implemented AI agents using Amazon Bedrock AgentCore to transform IT operations in ServiceNow by addressing bottlenecks in change request validation and incident management. The solution deployed three agentic architectures: a deterministic workflow for validating change requests in the draft phase, a multi-agent orchestration system for enriching incident tickets with contextual intelligence, and a conversational AI assistant for simplifying change model selection. The implementation leveraged LangGraph agents containerized and deployed through AgentCore Runtime, with specialized agents working in sequence or adaptively based on incident complexity, resulting in reduced processing times, accelerated ticket resolution, and improved data quality across departments.