Kitaru vs Hatchet: Python agent runtime vs orchestration platform

Hatchet is a developer platform and orchestration engine for AI agents, durable workflows, background tasks, and parallel workloads, with SDKs for Python, TypeScript, Go, and Ruby. It runs as Hatchet Cloud or self-hosted, with the full operational surface of a real orchestration product: priority queues, concurrency strategies, rate limits, alerts, and OTEL export. If your durability problem looks like a polyglot task queue that also handles agents, Hatchet is a credible answer.

Kitaru is narrower by design. We built it for Python agents because that’s the workload most teams we talk to are trying to ship right now. It’s increasingly clear that every team running agents in production ends up writing the same glue layer on top of a general-purpose orchestrator: a durable llm() call with token lineage, artifact graphs linked to executions, replay with input overrides, and tag-routed deployment snapshots. Kitaru ships those in the box, underneath whatever harness you already picked. Auth, observability, and governance stay where they are.

Kitaru

Use Kitaru if you are

Running Python agents and want `kitaru.llm()`, `kitaru.wait()`, and artifact lineage as primitives, not glue code your platform team maintains forever
Keeping the agent harness, auth, entitlements, and observability stack you already picked, and adding only the runtime layer underneath them
Deploying across Kubernetes, AWS, GCP, or Azure (Vertex AI, SageMaker, AzureML) and want an opinionated stack abstraction where one config switches every flow's backend
Replaying a failed run from a specific checkpoint with input overrides, without paying for the LLM calls above it again

Alternative

Use Hatchet if you are

Running a polyglot estate (Python, TypeScript, Go, Ruby) that needs one orchestration platform across all of it
Building background tasks, durable workflows, parallel workloads, or queue replacement, not specifically Python agents
Leaning on the flow-control surface (priority queues, concurrency strategies, static and dynamic rate limits) as a first-class feature
Comfortable adopting Hatchet Cloud's hosted control plane with published Developer/Team/Scale/Enterprise tiers, or self-hosting the engine plus Postgres plus dashboard

Hatchet is the orchestration platform you adopt. Kitaru is the runtime layer you embed underneath the harness you already picked.

Runtime primitive vs orchestration platform

Defaults tell you what a tool is actually for. Hatchet’s defaults are tasks, workers, queues, durable workflows, durable tasks, events, and schedules. That’s a packaged orchestration platform across your stack. Kitaru’s defaults are @flow and @checkpoint on ordinary Python, with kitaru.llm(), kitaru.wait(), and kitaru.save() / kitaru.load() shipped underneath whatever harness your team already picked.

Hatchet · platform across the stack. Engine, workers, queues, and dashboard packaged together: events, cron, and webhooks; durable engine, workers, and queues; concurrency, rate limits, and priority; dashboard, OTEL, and alerts. Your app re-shapes around the engine. Adopt the platform, get the runtime as one part of it.

Kitaru · runtime, one layer of four. Owns the runtime and leaves harness and platform untouched.

Platform: your auth, entitlements, observability
Harness: Pydantic AI, OpenAI, Claude, LangGraph, raw Python
Runtime: Kitaru (@flow, @checkpoint, llm(), wait(), save/load)
Model: OpenAI, Anthropic, Google, open-weights

Kitaru owns one layer. The rest of your stack stays as you picked it.

Layered model: Kitaru owns the runtime layer and stays out of harness, auth, and platform. Hatchet packages an orchestration platform that includes the runtime, plus queues, workers, schedules, alerts, and dashboard. If you’ve already picked your harness, auth, and observability stack, only Kitaru fits underneath them without overlap.
Adoption shape: pip install kitaru, add two decorators to ordinary Python, the flow runs in your process. Hatchet workers register with the engine, get triggered by tasks or events, and operate alongside an API server, Postgres, optional RabbitMQ, and a dashboard. Both work; the question is whether the runtime sits inside your app or whether your app gets re-shaped around the runtime.
Operational footprint: A Kitaru server is one process plus your S3, GCS, or Azure Blob bucket and a metadata DB. A Hatchet self-host is an API server, engine, Postgres, optional RabbitMQ, dashboard, and workers; the cloud control plane is hosted in US AWS. Both are reasonable. Kitaru is closer to a library; Hatchet is closer to a platform you operate.
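To make the adoption-shape difference concrete, here is a toy sketch of a runtime that lives inside your process: a flow decorator and a checkpoint decorator wrap ordinary Python, and each checkpoint's return value is written to a local directory standing in for an S3, GCS, or Azure Blob bucket. The names `toy_flow`, `toy_checkpoint`, and `ARTIFACT_ROOT` are hypothetical, and JSON-on-disk storage is an assumption for illustration; this is not Kitaru's API or implementation.

```python
"""Toy in-process runtime (hypothetical names; not Kitaru's implementation)."""
import functools
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical stand-in for an object-store bucket.
ARTIFACT_ROOT = Path(tempfile.mkdtemp(prefix="artifacts-"))
_current_run: dict = {}


def toy_flow(fn):
    @functools.wraps(fn)
    def run(*args, **kwargs):
        # Each invocation gets its own run id; artifacts are grouped under it.
        _current_run["id"] = uuid.uuid4().hex
        try:
            return fn(*args, **kwargs)
        finally:
            _current_run.pop("id", None)
    return run


def toy_checkpoint(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        # Persist the return value as JSON under <root>/<run_id>/<name>.json
        run_dir = ARTIFACT_ROOT / _current_run["id"]
        run_dir.mkdir(parents=True, exist_ok=True)
        (run_dir / f"{fn.__name__}.json").write_text(json.dumps(result))
        return result
    return wrapper


@toy_checkpoint
def research(topic: str) -> str:
    return f"brief about {topic}"


@toy_flow
def review_flow(topic: str) -> str:
    return research(topic).upper()


print(review_flow("durable agents"))  # BRIEF ABOUT DURABLE AGENTS
```

Nothing registers with an external engine here; calling the decorated function is the whole deployment story, which is the "closer to a library" end of the spectrum.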

LLM calls as first-class lineage

The LLM call is the unit of cost, latency, and failure in an agent. Hatchet has strong operational observability for tasks and workflows. There’s a dashboard, alerts, metrics, and OpenTelemetry export to Datadog or Grafana. What it doesn’t ship is an llm() primitive that resolves a model alias, injects the provider key from your configured secret backend, and logs prompt, response, latency, tokens, and resolved model against the enclosing checkpoint by default. That part is glue you write on top. Or simply use Kitaru for it.
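The responsibilities listed above for an llm() primitive can be sketched in a few lines. This toy version assumes a hypothetical stack config mapping the alias "fast" to a provider and model, a dict standing in for a secret backend, and a fake provider that counts whitespace-separated words as tokens. The enclosing checkpoint is passed explicitly to keep the sketch short; a real runtime would track it implicitly. None of these names (`STACK_CONFIG`, `SECRETS`, `fake_provider`, `toy_llm`) are Kitaru's API.

```python
"""Toy llm() primitive (hypothetical names; not Kitaru's implementation)."""
import time
from dataclasses import dataclass

# Hypothetical stack config: alias -> (provider, concrete model).
STACK_CONFIG = {"fast": ("anthropic", "claude-sonnet-4-6")}
# Hypothetical secret backend: provider -> API key.
SECRETS = {"anthropic": "sk-test-123"}


@dataclass
class LLMCallRecord:
    checkpoint: str
    prompt: str
    response: str
    model: str
    tokens_in: int
    tokens_out: int
    latency_s: float


CALL_LOG: list[LLMCallRecord] = []


def fake_provider(model: str, api_key: str, prompt: str) -> tuple[str, int, int]:
    # Stand-in for a real SDK call; returns (text, input tokens, output tokens).
    assert api_key, "provider key must be injected"
    text = f"[{model}] answer to: {prompt}"
    return text, len(prompt.split()), len(text.split())


def toy_llm(prompt: str, model: str, checkpoint: str) -> str:
    provider, resolved = STACK_CONFIG[model]  # alias resolution
    api_key = SECRETS[provider]               # key injection
    start = time.perf_counter()
    text, t_in, t_out = fake_provider(resolved, api_key, prompt)
    # Per-call record linked to the enclosing checkpoint.
    CALL_LOG.append(LLMCallRecord(
        checkpoint=checkpoint, prompt=prompt, response=text, model=resolved,
        tokens_in=t_in, tokens_out=t_out,
        latency_s=time.perf_counter() - start,
    ))
    return text


answer = toy_llm("Research: durable agents", model="fast", checkpoint="research")
print(CALL_LOG[0].model, CALL_LOG[0].tokens_in)  # claude-sonnet-4-6 3
```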

Hatchet · OTEL spans across tasks. LLM logging is glue inside your task body.

@review_flow.task()
async def research(input, ctx):
  # wrap OpenAI / Anthropic yourself
  return await call_llm(...)

OTEL trace:
task · research  4.2s
└─ http.post openai.com  4.1s

Generic spans. No prompt, response, tokens, or cost in the box.

Kitaru · LLM as a captured checkpoint event. Prompt, response, tokens, latency, and cost on captured LLM calls, when reported.

@checkpoint
def research(topic):
  return kitaru.llm(prompt=..., model="fast")

checkpoint.research · captured
prompt: "Research: durable agents..."
response: "A durable agent runtime..."
model: claude-sonnet-4-6
tokens: in 312 · out 488
latency: 4.1s
cost: $0.012

Captured by default. Linked to the checkpoint, queryable per run.

API surface: kitaru.llm(prompt, model="fast") is the primitive. It resolves model aliases (so the same code maps to whichever provider is configured in the stack), injects the provider key automatically, and captures the response. In Hatchet you write your own call_llm() inside a task and decide what to log.
Per-call lineage: Prompt, response, token counts, latency, and resolved model land on the run record automatically and link to the enclosing checkpoint. Hatchet has OTEL spans across tasks; it doesn’t position a per-call LLM record linked to a checkpoint as a first-class concept in the docs we reviewed.
Replay reads, not re-bills: On replay, kitaru.llm() reads the captured response from the checkpoint instead of hitting the provider again, unless the input changed. Hatchet’s durable replay applies to step return values broadly; whether the LLM call re-hits the provider depends on how you shaped the step body.
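A minimal sketch of the "reads, not re-bills" idea, assuming responses are keyed by the checkpoint name plus a hash of the prompt and model. `ReplayCache` and `call_provider` are hypothetical names, and the provider is a stand-in that just reverses the prompt; the point is the counter, which only increments when the input actually changes.

```python
"""Toy replay cache (hypothetical names; not Kitaru's implementation)."""
import hashlib
import json


class ReplayCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.provider_calls = 0  # counts billed calls

    @staticmethod
    def _key(checkpoint: str, prompt: str, model: str) -> str:
        # Key on the checkpoint name plus a hash of its inputs.
        payload = json.dumps([checkpoint, prompt, model])
        return hashlib.sha256(payload.encode()).hexdigest()

    def call_provider(self, prompt: str, model: str) -> str:
        self.provider_calls += 1
        return f"{model}: {prompt[::-1]}"  # stand-in for a real API call

    def llm(self, checkpoint: str, prompt: str, model: str) -> str:
        key = self._key(checkpoint, prompt, model)
        if key not in self._store:  # first run, or the input changed
            self._store[key] = self.call_provider(prompt, model)
        return self._store[key]     # replay: read the captured response


cache = ReplayCache()
cache.llm("research", "Research: durable agents", "fast")     # billed
cache.llm("research", "Research: durable agents", "fast")     # read from cache
cache.llm("research", "Research: durable workflows", "fast")  # input changed: billed
print(cache.provider_calls)  # 2
```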

Replay with checkpoint-output overrides

Both products replay from durable state. The harder question is whether you can change a checkpoint’s output and have downstream consumers re-execute against the new value. Kitaru exposes this as a first-class primitive: pin an override against checkpoint.research and its dependents replay against the override while everything else stays cached. Hatchet’s docs describe replay from the event log, plus retry, cancel, and replay actions in the dashboard, but not an equivalent override model where you swap a single step’s output and re-execute downstream against the change.

Hatchet · replay from the event log. Restart the run; no input-override primitive in the docs.

research: re-run · draft: re-run · wait: re-run

Restart from the top of the run. Earlier steps execute again unless you've hand-rolled an override layer inside step bodies. Operational replay, not parameterized over a specific step's output.

Kitaru · checkpoint-output override. Pin a value at any checkpoint and replay surgically.

research: cached · draft: override · wait: re-execute

kitaru replay --override checkpoint.draft="..."

Swap one checkpoint's output. Downstream re-executes against the new value; upstream stays cached. Built in: no event-log surgery, no per-step override scaffolding.

Mechanism: Kitaru exposes checkpoint selectors and override keys (checkpoint.research, checkpoint.draft, …) so you can replay a flow with a swapped value at any checkpoint and re-execute downstream consumers. Hatchet’s replay is operationally scoped (replay this run, retry this step) rather than parameterized over a specific checkpoint’s output.
Use case: The agent wrote a bad brief at step 1, but step 7 took 4 minutes and 10k tokens to get there. With Kitaru you swap the brief and the flow re-executes against the new value while upstream artifacts stay cached where the inputs match. With Hatchet you’d reset to the start of the run or roll your own override layer inside your step bodies.
What gets cached: Kitaru caches typed artifacts per @checkpoint, indexed by the checkpoint’s name. Hatchet caches step return values in the durable event log, keyed by the step’s invocation in the run. The difference shows up most when you want to inject a hypothetical at a specific step and rerun downstream.
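The override mechanism can be sketched as a linear pipeline with a result cache. This toy assumes each step's output is keyed by the step name plus its input (so an upstream step is reused only when its input matches), and that an override pins a step's output without running its body. `replay`, the step names, and the lambdas are hypothetical; this illustrates the idea, not Kitaru's replay engine.

```python
"""Toy replay with checkpoint-output overrides (hypothetical names)."""
from typing import Callable, Optional

Step = tuple[str, Callable[[str], str]]


def replay(steps: list[Step], initial: str, cache: dict,
           overrides: Optional[dict[str, str]] = None) -> tuple[str, list[str]]:
    """Run steps in order; return (final output, per-step trace)."""
    overrides = overrides or {}
    trace: list[str] = []
    value = initial
    for name, fn in steps:
        if name in overrides:
            value = overrides[name]       # pinned output: step body skipped
            trace.append(f"{name}:override")
        elif (name, value) in cache:
            value = cache[(name, value)]  # same name and input: reuse artifact
            trace.append(f"{name}:cached")
        else:
            out = fn(value)               # new or changed input: execute
            cache[(name, value)] = out
            value = out
            trace.append(f"{name}:ran")
    return value, trace


steps: list[Step] = [
    ("research", lambda topic: f"brief on {topic}"),
    ("draft", lambda brief: f"draft from {brief}"),
    ("finalize", lambda text: text.upper()),
]
cache: dict = {}
out1, trace1 = replay(steps, "durable agents", cache)
out2, trace2 = replay(steps, "durable agents", cache,
                      overrides={"draft": "hand-edited draft"})
print(trace2)  # ['research:cached', 'draft:override', 'finalize:ran']
```

On the second run `research` is served from the cache, `draft` is pinned to the override, and `finalize` re-executes because its input changed.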

What makes Kitaru unique

Feature	Kitaru	Hatchet
Durable execution with checkpoint replay	Yes	Yes
Human/agent-in-the-loop waiting with compute released	Yes	Yes
Permissively licensed self-hosting (Kitaru: Apache 2.0; Hatchet: MIT)	Yes	Yes
Python-agent-shaped primitives (`kitaru.llm()`, `kitaru.wait()`, `kitaru.save()`/`kitaru.load()`)	Yes	Not supported
Built-in LLM primitive with alias-resolved secrets and per-call token/latency logging	Yes	Not supported
Typed, versioned artifact lineage per checkpoint with cross-run diff	Yes	Not supported
Replay with checkpoint-output overrides (swap a step's output, re-execute downstream)	Yes	Not supported
Versioned, tag-routed deployment snapshots (default/canary/stable)	Yes	Not supported
Polyglot SDKs (Python, TypeScript, Go, Ruby)	Not supported	Yes
Documented concurrency strategies and static/dynamic rate limits	Not supported	Yes
Managed cloud with published tiers (Developer/Team/Scale/Enterprise)	Not supported	Yes

How the two surfaces map

Concept	Hatchet	Kitaru
Workflow boundary	`hatchet.workflow(name="...")` registered with the engine, run by workers	`@flow` on ordinary Python, called as `flow.run(...)`
Durable step	`@review_flow.task()` with optional `parents=[...]` for DAG dependencies	`@checkpoint` persists a typed, versioned artifact in your own bucket
Pause and resume	`@review_flow.durable_task()` plus `await ctx.aio_wait_for_event("...")` to pause on an external event	`kitaru.wait(name="...", schema=...)` releases compute, resumes from any input source
LLM call	Your own `call_llm()` inside a task; provider key, prompt and token logging are glue you write	`kitaru.llm()` resolves the model alias, injects the provider key, logs prompt, tokens, and latency per call
Cross-run state	Bring your own store (Postgres, Redis, KV)	`kitaru.save(name, value)` / `kitaru.load(exec_id, name)`: typed artifacts in your own bucket, queryable across runs
Invocation	`review_flow.run(ReviewInput(topic="..."))` or `hatchet.event.push("...", {...})` to push an event	`flow.run(...)` for source/local execution; saved deployments via `kitaru invoke FLOW ...`, `KitaruClient().deployments.invoke(flow="...", inputs={...})`, or `flow.invoke(...)`
Deployment and versioning	Versioned through worker code; switch versions by deploying a new worker image	Immutable `flow.deploy()` snapshots, tag-routed (`default`, `canary`, `stable`)
Self-hosting	API server, engine, Postgres, optional RabbitMQ, dashboard, and workers	Single-service Kitaru server plus your S3, GCS, or Azure Blob and metadata DB
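Tag-routed snapshots, as described in the deployment row above, amount to immutable versions plus movable tag pointers. A toy sketch, assuming an in-memory registry; `SnapshotRegistry`, `deploy`, `promote`, and `invoke` here are hypothetical and are not the `flow.deploy()` or `kitaru invoke` APIs.

```python
"""Toy tag router for immutable snapshots (hypothetical names)."""
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    version: int
    fn: Callable[[str], str]


class SnapshotRegistry:
    def __init__(self) -> None:
        self._snapshots: list[Snapshot] = []
        self._tags: dict[str, int] = {}

    def deploy(self, fn: Callable[[str], str],
               tags: tuple[str, ...] = ("default",)) -> int:
        # Each deploy appends a new immutable snapshot; old ones are never edited.
        snap = Snapshot(version=len(self._snapshots) + 1, fn=fn)
        self._snapshots.append(snap)
        for tag in tags:
            self._tags[tag] = snap.version  # point the tag at the new version
        return snap.version

    def promote(self, tag: str, version: int) -> None:
        if not 1 <= version <= len(self._snapshots):
            raise ValueError(f"unknown version {version}")
        self._tags[tag] = version  # only the pointer moves

    def invoke(self, tag: str, arg: str) -> str:
        return self._snapshots[self._tags[tag] - 1].fn(arg)


reg = SnapshotRegistry()
v1 = reg.deploy(lambda topic: f"v1:{topic}", tags=("default", "stable"))
v2 = reg.deploy(lambda topic: f"v2:{topic}", tags=("canary",))
print(reg.invoke("stable", "agents"), reg.invoke("canary", "agents"))  # v1:agents v2:agents
reg.promote("stable", v2)  # promote the canary by moving the tag
print(reg.invoke("stable", "agents"))  # v2:agents
```

Promotion never edits a snapshot; it only moves the `stable` pointer, so rolling back is just moving it again.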

Code comparison

Kitaru Recommended

import kitaru
from kitaru import checkpoint, flow

# @checkpoint persists each return value as a typed, versioned artifact.
@checkpoint
def research(topic: str) -> str:
  # "fast" is a model alias resolved from the active stack config.
  return kitaru.llm(
      prompt=f"Research: {topic}. Return a brief.",
      model="fast",
  )

@checkpoint
def draft(brief: str) -> str:
  return kitaru.llm(
      prompt=f"Write a draft from this brief:\n{brief}",
      model="fast",
  )

@flow
def review_flow(topic: str) -> str:
  brief = research(topic)
  text = draft(brief)
  # Pause for a boolean approval; compute is released while waiting.
  approved = kitaru.wait(
      name="approve_draft",
      question="Approve draft?",
      schema=bool,
  )
  return text if approved else "Rejected"

# Source/local execution of the flow.
review_flow.run(topic="Durable agents").wait()

Hatchet (Python SDK)

from hatchet_sdk import Hatchet, Context, DurableContext
from pydantic import BaseModel

hatchet = Hatchet()

class ReviewInput(BaseModel):
  topic: str

class TextOutput(BaseModel):
  text: str

async def call_llm(prompt: str) -> str:
  ...

review_flow = hatchet.workflow(name="ReviewFlow")

@review_flow.task()
async def research(input: ReviewInput, ctx: Context) -> TextOutput:
  return TextOutput(text=await call_llm(f"Research: {input.topic}"))

@review_flow.task(parents=[research])
async def draft(input: ReviewInput, ctx: Context) -> TextOutput:
  brief = ctx.task_output(research).text
  return TextOutput(text=await call_llm(f"Draft: {brief}"))

@review_flow.durable_task(parents=[draft])
async def approve(input: ReviewInput, ctx: DurableContext) -> TextOutput:
  text = ctx.task_output(draft).text
  decision = await ctx.aio_wait_for_event("review:approve")
  return TextOutput(text=text if decision.data.get("ok") else "Rejected")

def main() -> None:
  worker = hatchet.worker("review-worker", workflows=[review_flow])
  worker.start()


if __name__ == "__main__":
  main()

# Trigger:  review_flow.run(ReviewInput(topic="Durable agents"))
# Approve:  hatchet.event.push("review:approve", {"ok": True})
# Workers register with the Hatchet engine; events route through it.
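Both versions above pause for approval without holding compute in a blocking call. One common way to get that behavior is replay-based suspension: `wait()` raises when no input has been recorded, the run stops, and resuming re-executes the flow from the top with completed steps served from stored results. This sketch assumes that design and keeps results in memory for brevity; `Suspend`, `DurableRun`, `step`, and `wait` are hypothetical names, and neither Kitaru nor Hatchet is claimed to work exactly this way.

```python
"""Toy replay-based pause/resume (hypothetical names)."""
from typing import Any, Callable


class Suspend(Exception):
    """Raised to stop the run until input for `name` arrives."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.name = name


class DurableRun:
    def __init__(self) -> None:
        self.results: dict[str, Any] = {}  # stored step results
        self.inputs: dict[str, Any] = {}   # externally supplied wait inputs
        self.executions: list[str] = []    # step bodies that actually ran

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name not in self.results:
            self.executions.append(name)
            self.results[name] = fn()
        return self.results[name]

    def wait(self, name: str) -> Any:
        if name not in self.inputs:
            raise Suspend(name)  # stop here instead of blocking
        return self.inputs[name]

    def resume(self, flow: Callable[["DurableRun"], Any]) -> Any:
        try:
            return flow(self)
        except Suspend as s:
            return f"suspended on {s.name}"


def review_flow(run: DurableRun) -> str:
    text = run.step("draft", lambda: "draft text")
    approved = run.wait("approve_draft")
    return text if approved else "Rejected"


run = DurableRun()
print(run.resume(review_flow))      # suspended on approve_draft
run.inputs["approve_draft"] = True  # e.g. a human approves later
print(run.resume(review_flow))      # draft text
print(run.executions)               # ['draft'] (not re-executed on resume)
```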

A runtime, not a platform

If you want the full operational platform around durable workflows (hosted cloud, queues, rate limits, multi-language SDKs, alerts and dashboards), Hatchet is a strong pick, and I’d tell any team that.

For Python agent work specifically (a durable llm() call, artifact lineage linked to executions, replay with input overrides, versioned tag-routed deploys), the glue you’d write on top of a general-purpose orchestration platform is what Kitaru ships for you.

We’ve spent five years solving the MLOps version of this problem at ZenML. JetBrains runs their AI globally on it; Adeo runs across all their brands and geographies on it. Kitaru is that same team, two years into the agent version. Bet on us for agent infrastructure and you’re betting on the group that’s been doing this the whole time.

uv init --bare && uv add kitaru && uv run kitaru init
