Kitaru vs Hatchet: Python agent runtime vs orchestration platform

Hatchet is a developer platform for AI agents, durable workflows, background tasks, and parallel workloads, with multi-language SDKs and a hosted cloud. Kitaru is the self-hosted Python runtime that slots underneath your existing agent harness, with llm(), wait(), save/load artifacts, and replay overrides in the box.

pip install kitaru

Hatchet is a developer platform and orchestration engine for AI agents, durable workflows, background tasks, and parallel workloads, with SDKs across Python, TypeScript, Go, and Ruby. It runs on Hatchet Cloud or self-hosted, with the full operational surface that comes with a real orchestration product: priority queues, concurrency strategies, rate limits, alerts, and OTEL export. If your durability problem is shaped like a polyglot task queue that also handles agents, Hatchet is a credible answer.

Kitaru is narrower by design. We built it for Python agents because that’s the workload most teams we talk to are trying to ship right now. In our experience, teams running agents in production end up writing the same glue layer on top of a general-purpose orchestrator: a durable llm() call with token lineage, artifact graphs linked to executions, replay with input overrides, and tag-routed deployment snapshots. Kitaru ships those in the box, underneath whatever harness you already picked. Auth, observability, and governance stay where they are.

Kitaru

Use Kitaru if you are

  • Running Python agents and want `kitaru.llm()`, `kitaru.wait()`, and artifact lineage as primitives, not glue code your platform team maintains forever
  • Keeping the agent harness, auth, entitlements, and observability stack you already picked, and adding only the runtime layer underneath them
  • Deploying across Kubernetes, AWS, GCP, or Azure (including SageMaker, Vertex AI, and AzureML) and want an opinionated stack abstraction where one config switches every flow's backend
  • Replaying a failed run from a specific checkpoint with input overrides, without paying for the LLM calls above it again

Use Hatchet if you are

  • Running a polyglot estate (Python, TypeScript, Go, Ruby) that needs one orchestration platform across all of it
  • Building background tasks, durable workflows, parallel workloads, or queue replacement, not specifically Python agents
  • Leaning on the flow-control surface (priority queues, concurrency strategies, static and dynamic rate limits) as a first-class feature
  • Comfortable adopting Hatchet Cloud's hosted control plane with published Developer/Team/Scale/Enterprise tiers, or self-hosting the engine plus Postgres plus dashboard

Hatchet is the orchestration platform you adopt. Kitaru is the runtime layer you embed underneath the harness you already picked.

Runtime primitive vs orchestration platform

Defaults tell you what a tool is actually for. Hatchet’s defaults are tasks, workers, queues, durable workflows, durable tasks, events, and schedules. That’s a packaged orchestration platform across your stack. Kitaru’s defaults are @flow and @checkpoint on ordinary Python, with kitaru.llm(), kitaru.wait(), and kitaru.save() / kitaru.load() shipped underneath whatever harness your team already picked.

Hatchet · platform across the stack: engine, workers, queues, and dashboard packaged together

  • events · cron · webhooks
  • durable engine · workers · queues
  • concurrency · rate limits · priority
  • dashboard · OTEL · alerts

Your app re-shapes around the engine. Adopt the platform, get the runtime as one part of it.

Kitaru · runtime, one layer of four: owns the runtime, leaves harness and platform untouched

  • Platform: your auth, entitlements, observability
  • Harness: Pydantic AI, OpenAI, Claude, LangGraph, raw Python
  • Runtime: Kitaru (@flow, @checkpoint, llm(), wait(), save/load)
  • Model: OpenAI, Anthropic, Google, open-weights

Kitaru owns one layer. The rest of your stack stays as you picked it.

  • Layered model: Kitaru owns the runtime layer and stays out of harness, auth, and platform. Hatchet packages an orchestration platform that includes the runtime, plus queues, workers, schedules, alerts, and dashboard. If you’ve already picked your harness, auth, and observability stack, only Kitaru fits underneath them without overlap.
  • Adoption shape: With Kitaru you pip install kitaru, add two decorators to ordinary Python, and the flow runs in your process. Hatchet workers register with the engine, get triggered by tasks or events, and operate alongside an API server, Postgres, optional RabbitMQ, and a dashboard. Both work; the question is whether the runtime sits inside your app or whether your app gets re-shaped around the runtime.
  • Operational footprint: A Kitaru server is one process plus your S3, GCS, or Azure Blob bucket and a metadata DB. A Hatchet self-host is API server, engine, Postgres, optional RabbitMQ, dashboard, and workers; the cloud control plane is hosted in US AWS. Both are reasonable. Kitaru is closer to a library, Hatchet is closer to a platform you operate.
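
To make the "two decorators on ordinary Python" adoption shape concrete, here is a toy sketch of a checkpoint decorator that stores each step's return value under the step's name and skips the work when a stored value already exists. It is not Kitaru's implementation: `toy_checkpoint`, `STORE`, and `CALLS` are hypothetical names, and the in-memory dict stands in for a bucket plus metadata DB.

```python
import functools

# Hypothetical in-memory stand-in for durable storage (bucket + metadata DB).
STORE: dict[str, object] = {}
# Counts real executions so we can see which steps were skipped.
CALLS: dict[str, int] = {}


def toy_checkpoint(fn):
    """Persist fn's return value under its name; reuse it on later calls.

    Simplification: the key is the function name only, so a changed
    argument would still return the stored value.
    """

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        key = fn.__name__
        if key in STORE:
            return STORE[key]  # a stored value exists, skip the work
        CALLS[key] = CALLS.get(key, 0) + 1
        STORE[key] = fn(*args, **kwargs)
        return STORE[key]

    return wrapper


@toy_checkpoint
def research(topic: str) -> str:
    return f"brief about {topic}"


@toy_checkpoint
def draft(brief: str) -> str:
    return f"draft from [{brief}]"


def review_flow(topic: str) -> str:
    # Ordinary Python control flow; only the steps are decorated.
    return draft(research(topic))


first = review_flow("durable agents")
second = review_flow("durable agents")  # served entirely from STORE
```

The second call never re-enters `research` or `draft`, which is the "runtime sits inside your app" shape in miniature: no worker registration, only decorated functions and a store.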

LLM calls as first-class lineage

The LLM call is the unit of cost, latency, and failure in an agent. Hatchet has strong operational observability for tasks and workflows: a dashboard, alerts, metrics, and OpenTelemetry export to Datadog or Grafana. What it doesn’t ship is an llm() primitive that resolves a model alias, injects the provider key from your configured secret backend, and logs prompt, response, latency, tokens, and resolved model against the enclosing checkpoint by default. That part is glue you write on top, or get from Kitaru.

Hatchet · OTEL spans across tasks: LLM logging is glue inside your task body

    @review_flow.task()
    async def research(input, ctx):
        # wrap OpenAI / Anthropic yourself
        return await call_llm(...)

    OTEL trace
    task · research            4.2s
    └─ http.post openai.com    4.1s

Generic spans. No prompt, response, tokens, or cost in the box.

Kitaru · LLM as a captured checkpoint event: prompt, response, tokens, latency, and cost on every call

    @checkpoint
    def research(topic):
        return kitaru.llm(prompt=..., model="fast")

    checkpoint.research · captured
    prompt     "Research: durable agents..."
    response   "A durable agent runtime..."
    model      claude-sonnet-4-6
    tokens     in 312 · out 488
    latency    4.1s
    cost       $0.012

Captured by default. Linked to the checkpoint, queryable per run.

  • API surface: kitaru.llm(prompt, model="fast") is the primitive. It resolves aliases (so the same code maps to whichever provider is configured in the stack), injects the provider key automatically, and captures the response. In Hatchet you write your own call_llm() inside a task and decide what to log.
  • Per-call lineage: Prompt, response, token counts, latency, and resolved model land on the run record automatically and link to the enclosing checkpoint. Hatchet has OTEL spans across tasks; it doesn’t position a per-call LLM record linked to a checkpoint as a first-class concept in the docs we reviewed.
  • Replay reads, not re-bills: On replay, kitaru.llm() reads the captured response from the checkpoint instead of hitting the provider again, unless the input changed. Hatchet’s durable replay applies to step return values broadly; whether the LLM call re-hits the provider depends on how you shaped the step body.
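
The three bullets above can be sketched as a small wrapper: resolve an alias, call a provider, record the call against a checkpoint, and serve the recorded response when the same checkpoint sees the same prompt again. This is an illustration only, not how `kitaru.llm()` is built; `ToyRun`, `LLMRecord`, `MODEL_ALIASES`, `fake_provider`, and the model name `example-small-model` are all hypothetical, and token counts are a crude whitespace proxy.

```python
import time
from dataclasses import dataclass

# Hypothetical alias table; a real stack would resolve this from config.
MODEL_ALIASES = {"fast": "example-small-model"}


@dataclass
class LLMRecord:
    prompt: str
    response: str
    model: str
    tokens_in: int
    tokens_out: int
    latency_s: float


def fake_provider(model: str, prompt: str) -> str:
    """Stand-in for a real provider SDK call."""
    return f"[{model}] answer to: {prompt}"


class ToyRun:
    """Holds per-call records for one run, keyed by (checkpoint, prompt)."""

    def __init__(self) -> None:
        self.records: dict[tuple[str, str], LLMRecord] = {}
        self.provider_calls = 0

    def llm(self, checkpoint: str, prompt: str, model: str = "fast") -> str:
        key = (checkpoint, prompt)
        if key in self.records:
            # Replay path: same checkpoint and prompt, read the captured response.
            return self.records[key].response
        resolved = MODEL_ALIASES[model]
        start = time.perf_counter()
        response = fake_provider(resolved, prompt)
        latency = time.perf_counter() - start
        self.provider_calls += 1
        self.records[key] = LLMRecord(
            prompt=prompt,
            response=response,
            model=resolved,
            tokens_in=len(prompt.split()),     # whitespace words, not real tokens
            tokens_out=len(response.split()),
            latency_s=latency,
        )
        return response


run = ToyRun()
a = run.llm("research", "Research: durable agents")
b = run.llm("research", "Research: durable agents")  # replay, no provider hit
c = run.llm("research", "Research: queues")          # changed input, new call
```

Only two of the three calls reach the provider. The record carries the resolved model name rather than the alias, so a later change to the alias mapping does not rewrite history.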

Replay with checkpoint-output overrides

Both products replay from durable state. The harder question is whether you can change a checkpoint’s output and have downstream consumers re-execute against the new value. Kitaru exposes this as a first-class primitive: pin an override against checkpoint.research and the dependents replay against the override, while everything else stays cached. Hatchet’s docs describe replay from the event log, plus retry, cancel, or replay from the dashboard, but not an equivalent override model in which you swap a single step’s output and re-execute downstream against the change.

Hatchet · replay from the event log: restart the run; no input-override primitive in the docs

    research   re-run
    draft      re-run
    wait       re-run

Restart from the top of the run. Earlier steps execute again unless you've hand-rolled an override layer inside step bodies. Operational replay, not parameterized over a specific step's output.

Kitaru · checkpoint-output override: pin a value at any checkpoint, replay surgically

    research   cached
    draft      override
    wait       re-execute

    kitaru replay --override checkpoint.draft="..."

Swap one checkpoint's output. Downstream re-executes against the new value, upstream stays cached. Built-in. No event-log surgery, no per-step override scaffolding.

  • Mechanism: Kitaru exposes checkpoint selectors and override keys (checkpoint.research, checkpoint.draft, …) so you can replay a flow with a swapped value at any checkpoint and re-execute downstream consumers. Hatchet’s replay is operationally scoped (replay this run, retry this step) rather than parameterized over a specific checkpoint’s output.
  • Use case: The agent wrote a bad brief at step 1, and the run had burned 4 minutes and 10k tokens by the time it reached step 7. With Kitaru you swap the brief and the flow re-executes against the new value, while checkpoints whose inputs still match stay cached. With Hatchet you’d reset to the start of the run or roll your own override layer inside your step bodies.
  • What gets cached: Kitaru caches typed artifacts per @checkpoint, indexed by the checkpoint’s name. Hatchet caches step return values in the durable event log, keyed by the step’s invocation in the run. The difference shows up most when you want to inject a hypothetical at a specific step and rerun downstream.
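
The replay semantics described above (upstream cached, one checkpoint overridden, downstream re-executed) can be sketched over a linear pipeline. This is a toy model under stated assumptions, not Kitaru's replay engine: `STEPS`, `run_flow`, and the `polish` step are hypothetical, and a real flow graph would track dependencies rather than assume each step consumes the previous one.

```python
from typing import Callable, Optional

# Ordered toy pipeline: each step consumes the previous step's output.
STEPS: list[tuple[str, Callable[[str], str]]] = [
    ("research", lambda topic: f"brief({topic})"),
    ("draft", lambda brief: f"draft({brief})"),
    ("polish", lambda text: f"polished({text})"),
]


def run_flow(
    topic: str,
    cache: dict[str, str],
    overrides: Optional[dict[str, str]] = None,
) -> tuple[str, dict[str, str]]:
    """Run the pipeline; return the final value and a per-step status map."""
    overrides = overrides or {}
    status: dict[str, str] = {}
    value = topic
    dirty = False  # becomes True once an upstream output has been swapped
    for name, fn in STEPS:
        if name in overrides:
            value = overrides[name]   # pinned output, step body is skipped
            status[name] = "override"
            dirty = True
        elif name in cache and not dirty:
            value = cache[name]       # nothing upstream changed, reuse
            status[name] = "cached"
        else:
            value = fn(value)         # first run, or an upstream change
            status[name] = "executed"
        cache[name] = value
    return value, status


cache: dict[str, str] = {}
original, first_status = run_flow("agents", cache)
replayed, replay_status = run_flow(
    "agents", cache, overrides={"draft": "better draft"}
)
```

On the replay, `research` is read from the cache, `draft` takes the pinned value, and only `polish` runs again, which mirrors the cached / override / re-execute sequence in the comparison above.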

What makes Kitaru unique

| Feature | Kitaru | Hatchet |
| --- | --- | --- |
| Durable execution with checkpoint replay | Yes | Yes |
| Human/agent-in-the-loop waiting with compute released | Yes | Yes |
| Permissively licensed self-hosting (Kitaru: Apache 2.0; Hatchet: MIT) | Yes | Yes |
| Python-agent-shaped primitives (`kitaru.llm()`, `kitaru.wait()`, `kitaru.save()`/`kitaru.load()`) | Yes | Not supported |
| Built-in LLM primitive with alias-resolved secrets and per-call token/latency logging | Yes | Not supported |
| Typed, versioned artifact lineage per checkpoint with cross-run diff | Yes | Not supported |
| Replay with checkpoint-output overrides (swap a step's output, re-execute downstream) | Yes | Not supported |
| Versioned, tag-routed deployment snapshots (default/canary/stable) | Yes | Not supported |
| Polyglot SDKs (Python, TypeScript, Go, Ruby) | Not supported | Yes |
| Documented concurrency strategies and static/dynamic rate limits | Not supported | Yes |
| Managed cloud with published tiers (Developer/Team/Scale/Enterprise) | Not supported | Yes |

How the two surfaces map

| Concept | Hatchet | Kitaru |
| --- | --- | --- |
| Workflow boundary | `hatchet.workflow(name="...")` registered with the engine, run by workers | `@flow` on ordinary Python, called as `flow.run(...)` |
| Durable step | `@review_flow.task()` with optional `parents=[...]` for DAG dependencies | `@checkpoint` persists a typed, versioned artifact in your own bucket |
| Pause and resume | `@hatchet.durable_task` plus `await ctx.aio_wait_for_event("...")` to pause on an external event | `kitaru.wait(name="...", schema=...)` releases compute, resumes from any input source |
| LLM call | Your own `call_llm()` inside a task; provider key, prompt and token logging are glue you write | `kitaru.llm()` resolves the model alias, injects the provider key, logs prompt, tokens, and latency per call |
| Cross-run state | Bring your own store (Postgres, Redis, KV) | `kitaru.save(name, value)` / `kitaru.load(exec_id, name)`: typed artifacts in your own bucket, queryable across runs |
| Invocation | `review_flow.run(ReviewInput(topic="..."))` or `hatchet.event.push("...", {...})` to push an event | `flow.run(...)` for source/local execution; saved deployments via `kitaru invoke FLOW ...`, `KitaruClient().deployments.invoke(flow="...", inputs={...})`, or `flow.invoke(...)` |
| Deployment and versioning | Versioned through worker code; switch versions by deploying a new worker image | Immutable `flow.deploy()` snapshots, tag-routed (default, canary, stable) |
| Self-hosting | API server, engine, Postgres, optional RabbitMQ, dashboard, and workers | Single-service Kitaru server plus your S3, GCS, or Azure Blob and metadata DB |
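
The "immutable snapshots, tag-routed" row is easiest to see in a few lines. The sketch below is a hypothetical model of the idea, not Kitaru's deployment API: `ToyDeployments`, `Snapshot`, `deploy`, `retag`, and `invoke` are illustrative names. Snapshots are append-only and tags are movable pointers to a version.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    """An immutable deployed version of a flow."""
    version: int
    handler: Callable[[str], str]


class ToyDeployments:
    def __init__(self) -> None:
        self._snapshots: list[Snapshot] = []   # append-only
        self._tags: dict[str, int] = {}        # tag -> version

    def deploy(self, handler: Callable[[str], str], tags=("default",)) -> int:
        version = len(self._snapshots) + 1
        self._snapshots.append(Snapshot(version, handler))
        for tag in tags:
            self._tags[tag] = version
        return version

    def retag(self, tag: str, version: int) -> None:
        # Moving a tag changes routing without touching any snapshot.
        if not 1 <= version <= len(self._snapshots):
            raise KeyError(f"unknown version {version}")
        self._tags[tag] = version

    def invoke(self, payload: str, tag: str = "default") -> str:
        snapshot = self._snapshots[self._tags[tag] - 1]
        return snapshot.handler(payload)


deployments = ToyDeployments()
v1 = deployments.deploy(lambda t: f"v1:{t}", tags=("default", "stable"))
v2 = deployments.deploy(lambda t: f"v2:{t}", tags=("canary",))

before = deployments.invoke("x")                 # routed to v1
canary = deployments.invoke("x", tag="canary")   # routed to v2
deployments.retag("default", v2)                 # promote the canary
after = deployments.invoke("x")                  # now routed to v2
stable = deployments.invoke("x", tag="stable")   # still v1
```

Promotion and rollback are both a `retag`, so no snapshot is rebuilt or mutated when traffic moves.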

Code comparison

Kitaru (recommended)
import kitaru
from kitaru import checkpoint, flow

@checkpoint
def research(topic: str) -> str:
  return kitaru.llm(
      prompt=f"Research: {topic}. Return a brief.",
      model="fast",
  )

@checkpoint
def draft(brief: str) -> str:
  return kitaru.llm(
      prompt=f"Write a draft from this brief:\n{brief}",
      model="fast",
  )

@flow
def review_flow(topic: str) -> str:
  brief = research(topic)
  text = draft(brief)
  approved = kitaru.wait(
      name="approve_draft",
      question="Approve draft?",
      schema=bool,
  )
  return text if approved else "Rejected"

review_flow.run(topic="Durable agents").wait()

Hatchet (Python SDK)
from hatchet_sdk import Hatchet, Context, DurableContext
from pydantic import BaseModel

hatchet = Hatchet()

class ReviewInput(BaseModel):
  topic: str

class TextOutput(BaseModel):
  text: str

async def call_llm(prompt: str) -> str:
  ...

review_flow = hatchet.workflow(name="ReviewFlow")

@review_flow.task()
async def research(input: ReviewInput, ctx: Context) -> TextOutput:
  return TextOutput(text=await call_llm(f"Research: {input.topic}"))

@review_flow.task(parents=[research])
async def draft(input: ReviewInput, ctx: Context) -> TextOutput:
  brief = ctx.task_output(research).text
  return TextOutput(text=await call_llm(f"Draft: {brief}"))

@review_flow.durable_task(parents=[draft])
async def approve(input: ReviewInput, ctx: DurableContext) -> TextOutput:
  text = ctx.task_output(draft).text
  decision = await ctx.aio_wait_for_event("review:approve")
  return TextOutput(text=text if decision.data.get("ok") else "Rejected")

def main() -> None:
  # Register the workflow with the engine and start serving tasks.
  worker = hatchet.worker("review-worker", workflows=[review_flow])
  worker.start()

if __name__ == "__main__":
  main()

# Trigger:  review_flow.run(ReviewInput(topic="Durable agents"))
# Approve:  hatchet.event.push("review:approve", {"ok": True})
# Workers register with the Hatchet engine; events route through it.
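
Both versions above pause for approval. One way to picture "releases compute, then resumes" is a flow that stops when the answer is missing and is simply re-invoked once it arrives, with completed steps served from stored state. The sketch below assumes that model; `Pending`, `STATE`, `step`, and `toy_wait` are hypothetical names and this is not how `kitaru.wait()` or Hatchet's durable tasks are implemented.

```python
class Pending(Exception):
    """Raised to stop the flow while an answer is outstanding."""


STATE: dict[str, object] = {}   # stand-in for durable storage
EXECUTED: list[str] = []        # records which step bodies actually ran


def step(name: str, fn):
    """Run fn once and store its result; later calls reuse the stored value."""
    if name not in STATE:
        EXECUTED.append(name)
        STATE[name] = fn()
    return STATE[name]


def toy_wait(name: str):
    """Return the stored answer, or stop the flow if none has arrived yet."""
    key = f"wait:{name}"
    if key not in STATE:
        raise Pending(name)
    return STATE[key]


def review_flow(topic: str) -> str:
    brief = step("research", lambda: f"brief about {topic}")
    text = step("draft", lambda: f"draft from [{brief}]")
    approved = toy_wait("approve_draft")
    return text if approved else "Rejected"


# First invocation: the steps run, then the flow stops at the wait.
try:
    review_flow("durable agents")
    paused = False
except Pending:
    paused = True

# Later, possibly in another process: the answer lands in the store.
STATE["wait:approve_draft"] = True

# Second invocation: steps are read from STATE, the wait returns the answer.
result = review_flow("durable agents")
```

Nothing is held in memory between the two invocations except `STATE`, which is why the waiting period costs no compute in this model.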

A runtime, not a platform

If you want the full operational platform around durable workflows (hosted cloud, queues, rate limits, multi-language SDKs, alerts and dashboards), Hatchet is a strong pick, and we’d tell any team that.

For Python agent work specifically (a durable llm() call, artifact lineage linked to executions, replay with input overrides, versioned tag-routed deploys), the glue you’d write on top of a general-purpose orchestration platform is what Kitaru ships for you.

We’ve spent five years building MLOps infrastructure for this problem space at ZenML. JetBrains runs their AI globally on it; Adeo runs across all their brands and geographies on it. Kitaru is that same team, two years into the agent version. Bet on us for agent infrastructure and you’re betting on the group that’s been doing this the whole time.
