Kitaru vs OpenAI Agents SDK: build the agent, run it durably

The OpenAI Agents SDK is the code-first harness for defining and running an agent. It’s OpenAI’s opinionated stack for building agent behavior in Python and TypeScript.

Kitaru is not another harness. It lives one layer down as the durable runtime. Wrap your existing OpenAI Agents SDK code with @flow and KitaruRunner and the same agent gets checkpointing, replay, durable waits, artifact lineage, and a stack abstraction for Kubernetes, AWS, GCP, and Azure. The two are complementary, with a first-class OpenAI Agents adapter that does the wiring.

Kitaru

Use Kitaru if you are

Running an OpenAI agent long enough that crashes, pod evictions, or timeouts re-burn LLM tokens you already paid for
Pausing for hours or days on a human approval without keeping a worker process alive
Replaying a failed run from a specific step without re-paying for the LLM calls above it
Deploying onto your own Kubernetes, AWS, GCP, or Azure with execution state in your own object storage
Standardizing one durable runtime under multiple harnesses (OpenAI Agents, PydanticAI, raw Python)

OpenAI Agents SDK

Use the OpenAI Agents SDK if you are

Building an Agents SDK-native agent that needs a tight code-first harness for tools, handoffs, guardrails, structured outputs, and OpenAI platform integrations
Standardizing on OpenAI's tracing, evals, sandbox agents, and hosted tools in one ecosystem
Running short-lived agents where crashes and long approvals are not a real cost yet

The OpenAI Agents SDK builds the agent; Kitaru helps that agent survive production.

Replay from the last good checkpoint, not from step 1

The OpenAI Agents SDK gives you the agent loop. But what happens when that loop crashes at step 15 of an 18-step run? The SDK has run state and continuation surfaces around approval interruptions, but durable replay after a failure is not its core promise. It is Kitaru’s.

OpenAI Agents SDK alone

Crash mid-run, restart from step 1

1 research $0.04

2 draft $0.09

3 review crash

restart from step 1

re-pays $0.13 in tokens already spent

Kitaru-wrapped

Replay from the failed checkpoint

1 @checkpoint research cached

2 @checkpoint draft cached

3 @checkpoint review re-run

$ kitaru executions replay <exec_id> --from review

only the failing step re-bills

Checkpoint boundary: Every @checkpoint persists its return value as a typed artifact. Wrap a Runner call in one and the whole agent run becomes a replay boundary.
Replay from a step: kitaru executions replay <exec_id> --from <checkpoint> replays from the selected checkpoint and its downstream descendants. Checkpoints that are neither replay roots nor downstream dependencies are skipped, so completed expensive work can be reused instead of re-executed.
Per-call granularity: checkpoint_strategy="calls" on KitaruRunner records each model or tool call as its own peer checkpoint, so a single failing tool does not invalidate the run.
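To make the replay idea concrete, here is a minimal, self-contained sketch of step-level caching in plain Python. It is not Kitaru's implementation or API: ToyExecution, run_flow, and crash_then_retry are hypothetical names, the store is an in-memory dict, and the review cost of 2 cents is an assumed figure (the research and draft costs come from the example above).

```python
from typing import Any, Callable

# Per-step token cost in cents. research and draft match the example above;
# the review cost is an assumed value for illustration only.
COST_CENTS = {"research": 4, "draft": 9, "review": 2}


class ToyExecution:
    """In-memory stand-in for an execution record (hypothetical, not Kitaru's API)."""

    def __init__(self, use_cache: bool) -> None:
        self.use_cache = use_cache
        self.cache: dict[str, Any] = {}   # step name -> stored return value
        self.attempts: list[str] = []     # every step whose body was actually invoked

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        """Return the stored value if caching is on, otherwise run the step body."""
        if self.use_cache and name in self.cache:
            return self.cache[name]
        self.attempts.append(name)
        value = fn()               # if this raises, nothing is stored for the step
        self.cache[name] = value
        return value


def run_flow(execution: ToyExecution, crash_in_review: bool) -> str:
    brief = execution.step("research", lambda: "brief")
    draft = execution.step("draft", lambda: f"draft({brief})")

    def review() -> str:
        if crash_in_review:
            raise RuntimeError("simulated crash in review")
        return f"reviewed({draft})"

    return execution.step("review", review)


def crash_then_retry(use_cache: bool) -> tuple[str, int]:
    """Crash once in review, retry, and return (output, cents spent on the retry)."""
    execution = ToyExecution(use_cache)
    try:
        run_flow(execution, crash_in_review=True)
    except RuntimeError:
        pass
    attempts_before_retry = len(execution.attempts)
    output = run_flow(execution, crash_in_review=False)
    retried = execution.attempts[attempts_before_retry:]
    return output, sum(COST_CENTS[name] for name in retried)
```

With caching on, the retry invokes only the review step; with caching off, it invokes all three. Under the assumed costs, the difference between the two retries is the 13 cents of research and draft work from the example.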

Long waits without keeping a worker alive

The OpenAI Agents SDK has a real approval story. Tools can require approval; the run returns interrupted, and you resume it later with a decision. What the base SDK does not provide by itself is a self-hosted workflow runtime that releases compute, persists workflow-level state and artifacts, and resumes the long-running workflow across your own infrastructure. This is what Kitaru solves:

OpenAI Agents SDK approval: worker stays alive while it waits

tool approval pause

process alive · 6h idle

human approves

Compute 100%

kitaru.wait(): compute released, flow resumes on signal

wait(name="approve")

no container · 6h waiting

signal arrives, flow resumes

Compute 0%

Compute released: kitaru.wait() suspends the flow and tears down the worker. When the approval signal arrives, a fresh container reads the checkpoint state and the flow resumes. You pay nothing for compute while the flow waits.
Wait for anything: Human, webhook, another agent, scheduled time, or an out-of-band CLI invocation can satisfy the wait.
OpenAI approvals, bridged: The adapter ships wait_for_approval(result, ...) so a native OpenAI Agents SDK approval interruption becomes a durable kitaru.wait() without you wiring it together.
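The suspend-and-resume pattern described above can be sketched in a few lines of plain Python. This is an illustration of the general idea only, not how Kitaru implements kitaru.wait(): start_flow, resume_flow, and the flow_state.json file name are hypothetical, and a temporary directory stands in for durable storage.

```python
import json
import tempfile
from pathlib import Path

STATE_FILE = "flow_state.json"  # hypothetical file name for the persisted state


def start_flow(state_dir: Path, case: str) -> str:
    """Run up to the approval point, persist state, and return so the worker can exit."""
    draft = f"draft for {case}"
    state = {"case": case, "draft": draft, "waiting_on": "approve"}
    (state_dir / STATE_FILE).write_text(json.dumps(state))
    return "suspended"


def resume_flow(state_dir: Path, signal_name: str, approved: bool) -> str:
    """Called by a fresh process when a signal arrives; reloads state from storage."""
    state = json.loads((state_dir / STATE_FILE).read_text())
    if state["waiting_on"] != signal_name:
        raise ValueError(
            f"flow is waiting on {state['waiting_on']!r}, not {signal_name!r}"
        )
    return state["draft"] if approved else "Rejected"


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp)
    status = start_flow(state_dir, "Case C-001")
    # Between these two calls nothing needs to stay in memory:
    # resume_flow works only from what was written to storage.
    final_output = resume_flow(state_dir, "approve", approved=True)
```

The design point is that the second function takes no in-memory input from the first, so the process that started the flow can be gone by the time the approval arrives.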

Artifacts you can inspect, not just traces of what happened

OpenAI’s tracing is genuinely useful. Model calls, tool calls, handoffs, guardrails, and custom spans all land in the dashboard, and the eval tooling on top is first-party. A trace, however, is not a typed artifact graph you can load, diff, and replay against. Kitaru ships that.

OpenAI trace: what happened

model.request 218ms

tool.search_docs 412ms

model.request 186ms

handoff 9ms

model.request 324ms

Read-only. Reproduce by re-running.

Kitaru execution: what was produced

exec 7c1b · 3 checkpoints · $0.15

@checkpoint research brief.json

@checkpoint draft draft.md

@checkpoint review review.json

kitaru.load() · replay · diff

Per-checkpoint artifacts: Every @checkpoint return value lands in the active stack’s artifact store (local filesystem, S3, GCS, or Azure Blob). kitaru.llm() calls add prompt and response artifacts on top.
Cross-run load: From inside a checkpoint, kitaru.load(exec_id, name) reads a named artifact from another execution. Useful when last week’s research output should feed this week’s draft without re-running the research.
Inspect and replay: Executions, checkpoints, logs, and artifacts are exposed through the dashboard, CLI, Python client, and an MCP server. A trace tells you what happened; the execution record lets you do something about it.
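As a rough picture of what "load and diff" means for named artifacts, the sketch below keeps artifacts in a dict keyed by execution ID and name, then diffs the same artifact across two executions with the standard library. ToyArtifactStore and diff_artifacts are hypothetical helpers, not Kitaru's client API, and the execution ID 9f2e and the artifact contents are invented for the example.

```python
import difflib


class ToyArtifactStore:
    """In-memory stand-in for an artifact store (hypothetical, not Kitaru's API)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], str] = {}

    def save(self, exec_id: str, name: str, content: str) -> None:
        self._data[(exec_id, name)] = content

    def load(self, exec_id: str, name: str) -> str:
        # Raises KeyError if the execution never produced this artifact.
        return self._data[(exec_id, name)]


def diff_artifacts(
    store: ToyArtifactStore, old_exec: str, new_exec: str, name: str
) -> list[str]:
    """Return only the added and removed lines between two versions of an artifact."""
    old_lines = store.load(old_exec, name).splitlines()
    new_lines = store.load(new_exec, name).splitlines()
    diff = list(difflib.unified_diff(old_lines, new_lines, lineterm=""))
    # The first two entries are the '---' and '+++' file headers; skip them,
    # then keep only lines that were added or removed.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


store = ToyArtifactStore()
store.save("7c1b", "draft", "Title\nFirst version")
store.save("9f2e", "draft", "Title\nSecond version")  # invented second execution
changes = diff_artifacts(store, "7c1b", "9f2e", "draft")
```

Because each artifact is addressed by execution and name, a later run can read an earlier run's output directly instead of regenerating it.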

What makes Kitaru unique

Feature	Kitaru	OpenAI Agents SDK
First-class Agent abstraction with instructions, tools, handoffs, guardrails	Not supported	Yes
First-party evals with graders, datasets, and eval runs	Not supported	Yes
Python and TypeScript SDKs	Not supported	Yes
Crash recovery via checkpoint replay that skips completed work	Yes	Partial
Durable wait/resume with compute released during the pause	Yes	Partial
Typed, versioned artifact lineage per checkpoint (cross-run load and diff)	Yes	Not supported
Framework-agnostic runtime (wrap OpenAI Agents, PydanticAI, Anthropic, raw Python)	Yes	Not supported
Self-hosted runtime with stack abstraction (Kubernetes, AWS, GCP, Azure)	Yes	Not supported
Versioned, tag-routed deployments with rollback	Yes	Not supported

How the two surfaces map

Concept	OpenAI Agents SDK	Kitaru
Layer	Agent harness (how the agent thinks and acts)	Durable runtime (how the agent survives over time and infra)
Agent definition	`Agent(name, instructions, tools, model, handoffs)`	Wraps existing agent code with `@flow` and `@checkpoint`
Run boundary	`Runner.run_sync(agent, input)`	`@flow` plus `KitaruRunner(agent, checkpoint_strategy=...)`
Crash recovery	Run state for approval interruptions; rerun for hard failures	Replay from a checkpoint; completed steps return cached output
Long wait on a human or agent	Tool approval that pauses the run	`kitaru.wait()` with the worker torn down, plus `wait_for_approval` adapter helper
Artifacts	Run history and final output	Typed, versioned artifact per checkpoint, cross-run `load()`
Observability	First-party tracing, evals, dashboards	Execution record with logs, checkpoints, artifacts, replay, MCP server
Deployment	Wherever you run your application or server	Stack abstraction for Kubernetes, AWS, GCP, Azure with versioned snapshots

Code comparison

OpenAI Agents SDK + Kitaru (recommended)

from agents import Agent
from kitaru import flow
from kitaru.adapters.openai_agents import (
  KitaruRunner,
  OpenAIRunRequest,
  wait_for_approval,
)

reviewer = Agent(
  name="reviewer",
  instructions="Review the draft for compliance.",
  model="gpt-5",
)
runner = KitaruRunner(reviewer, checkpoint_strategy="runner_call")

@flow
def review_flow(case: str) -> str:
  # One Runner call == one durable checkpoint.
  # On replay, the cached final_output is returned, no re-billing.
  result = runner.run_sync(OpenAIRunRequest.start(case))

  # Native OpenAI approval bridges to a durable kitaru.wait().
  # Compute is released for the duration of the wait.
  if result.status == "interrupted":
    resume = wait_for_approval(
      result, name="approve_tool", timeout=600
    )
    result = runner.run_sync(resume)

  return str(result.final_output)

review_flow.run("Case C-001")

OpenAI Agents SDK alone

from agents import Agent, Runner

reviewer = Agent(
  name="reviewer",
  instructions="Review the draft for compliance.",
  model="gpt-5",
)

def review_flow(case: str) -> str:
  result = Runner.run_sync(reviewer, case)
  draft = str(result.final_output)
  # Blocking input(). If the container dies mid-wait,
  # the draft is lost and the run restarts from step 1.
  approved = input(f"Approve?\n{draft}\n[y/n]: ") == "y"
  return draft if approved else "Rejected"

review_flow("Case C-001")

Put a runtime under your OpenAI agents

If your OpenAI agent still fits in a notebook or a short-lived script, the SDK on its own is the right answer. If it is becoming a production workload (long-running, expected to survive crashes, approved by a human hours later, deployed on your own cloud), Kitaru is the runtime layer underneath the harness you already picked.

uv init --bare && uv add kitaru && uv run kitaru init
