Agent loops
agent_turn
Use agent_turn(prompt, opts?) for the common "make one agent complete this
request" case. It wraps agent_loop, puts opts.system into the system prompt
alongside generic progress guidance, defaults to loop-until-done completion, and
requires a completion judge. Native-tool turns complete naturally when the model
returns final text with no tool calls; text/no-tool turns use the normal
sentinel path. Pass judge: {...} or done_judge: {...} to customize that
judge; omit it to use the default judge.
The return value is the normal agent_loop result with two extra summaries:
iterations ([{iteration, started, ended?, tool_count?, prose_chars?}]) and judge_decisions
([{iteration, verdict, reasoning, next_step, judge_duration_ms, trigger?}]).
let result = agent_turn("Summarize the current project risks.", {
  system: "Be concise and cite concrete evidence.",
  provider: "openai",
  model: "gpt-5-mini",
})
log(result.visible_text)
log(result.judge_decisions[0].verdict)
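As a rough sketch of how the two extra summaries might be consumed, the snippet below walks both lists. It uses only the field names listed above. Whether optional fields such as tool_count or next_step are populated on a given run is not something this sketch relies on.

```harn
let result = agent_turn("Summarize the current project risks.", {
  system: "Be concise and cite concrete evidence.",
  provider: "openai",
  model: "gpt-5-mini",
})

// One entry per loop iteration.
for step in result.iterations {
  log("iteration " + to_string(step.iteration))
}

// One entry per judge invocation.
for decision in result.judge_decisions {
  log(decision.verdict + ": " + decision.reasoning)
}
```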
agent_loop
Run an agent that keeps working until it's done. The agent maintains
conversation history across turns. Native-tool loops stop naturally when the
model returns final assistant text with no tool calls; tagged text-tool loops
use the completion sentinel <done>##DONE##</done>, and no-tool sentinel loops
use bare ##DONE##. Returns a dict with canonical visible text, tool usage,
transcript state, and any deferred queued human messages.
let result = agent_loop(
  "Write a function that sorts a list, then write tests for it.",
  "You are a senior engineer.",
  {loop_until_done: true}
)
log(result.text) // the accumulated output
log(result.status) // e.g. "done", "stuck", "budget_exhausted", "idle", "watchdog", or "failed"; the return-value table lists every status
log(result.llm.iterations) // number of LLM round-trips
How it works
- Sends the prompt to the model
- Reads the response
- If loop_until_done: true:
  - In native-tool mode, treats final text with no tool calls as completion
  - In text-tool or no-tool sentinel mode, checks for the completion sentinel (<done>##DONE##</done> or bare ##DONE##)
  - If completion is detected, stops and returns the accumulated output
  - If no completion is detected, sends a nudge message asking the agent to continue
  - Repeats until done or limits are hit
- If loop_until_done: false (default): returns after the first response
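The sentinel branch of the steps above can be sketched as a no-tool loop. This is a minimal illustration, not a recommended configuration: the prompt, system text, and nudge wording are placeholders, and the option names (done_sentinel, max_nudges, nudge, max_iterations) are the ones described in the agent_loop options table.

```harn
// No tools are passed, so this is a no-tool sentinel loop: the model is
// expected to end its final answer with the bare ##DONE## sentinel.
let result = agent_loop(
  "Draft release notes for the latest changes.",
  "You are a release manager.",
  {
    loop_until_done: true,
    done_sentinel: "##DONE##",
    max_nudges: 3,
    nudge: "Continue, and end with ##DONE## when the notes are complete.",
    max_iterations: 10,
  }
)
log(result.status)
```

With these settings the loop would be expected to end as "done" when the sentinel appears, "stuck" after more than three consecutive text-only turns, or "budget_exhausted" after ten round-trips.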
agent_loop return value
agent_loop returns a namespaced dict. Execution metrics live under
llm, tool invocation data under tools. This shape replaces the
earlier flat layout (iterations, duration_ms, tools_used,
successful_tools, rejected_tools, tool_calling_mode were all
top-level keys before v0.8).
| Field | Type | Description |
|---|---|---|
status | string | Terminal state: "done" (natural completion), "suspended" (worker yielded at a cooperative suspend checkpoint), "stuck" (exceeded max_nudges consecutive text-only turns), "budget_exhausted" (hit max_iterations without any explicit break), "verify_capped" (a structured completion-judge veto cap was reached; see stop_reason: "completion_judge_cap_reached" or "done_judge_cap_reached"), "provider_error" (provider/tool-protocol request failed and was captured in error), "idle" (daemon yielded with no remaining wake source), "watchdog" (daemon idle-wait tripped the idle_watchdog_attempts limit), or "failed" (require_successful_tools not satisfied). |
error | dict or nil | Structured terminal failure for provider/tool-protocol failures: {category, reason, kind, provider, model, message, phase, tool_format, after_tool_result}. after_tool_result is true when the rejected model request included prior tool observations. |
text | string | Accumulated text output from all iterations |
visible_text | string | Human-visible accumulated output |
llm | dict | LLM execution metrics — see below |
tools | dict | Tool invocation summary — see below |
deferred_user_messages | list | Queued human messages deferred until agent yield/completion |
daemon_state | string | Final daemon lifecycle state; mirrors status for daemon loops. |
daemon_snapshot_path | string or nil | Persisted snapshot path when daemon persistence is enabled |
task_ledger | dict | Final task-ledger state (deliverables, nudges, etc.) |
trace | dict | Structured span/event summary for observability |
transcript | dict | Transcript of the full conversation state |
handle | dict | Present for status: "suspended"; resumable worker handle for resume_agent(...) |
reason | string | Present for status: "suspended"; suspend reason visible to the resumed turn as a system_reminder |
initiator | string | Present for status: "suspended"; one of "self", "parent", "operator", or "triggered" |
conditions | dict or nil | Present for status: "suspended"; optional resume trigger conditions |
iterations_completed | int | Present for status: "suspended"; completed LLM turns before the checkpoint yielded |
repeated_tool_calls | int | Present when stall_diagnostics is enabled. Counts adjacent repeated tool calls with identical name and arguments after the first call in each streak |
stall_warnings | list | Present when stall_diagnostics is enabled. Diagnostic warning records emitted when a repeat streak reaches the configured threshold |
suspected_loop | bool | Present when stall_diagnostics is enabled. true when at least one stall warning fired |
completion_judge | dict | Present when verify_completion_judge is configured. {invocations, vetoes, max_invocations, cap_reached} — the per-session judge call/veto counts, the resolved cap (nil when disabled), and whether the cap was hit. Lets a harness report judge churn without transcript mining |
done_judge | dict | Present when done_judge is configured. {invocations, vetoes, max_invocations, cap_reached} — the per-session done-judge call/veto counts, the resolved top-level cap (nil when disabled or not configured), and whether that cap was hit. This is separate from done_judge.cadence.max_invocations, which only gates when the judge is due |
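A harness will usually branch on status before reading the conditional fields. The sketch below assumes task and system are already defined, that stop_reason is readable on the result as the status description suggests, and that == compares strings; it reads only fields named in the table above.

```harn
let result = agent_loop(task, system, {loop_until_done: true})

if result.status == "suspended" {
  // reason and initiator are only present for suspended results.
  log("suspended by " + result.initiator + ": " + result.reason)
}
if result.status == "provider_error" {
  log(result.error.category + ": " + result.error.message)
}
if result.status == "verify_capped" {
  log("judge cap reached: " + result.stop_reason)
}
```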
Nested llm fields:
| Field | Type | Description |
|---|---|---|
iterations | int | Number of LLM round-trips |
duration_ms | int | Total wall-clock time in milliseconds |
input_tokens | int | Sum of input tokens across LLM calls |
output_tokens | int | Sum of output tokens across LLM calls |
Nested tools fields:
| Field | Type | Description |
|---|---|---|
calls | list | Names of tools that were attempted |
successful | list | Tools that returned status: "ok" at least once |
rejected | list | Tools rejected by approval policy, capability ceiling, handler error, or failed dispatch |
mode | string | Tool-calling contract used for the loop ("native", "text", …) |
Every dispatched tool attempt is injected into the next model turn as a tool result observation. Failed Harn-side handlers and blocked host-tool calls carry their error text in that observation, so the model can recover from prior failed attempts instead of inferring from an empty result.
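To illustrate the nested tools summary, the following sketch reports which tools were rejected and how many succeeded. coding_tools, task, and system are placeholders for values defined elsewhere in the script.

```harn
let result = agent_loop(task, system, {
  tools: coding_tools,
  tool_format: "native",
  loop_until_done: true,
})

log(result.tools.mode)
for name in result.tools.rejected {
  log("rejected at least once: " + name)
}
log(to_string(len(result.tools.successful)) + " tool(s) succeeded at least once")
```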
Simulated users for eval harnesses
Use std/agent/user when a harness needs another model, or a deterministic
fixture, to stand in for the human user. The module returns an answerer object
that can be wired into an agent as an ask_user tool, or as a post-turn callback
for agents that ask clarification questions in plain text.
import {
  agentic_user,
  simulated_user_read_tools,
  user_tools,
} from "std/agent/user"
let answerer = agentic_user(
  "Provide a simple prompt to create ./index.test.ts with full edge coverage.",
  "Research the codebase only if needed. Answer clarification questions with plausible user preferences. If the agent is done, stop.",
  simulated_user_read_tools(),
  "ollama:devstral-small-2",
  {max_replies: 4, max_llm_calls: 8, max_iterations: 4},
)
let result = agent_loop(task, system, {
  provider: "openai",
  model: "gpt-5-mini",
  tools: user_tools(answerer, coding_tools),
  tool_format: "native",
  loop_until_done: true,
  max_iterations: 20,
})
For deterministic eval fixtures, use scripted_user(...) or its alias
fixture_user(...). Script entries can be strings or dicts with match,
reply, action: "stop", or action: "fail".
import { scripted_user, user_tools } from "std/agent/user"
let answerer = scripted_user([
  {match: "*test runner*", reply: "Use Vitest and cover empty, invalid, and boundary inputs."},
  {match: "*done*", action: "stop", reason: "complete"},
], {max_replies: 2})
agent_loop(task, system, {
  tools: user_tools(answerer),
  tool_format: "native",
  loop_until_done: true,
})
When the target agent does not have an explicit user-question tool, use
simulated_user_post_turn(answerer) as post_turn_callback. It watches for
plain-text clarification questions, injects a simulated reply, and stops the
loop when the answerer chooses silence.
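A minimal sketch of the post-turn wiring described above. It assumes simulated_user_post_turn is exported from std/agent/user alongside the other helpers, which the text implies but does not state; the script entries are illustrative.

```harn
import { scripted_user, simulated_user_post_turn } from "std/agent/user"

let answerer = scripted_user([
  {match: "*which framework*", reply: "Use Vitest."},
  {match: "*done*", action: "stop", reason: "complete"},
], {max_replies: 2})

// No ask_user tool is registered; the callback watches plain-text questions instead.
agent_loop(task, system, {
  loop_until_done: true,
  post_turn_callback: simulated_user_post_turn(answerer),
})
```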
Both agentic_user and scripted_user enforce local guardrails. max_replies
limits how many user messages can be produced, max_llm_calls caps the nested
model calls used by an agentic user, and inner max_iterations / max_nudges
bound any codebase-research loop. Simulated-user decisions also emit
tool_call_audit events with audit.event_type set to
simulated_user_reply, simulated_user_stop, simulated_user_failed, or
simulated_user_budget_exhausted so evals can audit when the harness user
intervened.
Interactive chat loops
Use std/agent/chat when a harness wants an operator-typed chat loop instead
of hand-driving agent_loop one turn at a time. agent_chat_loop(...) owns the
session, calls on_user_input(state) before each model turn, preserves the
same session_id across turns, and closes the session with a typed reason
unless close_session: false is set.
import { agent_chat_loop, agent_chat_route_input } from "std/agent/chat"
import { read_line } from "std/io"
let result = agent_chat_loop({
  session_id: "review-chat",
  provider: "ollama",
  model: "devstral-small-2",
  tools: coding_tools,
  tool_format: "native",
  on_user_input: { state ->
    let line = read_line()
    if !line.ok {
      return {kind: "exit", reason: line.status ?? "closed"}
    }
    return agent_chat_route_input(line.value, state, {
      "/runs": { req -> {kind: "handled", message: render_recent_runs(req.state)} },
    })
  },
  on_model_turn: { turn, state -> {state: state + {last_text: turn.visible_text}} },
})
The chat loop adds a Harn-handled wait_for_user tool by default. When the
model calls it, the current agent_loop turn stops with
stop_reason: "wait_for_user" and the wrapper returns to on_user_input.
Pass wait_for_user_tool: false to keep the tool registry unchanged.
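For comparison, here is a sketch of the same chat loop with the default wait_for_user tool turned off and the session left open on return. The "/help" route and its message are hypothetical; everything else reuses the options shown in this section.

```harn
import { agent_chat_loop, agent_chat_route_input } from "std/agent/chat"
import { read_line } from "std/io"

let result = agent_chat_loop({
  session_id: "review-chat",
  provider: "ollama",
  model: "devstral-small-2",
  tools: coding_tools,
  tool_format: "native",
  wait_for_user_tool: false, // leave the tool registry unchanged
  close_session: false, // keep the session open after the loop returns
  on_user_input: { state ->
    let line = read_line()
    if !line.ok {
      return {kind: "exit", reason: line.status ?? "closed"}
    }
    return agent_chat_route_input(line.value, state, {
      "/help": { req -> {kind: "handled", message: "Type a request, or close input to exit."} },
    })
  },
  on_model_turn: { turn, state -> {state: state + {last_text: turn.visible_text}} },
})
```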
agent_loop options
Accepts every llm_call option, plus the following:
| Key | Type | Default | Description |
|---|---|---|---|
profile | string | "tool_using" | Named preset for common loop shapes. One of "tool_using", "researcher", "verifier", or "completer"; explicit option keys override profile defaults |
loop_until_done | bool | false | Keep looping until completion. Native-tool loops complete on final text with no tool calls; text-tool/no-tool sentinel loops complete on ##DONE## or <done>##DONE##</done> |
done_sentinel | string or nil | mode-aware | Completion sentinel for sentinel-based loops. Use a non-empty string such as "##DONE##" to require sentinel completion, or nil for no sentinel. Native-tool loop-until-done loops default to nil; text/no-tool loop-until-done loops default to "##DONE##" |
max_iterations | int | 50 | Maximum number of LLM round-trips. Equivalent to iteration_budget: {mode: "fixed", initial: N, max: N} |
iteration_budget | string/dict | nil | Adaptive or fixed iteration cap. Pass a dict {mode, initial, max, extend_by} or the string "adaptive" / "fixed". See Adaptive iteration budget |
loop_control | closure | nil | Per-iteration policy callback state -> command. Receives a normalized loop-state snapshot and returns a command (extend/stop/none). See Adaptive iteration budget |
max_nudges | int | 8 | Max consecutive text-only responses before stopping |
nudge | string | see below | Custom message to send when nudging the agent |
llm_caller | closure | nil | Custom caller wrapping the per-turn llm_call. Preferred resilience surface. See Composable callers and middleware. |
llm_retries | int | 2 | (deprecated; prefer llm_caller with with_retry from std/llm/handlers) Retries on transient HTTP / provider errors. Off-by-one: llm_retries: K ≈ with_retry(..., {max_attempts: K + 1}). |
llm_backoff_ms | int | 2000 | (deprecated; prefer with_retry) Base exponential backoff in ms between LLM retries |
reasoning_policy / thinking_policy | string/bool | "auto" | Provider-aware reasoning policy. auto chooses a task/scale-appropriate setting; off disables thinking where possible and otherwise uses the provider's lowest reasoning floor; minimal, low, medium, high, and xhigh request explicit levels. Caller-supplied thinking or reasoning_effort always wins |
reasoning_scale / problem_scale | string | "medium" | Scale hint for reasoning_policy: "auto": small, medium, or large |
reasoning_task | string | inferred | Task hint for reasoning_policy: "auto": chat, agent, code, verify, or summarize |
tool_retries | int | 0 | Number of retry attempts for failed tool calls |
tool_backoff_ms | int | 1000 | Base backoff delay in ms for tool retries (doubles each attempt) |
max_concurrent_tools | int | 1 | Maximum in-flight tool calls from one planner turn. Results are recorded in emitted order even when calls complete out of order |
prefetch_next_turn | bool | false | Start the next planner turn after tool results are recorded while local/custom audit receipt sinks flush in the background. The loop drains those flushes before returning |
tool_surface_narrowing | bool/dict | {enabled: true, window_turns: 5, mode: "safe"} | Between turns, remove model-visible tools that were unused across the rolling window. Safe mode narrows unused read_only tools while keeping mutating/control/unknown tools by class; dict configs may also set mode: "aggressive", hard_keep, prune_classes, keep_classes, and unknown_tool_policy |
progress_tool | bool/dict | false | Opt in to a model-facing progress tool that emits progress_reported agent events. true exposes agent_progress; a dict may set name, description, and system_prompt_nudge. ACP clients receive task-list entries as canonical plan updates and message-only reports as Harn progress narration |
policy | dict | nil | Capability ceiling applied to this agent loop |
daemon | bool | false | Idle instead of terminating after text-only turns |
persist_path | string | nil | Persist daemon snapshots to this path on idle/finalize |
resume_path | string | nil | Restore daemon state from a previously persisted snapshot |
wake_interval_ms | int | nil | Fixed timer wake interval for daemon loops |
watch_paths | list/string | nil | Files to poll for mtime changes while idle |
consolidate_on_idle | bool | false | Run transcript auto-compaction before persisting an idle daemon snapshot |
compaction | string/dict/bool | {strategy: "hybrid", keep_last_n: 10} | Agent-loop context-window policy. Use "none" or false to disable; "truncate", "summarize_middle", "summarize_all", or "hybrid" to choose policy. Dict policies may include policy / compaction_policy with compaction instructions |
compact_threshold | int | model-aware | Estimated input-token threshold for compaction. Harn lowers this from the provider/model context window when known |
compact_keep_first | int | 0 | Prompt-visible messages to keep verbatim before the compaction summary. The system prompt is always kept separately |
compact_keep_last | int | strategy default | Prompt-visible messages to keep verbatim after the compaction summary |
auto_compact | bool/dict | nil | Auto-compaction options. Dict values may include the same compaction policy fields as compaction |
transcript_projection | string/dict | nil | Per-turn model-visible projection over the immutable raw transcript. Policies include clean_tool_repair, squash_failed_calls, summary_prefix, reachability_gc, and custom |
scratchpad | bool/dict | false | Session-local working memory. true initializes a compact {goals, open_items, facts, refs} scratchpad, recites it at the prompt tail each turn, and periodically reorganizes it. Dict configs may set enabled, recite, reorganize_every, max_recent_messages, schema_retries, initial, and reorganizer |
idle_watchdog_attempts | int | nil (disabled) | Max consecutive idle-wait ticks that may return no wake reason before the daemon terminates with status = "watchdog". Guards against a misconfigured daemon (e.g. bridge never signals, no timer, no watch paths) hanging the session silently |
context_callback | closure | nil | Per-turn hook that can rewrite prompt-visible messages and/or the effective system prompt before the next LLM call |
context_filter | closure | nil | Alias for context_callback |
timestamp_messages | bool | false | Decorate prompt-visible transcript messages with the current harness timestamp before each LLM call without mutating the stored transcript |
message_decorator | closure | nil | Per-message hook called as message_decorator(message, context) before each LLM call. The context includes session_id, iteration, index, and timestamp |
prompts / prompt_overrides | dict | nil | Override validated logical agent prompt ids such as agent.loop_contract, agent.tool_contract_text, and agent.completion_judge_system with a prompt asset path, {text}, {path}, or render closure. Unknown ids are rejected. For typo-resistant authoring, pass the typed override shape through agent_prompt_overrides(...) from std/agent/prompts |
post_turn_callback | closure | nil | Hook called after each turn. Receives turn metadata and may inject a message, request an immediate stage stop, or merge next-turn options such as llm_options: {tool_choice: "none"} |
verify_completion | closure | nil | Hook called when the loop is about to stop naturally. Return nil/true to accept the stop or feedback text to veto and continue |
verify_completion_judge | bool/dict | nil | Built-in structured judge for any natural stop. true uses defaults; a dict may set provider, model, system, feedback_fallback, and max_invocations (alias max_feedback, default 5) to cap repeated vetoes. Once the cap is hit the judge stops firing and the loop ends with status verify_capped and stop_reason completion_judge_cap_reached; set max_invocations: 0 to disable the cap |
done_judge | bool/dict | nil | Completion structured judge. It runs when the model naturally completes a native-tool loop or emits the done sentinel, and may veto by returning verdict: "continue" plus next_step or reasoning. Dict configs may include max_invocations (alias max_feedback) to terminally cap repeated vetoes, plus cadence: {every?, when?, max_invocations?, min_iterations_before_first?} to control when the judge is due |
step_judge | dict | nil | Per-turn structured judge that runs after an assistant turn and before tool dispatch. Dict configs may include provider, model, on_veto ("replace" or "retain"), max_attempts, skip_when_empty, skip_when_stalled, and skip_when_iterations_remaining (default 1, skips when no regeneration turn remains). Skips emit step_judge_decision with skipped: true |
llm_transcript_dir | string | nil | Per-loop directory for Harn's existing llm_transcript.jsonl sidecar. This is equivalent to scoping HARN_LLM_TRANSCRIPT_DIR to one agent loop and is preferred when a script needs run-specific auditable model-turn JSONL |
turn_policy | dict | nil | Turn-shape policy for action stages. Supports require_action_or_yield: bool, allow_done_sentinel: bool (default true; set to false in workflow-owned action stages so nudges stop advertising the done sentinel), and max_prose_chars: int |
native_tool_fallback | string | "allow" | Native-tool-stage policy when the provider emits text-mode <tool_call> content instead of native tool calls. "allow" preserves the current recovery path, "allow_once" accepts the first fallback turn then rejects later repeats with corrective feedback, and "reject" fails closed on the first text fallback |
stop_after_successful_tools | list<string> | nil | Stop after a tool-calling turn whose successful results include one of these tool names. Useful for workflow-owned verify loops such as ["edit", "scaffold"] |
require_successful_tools | list<string or list<string>> | nil | Mark a cleanly completed loop status = "failed" unless every required tool succeeds at least once. A nested list is an OR group, e.g. ["run_command", ["read_command_output", "read_command_output_tail"]]. Keeps action stages honest when attempted effects were rejected, errored, or skipped |
stall_diagnostics | bool/dict | nil | Detect adjacent repeated tool calls with the same name and identical arguments. true enables conservative defaults (threshold: 3, one feedback nudge, argument digests only). Dict options: enabled, threshold, inject_feedback, max_feedback, exempt_tools/allow_repeated_tools, and include_arguments |
skills | skill_registry or list | nil | Skill registry exposed to the match-and-activate lifecycle phase. See Skills lifecycle |
skill_match | dict | {strategy: "metadata", top_n: 1, sticky: true} | Match configuration: strategy ("metadata", "host", or "embedding"), top_n, sticky |
working_files | list/string | [] | Paths that feed paths: glob auto-trigger in the metadata matcher and ride along as a hint to host-delegated matchers |
mcp_servers | list | nil | MCP servers to connect for this loop. Harn calls tools/list once per server, adds discovered tools as <server>__<tool>, and dispatches matching tool calls through tools/call |
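As one hedged example of combining options from the table above, the sketch below pairs a require_successful_tools OR group with stall diagnostics and then checks the corresponding result fields. The tool names come from the table's own example; coding_tools, task, and system are placeholders.

```harn
let result = agent_loop(task, system, {
  tools: coding_tools,
  tool_format: "native",
  loop_until_done: true,
  // run_command must succeed, plus at least one of the two read tools.
  require_successful_tools: ["run_command", ["read_command_output", "read_command_output_tail"]],
  stall_diagnostics: true,
})

if result.status == "failed" {
  log("a required tool never succeeded")
}
if result.suspected_loop {
  log("repeated tool calls: " + to_string(result.repeated_tool_calls))
}
```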
agent_loop forwards thinking, reasoning_effort, interleaved_thinking,
and anthropic_beta_features to every model turn. When neither thinking nor
reasoning_effort is set, reasoning_policy: "auto" lowers provider quirks
into explicit typed thinking options before the call reaches the provider. For
example, OpenAI reasoning models get thinking: {mode: "effort"} (off
becomes none on newer GPT-5 routes that advertise it, otherwise minimal),
Gemini 2.5 gets native generationConfig.thinkingConfig, Together hybrid
models get reasoning.enabled, and Qwen-style local providers can use
thinking: {mode: "disabled"} to trigger Harn's /no_think injection.
For local Qwen routes (ollama, llamacpp, local, and mlx), auto
keeps small/medium tasks at this disabled floor because those chat templates
are more reliable on compact edit loops without forced thinking.
For Claude Opus 4.6/4.7 agent loops, thinking: true remains the
single explicit switch that enables extended thinking and the Anthropic
interleaved-thinking beta header.
ACP clients can pin the same abstraction for a session with
session/set_config_option(configId="thought_level"). Agent loops running in
that session inherit the pin unless their options explicitly set
reasoning_policy, thinking_policy, thinking, or reasoning_effort.
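The two ways of steering reasoning described above might look like this in a script. It is a sketch only: the provider and model pairs are the ones used elsewhere in this document, and the effect of each setting depends on the provider routes described in the preceding paragraphs.

```harn
// Let Harn choose: a small coding task under the auto policy.
let quick = agent_loop(task, system, {
  provider: "openai",
  model: "gpt-5-mini",
  reasoning_policy: "auto",
  reasoning_scale: "small",
  reasoning_task: "code",
})

// Explicit switch: caller-supplied thinking wins over the policy.
let deep = agent_loop(task, system, {
  provider: "anthropic",
  model: "claude-opus-4-7",
  thinking: true,
  loop_until_done: true,
})
```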
Profiles preload the common loop-budget and retry keys below. Pass any key explicitly to override the profile's value for that call.
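For instance, a call might start from the verifier profile and raise only its iteration cap. This is a minimal sketch; task and system are placeholders.

```harn
// Start from the verifier preset, then override a single key.
let result = agent_loop(task, system, {
  profile: "verifier",
  max_iterations: 8, // explicit key overrides the profile default
  loop_until_done: true,
})
log(result.status)
```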
Agent scratchpad
agent_loop(..., {scratchpad: true}) creates a small structured scratchpad for
the session, renders it as a tail system fragment on every turn, and runs a
structured reorganization pass every three continuing turns. The reorganizer may
use a different provider or model:
agent_loop(task, system, {
  loop_until_done: true,
  scratchpad: {
    reorganize_every: 2,
    reorganizer: {provider: "ollama", model: "devstral-small-2"},
  },
})
The scratchpad is capped at 16 KiB and is stored as live session state, not as a
synthetic replay message. Updates append compact agent_scratchpad transcript
events with action/version/count metadata; session snapshots and final
transcripts expose scratchpad, scratchpad_version, and
metadata.agent_scratchpad.
When paired with transcript_projection: {policy: "reachability_gc"}, the loop
automatically supplies the current scratchpad as a GC root and
scratchpad-version write barrier for each provider turn. Referenced tool output
stays visible; stale, unreferenced tool-result bodies can be reclaimed from the
model-visible prefix while the raw transcript remains intact.
Scripts can read and write the state directly with
agent_session_scratchpad(id), agent_session_set_scratchpad(id, pad, opts?),
and agent_session_clear_scratchpad(id, opts?). Reorganization validates that
returned facts cite source refs already present in the scratchpad or recent
turns, so heavy tool output remains referenced rather than copied.
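A sketch of direct scratchpad access from a script, under several assumptions: "review-chat" is a hypothetical session id, agent_session_scratchpad is assumed to return the current pad, and the list-of-strings values are illustrative guesses at what the {goals, open_items, facts, refs} shape holds.

```harn
// Hypothetical session id; use the id of a session your script owns.
let id = "review-chat"

let pad = agent_session_scratchpad(id)
log(pad)

// Replace the pad, keeping the documented top-level keys.
agent_session_set_scratchpad(id, {
  goals: ["Ship the sorting fix"],
  open_items: ["Add boundary tests"],
  facts: [],
  refs: [],
})

// Drop the pad again when the script is finished with it.
agent_session_clear_scratchpad(id)
```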
The deterministic regression/eval harness for the none vs append-only vs periodic-reorg comparison is:
cargo run --quiet --bin harn -- run examples/evals/agent_scratchpad_retention.harn
Resilience knobs
The preferred surface for retry / fallback / shadow / budget / cache /
circuit-breaker behavior on agent_loop is llm_caller:. Pass a
closure that wraps the per-turn llm_call(...) and the loop will
route every turn through it:
import {default_llm_caller, with_retry, with_fallback, compose} from "std/llm/handlers"
let caller = compose([
  with_retry({max_attempts: 4, backoff: "exponential"}),
])(default_llm_caller())
let result = agent_loop(task, system, {
  loop_until_done: true,
  llm_caller: caller,
})
Caller contract: fn(call) -> {ok, value | status, error?} where
call = {prompt, system, opts, turn: {iteration, session_id, attempt}}.
The legacy llm_retries / llm_backoff_ms options are still accepted
for back-compat and will be removed in v0.9; the lint rule
deprecated_llm_options warns on usage. See
Composable callers and middleware for the
full middleware catalog.
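The caller contract above also allows a hand-written closure. The sketch below wraps a retrying caller with a logging step and passes the result through unchanged. It assumes a caller closure can be invoked directly as base(call); max_attempts: 3 is chosen to approximate the legacy llm_retries: 2.

```harn
import {default_llm_caller, with_retry, compose} from "std/llm/handlers"

let base = compose([
  with_retry({max_attempts: 3, backoff: "exponential"}), // roughly llm_retries: 2
])(default_llm_caller())

// Log the turn, then delegate and return the
// {ok, value | status, error?} result unchanged.
let caller = { call ->
  log("turn " + to_string(call.turn.iteration) + ", attempt " + to_string(call.turn.attempt))
  return base(call)
}

let result = agent_loop(task, system, {
  loop_until_done: true,
  llm_caller: caller,
})
```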
Agent-loop compaction
agent_loop compacts prompt-visible transcript messages before an LLM call
would exceed the configured or discovered context budget. The default policy is
hybrid: keep the system prompt and the last 10 prompt-visible messages
verbatim, summarize older messages, and fall back to truncation if the summary
still exceeds the hard limit.
let result = agent_loop(task, system, {
  provider: "openai",
  model: "gpt-4o",
  compaction: {strategy: "hybrid", keep_last_n: 10}
})
Available strategies:
| Strategy | Behavior |
|---|---|
"none" / false | Disable automatic agent-loop transcript compaction |
"truncate" | Replace older messages with a deterministic abbreviated summary |
"summarize_middle" | Summarize older messages and keep the latest suffix verbatim |
"summarize_all" | Summarize all compactable prompt-visible messages |
"hybrid" | Summarize older messages, keep the latest suffix, and use truncate as the hard-limit fallback |
Compaction emits TranscriptCompacted live events and transcript
compaction events with reason, strategy, engine_strategy,
estimated_tokens_before, estimated_tokens_after, instruction_mode,
instruction_source, and compaction_policy, so replay tools can verify which
trigger, policy, and host/user instruction lane ran.
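The strategies in the table above can also be selected with the string shorthand. A brief sketch, assuming task and system are defined; the variable names and the keep-last value are illustrative.

```harn
// Short task: disable agent-loop compaction entirely.
let short_run = agent_loop(task, system, {compaction: "none"})

// Long task: summarize older messages, keep the latest six verbatim.
let long_run = agent_loop(task, system, {
  loop_until_done: true,
  compaction: "summarize_middle",
  compact_keep_last: 6,
})
```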
Hosts can attach first-class compaction instructions without building custom
prompt concatenation. The typed CompactionPolicy shape accepts
instructions, mode, scope, preserve, drop,
extend_default_instructions, and author. Omitting
extend_default_instructions or setting it to true appends host/user guidance
after Harn's default compaction rules; false replaces the default guidance.
Host-only instructions stay in event and audit metadata and are not copied into
the next model-visible summary unless scope is "model_visible",
"summary", or "transcript".
import {compact_for_bug_fix_resumption} from "std/agent/autocompact"
let result = agent_loop(task, system, {
  provider: "mock",
  compact_threshold: 1,
  compact_strategy: "custom",
  auto_compact: {
    policy: compact_for_bug_fix_resumption({author: "host"})
  },
  compact_callback: { archived, _reminders, policy ->
    {summary: "resume with " + policy.mode + " over " + to_string(len(archived)) + " messages"}
  },
})
Stdlib helpers cover common host commands:
compaction_policy(...), compact_for_bug_fix_resumption(...),
compact_preserving_test_failures(...), and
compact_retaining_current_plan(...).
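Where no helper fits, a policy can presumably be written out by hand. The sketch below assumes a plain dict with CompactionPolicy field names is accepted under compaction.policy, as the options table suggests; the instruction text is illustrative, and only fields whose meaning is described above are set.

```harn
let result = agent_loop(task, system, {
  loop_until_done: true,
  compaction: {
    strategy: "hybrid",
    keep_last_n: 10,
    policy: {
      instructions: "Keep failing test names and the current plan.",
      extend_default_instructions: true, // append to Harn's default rules
      scope: "summary", // allow the guidance into the model-visible summary
      author: "host",
    },
  },
})
```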
Profile defaults (the loop-budget and retry keys that each profile preloads):
| Profile | max_iterations | max_nudges | tool_retries | llm_retries | schema_retries |
|---|---|---|---|---|---|
tool_using | 50 | 8 | 0 | 2 | 0 |
researcher | 30 | 4 | 0 | 2 | 0 |
verifier | 5 | 0 | 0 | 2 | 3 |
completer | 1 | 0 | 0 | 2 | 0 |
Adaptive iteration budget
Plain integer max_iterations is a hard cap. iteration_budget lets the loop
start with a small initial limit and extend it transparently when there is
evidence of forward progress, instead of forcing harness authors to guess a
single number.
let result = agent_loop(prompt, system, {
  iteration_budget: {mode: "adaptive", initial: 4, max: 16, extend_by: 2},
})
Fields:
| Field | Type | Default | Description |
|---|---|---|---|
mode | string | "fixed" | "fixed" (no extension) or "adaptive" |
initial | int | max / 4 (adaptive), max (fixed) | Iteration cap to start with |
max | int | 16 (adaptive), 50 (fixed) | Hard upper bound; extensions never raise the cap above this |
extend_by | int | 2 (adaptive), 0 (fixed) | Default extension delta when policy returns {action: "extend"} without by / until |
expose_decisions | bool | mode == "adaptive" | When true, the result includes an adaptive_budget summary with the decision log |
max_iterations: N and iteration_budget: {mode: "fixed", initial: N, max: N}
are equivalent. Passing both iteration_budget and max_iterations is allowed;
the budget's max wins for the host's autonomy/ACP tracking.
Explicit max_iterations, initial, max, and adaptive extend_by values
must be positive integers, and initial must be less than or equal to max;
invalid fields raise an agent_loop error before the first provider call.
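To make the defaults and the equivalence concrete, here is a short sketch. It assumes the "adaptive" string shorthand applies the defaults from the fields table (max 16, initial max / 4 = 4, extend_by 2); prompt and system are placeholders.

```harn
// String shorthand: adaptive mode with the documented defaults.
let adaptive = agent_loop(prompt, system, {iteration_budget: "adaptive"})

// These two calls configure the same hard cap of 12 round-trips.
let capped = agent_loop(prompt, system, {max_iterations: 12})
let fixed = agent_loop(prompt, system, {
  iteration_budget: {mode: "fixed", initial: 12, max: 12},
})
```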
loop_control policy
When the budget is "adaptive" (or any time you set loop_control), the loop
calls a policy closure once per iteration with a normalized state snapshot and
applies the returned command:
loop_control: { state ->
  if state.budget.remaining > 1 {
    return nil
  }
  if state.completion.vetoed {
    return {action: "extend", by: 2, reason: "completion gate vetoed"}
  }
  if state.turn.tool_call_count > 0 && state.progress.changed {
    return {action: "extend", by: 2, reason: "recent turn made progress"}
  }
  return nil
}
State snapshot fields:
| Field | Description |
|---|---|
iteration | 1-based iteration just completed |
budget.current_limit | Active iteration cap before this decision |
budget.max | Configured upper bound |
budget.remaining | current_limit - iteration; 0 means the next iteration would exceed the cap |
budget.extension_count | Number of prior extensions applied |
turn.tool_call_count | Tool calls executed this turn |
turn.successful_tool_names / turn.rejected_tool_names | Names from this turn's dispatch |
turn.text_chars | Visible-text length this turn |
turn.native_fallback_used | True when native_tool_fallback accepted text-mode tool calls this turn |
session.successful_tool_names / session.rejected_tool_names | Cumulative deduplicated tool name sets |
session.required_tools_satisfied / session.required_tools_missing | require_successful_tools postcondition status |
completion.proposed | True when post-turn logic proposed a natural / sentinel break this turn |
completion.vetoed | True when verify_completion / verify_completion_judge / done_judge vetoed |
completion.verdict / completion.feedback | Judge verdict and feedback string when present |
progress.changed | True if this turn made tool calls, produced new successful tool names, or wrote visible text |
progress.summary | Human-readable progress hint ("executed N tool call(s)", "completion gate vetoed", etc.) |
Return value is one of:
| Command | Meaning |
|---|---|
nil / {action: "none"} | No-op; loop continues until the next decision |
{action: "extend", by: N, reason} | Raise current_limit by N (capped by budget.max) |
{action: "extend", until: M, reason} | Raise current_limit to M (capped by budget.max) |
{action: "stop", status: "incomplete", reason} | Break the loop with the given final status |
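The stop and until commands from the table above can be sketched in one policy. This is an illustration under assumptions: session.required_tools_satisfied is treated as a bool, and the thresholds are arbitrary.

```harn
let result = agent_loop(prompt, system, {
  iteration_budget: {mode: "adaptive", initial: 4, max: 16},
  loop_control: { state ->
    // Give up once repeated extensions have stopped producing progress.
    if state.budget.extension_count > 1 && !state.progress.changed {
      return {action: "stop", status: "incomplete", reason: "no progress after repeated extensions"}
    }
    // Only consider extending at the cap edge.
    if state.budget.remaining > 0 {
      return nil
    }
    // Jump straight to the hard bound while a required tool is still missing.
    if !state.session.required_tools_satisfied {
      return {action: "extend", until: state.budget.max, reason: "required tools still missing"}
    }
    return nil
  },
})
```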
When no loop_control is provided and the budget is adaptive, the stdlib
installs a small default policy that extends only when one of these is true at
the cap edge:
- the latest verify/done judge vetoed completion,
- require_successful_tools is unsatisfied, or
- the most recent turn executed at least one tool call (i.e. real progress).
All decisions are recorded:
- in the result under adaptive_budget.decisions (when expose_decisions is true, which is the default for adaptive budgets), and
- as LoopControlDecision events on the live event stream ({type: "loop_control_decision", action, oldLimit, newLimit, reason, ...}), also surfaced to ACP/A2A bridges.
let result = agent_loop(prompt, system, {iteration_budget: {mode: "adaptive", initial: 4, max: 12}})
log(result.adaptive_budget.extensions_used)
log(result.adaptive_budget.final_limit)
for decision in result.adaptive_budget.decisions {
  log(decision.action + ": " + decision.reason)
}
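A custom policy can mirror the default rules above and add an explicit stop. The sketch below is illustrative only: it assumes loop_control is passed as a top-level agent_loop option and receives the loop-state dict as its single argument. The function name cautious_loop_control, the prompt strings, and the reason strings are hypothetical.

```harn
// Hypothetical loop_control policy; how it is wired in is an assumption.
fn cautious_loop_control(state) {
  // Only act at the cap edge. Returning nil is a no-op.
  if state.budget.remaining != 0 {
    return nil
  }
  // A judge veto means the work is not finished yet.
  if state.completion.vetoed {
    return {action: "extend", by: 2, reason: "judge vetoed completion"}
  }
  // Required tools have not all succeeded.
  if state.session.required_tools_satisfied == false {
    return {action: "extend", by: 2, reason: "required tools still missing"}
  }
  // The last turn made progress, so allow one more iteration.
  if state.progress.changed {
    return {action: "extend", by: 1, reason: state.progress.summary}
  }
  return {action: "stop", status: "incomplete", reason: "no progress at the cap"}
}

let prompt = "Audit the release checklist."
let system = "You are a careful release auditor."
let result = agent_loop(prompt, system, {
  iteration_budget: {mode: "adaptive", initial: 4, max: 12},
  loop_control: cautious_loop_control,
})
log(result.adaptive_budget.final_limit)
```

Extensions are still capped by budget.max, so a policy like this cannot push the loop past the configured upper bound.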
Generic role presets
std/agent/presets packages the common harness shapes — audit, repair,
summary, verify — so script authors don't have to hand-tune max_iterations,
max_nudges, done_sentinel, done_judge, turn_policy, and thinking on
every call. Presets compose with agent_loop: they return ordinary options
dicts (caller overrides always win) and the *_agent helpers call
agent_loop directly.
import {audit_agent, repair_agent, summary_agent, agent_preset} from "std/agent/presets"
// Inspect / read-only audit. Native completion, no done sentinel,
// adaptive budget {initial: 4, max: 12}, max_nudges: 1.
let audit = audit_agent("Audit the release", {
provider: "anthropic",
model: "claude-opus-4-7",
tools: release_tools,
require_successful_tools: ["release_run"],
})
// Tool-using repair. Wider budget {initial: 4, max: 16}, max_nudges: 2.
let repair = repair_agent("Fix the regression", {
provider: "openai",
model: "o3",
tools: repair_tools,
})
// Cheap one-shot summary. tool_choice="none", iteration_budget fixed at 1.
let summary = summary_agent("Summarize the audit findings.", {
provider: "openai",
model: "gpt-4o-mini",
})
// Customize a preset before passing to agent_loop directly:
let opts = agent_preset("repair", {
tools: repair_tools,
iteration_budget: {mode: "adaptive", initial: 6, max: 20},
})
let result = agent_loop(prompt, system, opts)
Preset roles, defaults summarized:
| Preset | profile | tool_format | loop_until_done | max_nudges | Default iteration_budget | stall_diagnostics | done_sentinel / done_judge |
|---|---|---|---|---|---|---|---|
| audit | verifier | native | true | 1 | adaptive {initial: 4, max: 12, extend_by: 2} | enabled, threshold 3 | both nil (natural completion) |
| repair | tool_using | native | true | 2 | adaptive {initial: 4, max: 16, extend_by: 2} | enabled, threshold 3 | both nil |
| summary | completer | unset | false | 0 | fixed {initial: 1, max: 1} | unset | both nil, tool_choice: "none" |
| verify | verifier | unset | false | 0 | adaptive {initial: 1, max: 5, extend_by: 1} | unset | done_judge: true |
| merge_captain | tool_using | native | true | 3 | adaptive {initial: 8, max: 60, extend_by: 4} | enabled, threshold 3 | both nil; default consent denies writes |
| review_captain | tool_using | native | true | 3 | adaptive {initial: 6, max: 30, extend_by: 3} | enabled, threshold 3 | both nil |
| oncall_captain | tool_using | native | true | 3 | adaptive {initial: 6, max: 24, extend_by: 3} | enabled, threshold 3 | both nil; default with_rate_limit({max_calls: 50}) |
| release_captain | tool_using | native | true | 3 | adaptive {initial: 8, max: 40, extend_by: 4} | enabled, threshold 3 | both nil; opt-in with_dry_run shadow runs |
Captain presets
The four captain presets package the persona-shaped service contracts
adopters were re-deriving by hand: a long-enough adaptive budget, a
HITL-friendly consent gate where it matters, a default rate-limit
where unbounded fan-out would hurt, and the canonical cheap-default /
frontier-escalation routing scaffolding from std/llm/handlers. The
persona template pack (harn#463) ships its entries on top of this
substrate.
import {
merge_captain_agent,
oncall_captain_agent,
release_captain_agent,
review_captain_agent,
} from "std/agent/presets"
// Merge Captain: long adaptive budget; default consent layer auto-
// approves tools annotated `read`/`search`/`fetch`/`think` and denies
// everything else unless the caller passes a `consent` callable.
let sweep = merge_captain_agent("Sweep open PRs.", {
provider: "anthropic",
model: "claude-opus-4-7",
tools: github_tools,
consent: { call -> approval_bridge.prompt(call) }, // HITL bridge
audit_sink: { record -> receipts.append(record) }, // captain ledger
})
// Oncall Captain: defaults `with_rate_limit({max_calls: 50})` so an
// alert-storm loop can't fan out unbounded. Override via `rate_limit`.
let triaged = oncall_captain_agent("Triage paging alerts.", {
provider: "openai",
model: "gpt-5.4",
tools: oncall_tools,
rate_limit: {max_calls: 100, message: "alert-loop cap"},
})
// Release Captain: long checkpointed budget; pass `dry_run: true` (or
// a `with_dry_run` opts dict) to layer a shadow-run gate.
let shipping = release_captain_agent("Cut v0.9.0.", {
provider: "anthropic",
model: "claude-opus-4-7",
tools: release_tools,
dry_run: true,
cheap_caller: cheap_default_caller,
frontier_caller: frontier_caller,
escalate_predicate: { call -> call?.opts?.task_kind == "judge" },
logging_sink: { record -> receipts.llm_call(record) },
})
Captain layers are opt-in: the preset only adds an audit_sink /
telemetry / consent / rate_limit / dry_run /
handoff_sink layer when the caller supplies the matching dependency
(plus the per-captain defaults in the table above). Pass an explicit
tool_caller to opt out completely. Captain presets also build an
llm_caller from cheap_caller + frontier_caller +
escalate_predicate (cheap-by-default with frontier escalation, per
the cost-moat substrate) and an optional logging_sink for receipts.
Each preset also installs a provider-aware thinking choice when the caller
hasn't already set one:
- adaptive thinking on models that advertise "adaptive" in thinking_modes,
- effort (medium for audit/repair, low for verify) on models with reasoning_effort_supported,
- disabled for summary when supported,
- otherwise leaves thinking unset.
Caller-supplied thinking and iteration_budget always win. Sugar:
iteration_budget: "adaptive" keeps the preset's numeric defaults and
explicitly switches the mode to adaptive.
agent_budget(kind_or_options, overrides?) returns a budget shape suitable for
embedding directly:
{iteration_budget: agent_budget("repair", {max: 24})}
{iteration_budget: agent_budget("adaptive", {initial: 2, max: 6})}
When daemon: true, the loop transitions active -> idle -> active instead of
terminating on a text-only turn. Idle daemons can be woken by queued human
messages, agent/resume bridge notifications, wake_interval_ms, or watched
file changes from watch_paths.
Daemon idle is a special case of agent_await_resumption: the stdlib records
the same lifecycle audit and normalized ResumeConditions metadata, with
timeout / on_event preconfigured from daemon options, while keeping the
existing in-process idle wait so daemon-mode return behavior is unchanged.
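As a rough illustration of the daemon options named above, the sketch below enables daemon mode with a periodic wake and a watched directory. The option names come from this section; the concrete values, the prompt strings, and the assumption that watch_paths takes a list of path strings are illustrative.

```harn
let task = "Watch the source tree and summarize each change."
let system = "You are a careful reviewer."

// Assumed shapes: wake_interval_ms is a millisecond count and
// watch_paths is a list of path strings.
let result = agent_loop(task, system, {
  daemon: true,
  wake_interval_ms: 60000,
  watch_paths: ["src"],
})

// A daemon that parked without terminating is expected to report "idle".
log(result.status)
```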
For MCP server tool catalogs, see MCP server tools.
Native-tool stages also expose structured fallback / retry metadata in the
result trace summary. Look for native_text_tool_fallbacks,
native_text_tool_fallback_rejections, and empty_completion_retries when
debugging provider contract drift or OpenAI-compatible empty completions.
Default nudge message:
The nudge is mode-aware:
- In tagged text-tool stages it asks for concrete tool progress and reserves <done>##DONE##</done> for real completion.
- In native-tool stages it asks for concrete tool progress and treats final text with no tool calls as completion.
- In no-tool sentinel stages it asks for concrete progress and reserves bare ##DONE## for completion.
When loop_until_done: true, the system prompt is automatically extended with:
IMPORTANT: You MUST keep working until the task is complete.
The completion instruction is mode-aware: native-tool stages complete by returning final text with no tool calls, tagged text-tool stages use <done>##DONE##</done>, and no-tool sentinel stages use bare ##DONE##.
done_judge adds a second gate after completion is detected. The loop renders
the transcript for a structured judge call and expects
verdict: "done" | "continue" plus optional reasoning and next_step.
A veto injects runtime feedback and the loop continues until the judge accepts,
done_judge.max_invocations is reached, or max_verify_attempts is exhausted.
Every judge call emits JudgeDecision
with session_id, iteration, verdict, reasoning, next_step, and
judge_duration_ms, plus optional trigger.
Set top-level done_judge.max_invocations (alias max_feedback) to a positive
integer to cap repeated done-judge vetoes. Once reached, the loop stops with
status: "verify_capped" and stop_reason: "done_judge_cap_reached", and the
result carries {done_judge: {invocations, vetoes, max_invocations, cap_reached}}. Set it to 0 to disable the terminal cap.
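A minimal sketch of the cap in use follows. It assumes stop_reason is exposed as a top-level result field next to status, which the text above implies but does not state outright; the prompt strings are placeholders.

```harn
let task = "Fix the failing test and confirm it passes."
let system = "You are a coding assistant."

let result = agent_loop(task, system, {
  loop_until_done: true,
  // Stop after three done-judge calls instead of looping on repeated vetoes.
  done_judge: {max_invocations: 3},
})

if result.status == "verify_capped" {
  // Assumed field: stop_reason on the result.
  log(result.stop_reason)
  log(result.done_judge.vetoes)
}
```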
Use done_judge.cadence to gate the judge. Omit it to preserve the default:
every completion candidate is judged. every: N judges turns N, 2N, and so
on; max_invocations caps total done-judge calls; min_iterations_before_first
skips the first K turns; when accepts "always", "stalled", or a closure
that receives the same loop-state shape as loop_control.
When when: "stalled" is configured, a stall warning can fire the judge
directly. A done verdict stops the loop with stalled_done_judge before the
repeated tool call is dispatched; a continue verdict leaves the existing
stall feedback injection path in place. The corresponding JudgeDecision
event carries trigger: "stalled".
agent_loop(task, system, {
loop_until_done: true,
done_judge: {
cadence: {every: 5, when: "always", max_invocations: 3},
},
})
when: "stalled" does not fire on ordinary completion candidates. It is the
policy hook used by stall diagnostics so completion checks happen on a signal
rather than a fixed "are you done?" prompt.
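For comparison with the cadence example above, the sketch below shows the two other forms of when described in this section: the "stalled" string and a closure over the loop state. The closure body and the prompt strings are hypothetical; only the field names come from the loop-state table.

```harn
let task = "Refactor the parser module."
let system = "You are a coding assistant."

// Judge only when stall diagnostics fire, and never before turn 2.
agent_loop(task, system, {
  loop_until_done: true,
  done_judge: {
    cadence: {when: "stalled", min_iterations_before_first: 2},
  },
})

// Closure form: judge only on turns that made no progress.
agent_loop(task, system, {
  loop_until_done: true,
  done_judge: {
    cadence: {when: { state -> state.progress.changed == false }},
  },
})
```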
Editing source from inside an agent loop
Agent loops that mutate code should reach for the AST-precise primitives in
std/edit before freeform text patches. The cookbook
chapter Precise edits with AST tools
walks through the decision tree (replace → edit_apply_node, add →
edit_insert_at_anchor, rename → edit_rename_symbol, preview →
edit_dry_run, text fallback → edit_safe_text_patch) and ships a
system_reminder snippet you can lift into agent_loop's session-start
hook so the model carries the same decision tree into every coding turn.
Daemon stdlib wrappers
When you want a first-class daemon handle instead of wiring agent_loop
options manually, use the daemon builtins:
- daemon_spawn(config)
- daemon_trigger(handle, event)
- daemon_snapshot(handle)
- daemon_stop(handle)
- daemon_resume(path)
daemon_spawn accepts the same daemon-related options that agent_loop
understands (wake_interval_ms, watch_paths, idle_watchdog_attempts,
etc.) plus event_queue_capacity, which bounds the durable FIFO trigger queue
used by daemon_trigger.
let daemon = daemon_spawn({
name: "reviewer",
task: "Watch for trigger events and summarize the latest change.",
system: "You are a careful reviewer.",
provider: "mock",
persist_path: ".harn/daemons/reviewer",
event_queue_capacity: 256,
})
daemon_trigger(daemon, {kind: "file_changed", path: "src/lib.rs"})
let snap = daemon_snapshot(daemon)
log(snap.pending_event_count)
daemon_stop(daemon)
let resumed = daemon_resume(".harn/daemons/reviewer")
These wrappers preserve queued trigger events across stop/resume. If a daemon is stopped while a trigger is mid-flight, that trigger is re-queued and replayed on resume instead of being lost.
Context callback
context_callback lets you keep the full recorded transcript for replay and
debugging while showing the model a smaller or rewritten prompt-visible
history on each turn.
The callback receives one argument:
{
iteration: int,
system: string?,
messages: list,
visible_messages: list,
recorded_messages: list,
recent_visible_messages: list,
recent_recorded_messages: list,
latest_visible_user_message: string?,
latest_visible_assistant_message: string?,
latest_recorded_user_message: string?,
latest_recorded_assistant_message: string?,
latest_tool_result: string?,
latest_recorded_tool_result: string?
}
It may return:
- nil to leave the current prompt-visible context unchanged
- a list of messages to use as the next prompt-visible message list
- a dict with optional messages and system fields
Example: hide older assistant messages so the model mostly sees user intent, tool results, and the latest assistant turn.
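fn hide_old_assistant_turns(ctx) {
  var kept = []
  var latest_assistant = nil
  // Keep every non-assistant message; remember only the newest assistant one.
  for msg in ctx.visible_messages {
    if msg?.role == "assistant" {
      latest_assistant = msg
    } else {
      kept = kept + [msg]
    }
  }
  // The latest assistant turn is appended last, after all kept messages.
  if latest_assistant != nil {
    kept = kept + [latest_assistant]
  }
  return {messages: kept}
}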
let result = agent_loop(task, "You are a coding assistant.", {
loop_until_done: true,
context_callback: hide_old_assistant_turns
})
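The other return forms can be sketched the same way. The callback below returns nil on the first iteration and a plain list afterwards. It assumes recent_visible_messages already holds a suitable trailing window of the visible history; the function name and prompt strings are hypothetical.

```harn
// Hypothetical callback showing the nil and list return forms.
fn recent_window(ctx) {
  // First turn: leave the prompt-visible context unchanged.
  if ctx.iteration == 0 {
    return nil
  }
  // Later turns: show the model only the recent visible window.
  return ctx.recent_visible_messages
}

let task = "Investigate the flaky test."
let result = agent_loop(task, "You are a coding assistant.", {
  loop_until_done: true,
  context_callback: recent_window,
})
```

The full recorded transcript is still kept for replay; only the prompt-visible history shrinks.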
Post-turn callback
post_turn_callback runs after a tool-calling turn completes. Use it when the
workflow should react to the tool outcomes directly instead of waiting for the
model to emit another message.
The callback receives:
{
session_id: string,
iteration: int,
has_tool_calls: bool,
dispatch: list | dict | nil,
tool_results: list,
tool_count: int,
successful_tool_names: list,
rejected_tool_names: list,
session_successful_tools: list,
session_rejected_tools: list,
text: string,
visible_text: string,
}
Each tool_results entry has:
{tool_name: string, ok: bool, status: string, rendered_result: string, error: string?}
It may return:
- a string to inject as the next user-visible message
- a bool where true stops the current stage immediately after the turn
- a dict with optional message, stop, stop_reason, next_options, and llm_options fields. message is injected as runtime feedback. stop_reason overrides the default "post_turn_stop" reason when stop is true. next_options merges into the next loop iteration's options; llm_options merges into the next LLM call's llm_options dict.
Example: after a required read succeeds, ask the model to synthesize the final answer with no more native tool calls:
fn finalize_after_read(turn) {
  if turn?.session_successful_tools?.contains("read_file") {
    return {
      message: "You have the required file evidence. Produce the final answer now.",
      llm_options: {tool_choice: "none"},
    }
  }
  return ""
}
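The stop fields can be used in the same style. The sketch below stops the stage as soon as any tool call in the turn fails. The callback name, the "tool_failed" reason, and the prompt strings are hypothetical; the field names come from the shapes listed above.

```harn
// Hypothetical callback: stop the stage when a tool call in this turn fails.
fn stop_on_tool_failure(turn) {
  for entry in turn.tool_results {
    if entry.ok == false {
      return {
        stop: true,
        stop_reason: "tool_failed",
        message: "Tool " + entry.tool_name + " failed; stopping this stage.",
      }
    }
  }
  // Mirrors the example above: an empty string when there is nothing to add.
  return ""
}

let task = "Apply the migration and verify the schema."
let result = agent_loop(task, "You are a coding assistant.", {
  loop_until_done: true,
  post_turn_callback: stop_on_tool_failure,
})
```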
Example with retry
retry 3 {
  let result = agent_loop(
    task,
    "You are a coding assistant.",
    {
      loop_until_done: true,
      max_iterations: 30,
      max_nudges: 5,
      provider: "anthropic",
      model: "claude-sonnet-4-6"
    }
  )
  log(result.text)
}
Skills lifecycle
Skills bundle metadata, a system-prompt fragment, scoped tools, and
lifecycle hooks into a typed unit. Declare them with the top-level
skill NAME { ... } language form (see the Harn spec)
or the imperative skill_define(...) builtin, then pass the resulting
skill_registry to agent_loop via the skills: option. The agent
loop matches, activates, and (optionally) deactivates skills across
turns automatically.
Matching strategies
skill_match: { strategy: ..., top_n: 1, sticky: true } controls how
the loop picks which skill(s) to activate:
- "metadata" (default): in-VM BM25-ish scoring over description + when_to_use, combined with glob matching against the paths: list. Name-in-prompt mentions count as a strong boost. No host round-trip, so matching is fast and deterministic.
- "host": delegates scoring to the host via the skill/match bridge RPC (see bridge-protocol.md). Useful for embedding-based or LLM-driven matchers. A failing RPC falls back to metadata scoring with a warning.
- "embedding": alias for "host"; accepted so the language matches Anthropic's canonical terminology.
Activation lifecycle
- Match runs at the head of iteration 0 (always) and, when sticky: false, before every subsequent iteration (reassess).
- Activate: the skill's on_activate closure (if any) is called, its prompt body is woven into the effective system prompt, and allowed_tools narrows the tool surface for the next LLM call. Each activation emits AgentEvent::SkillActivated + a skill_activated transcript event with the match score and reason.
- Deactivate (only in sticky: false mode): when reassess picks a different top-N, the previously-active skill's on_deactivate runs and the scoped tool filter is dropped. Emits AgentEvent::SkillDeactivated + a skill_deactivated transcript event.
- Session resume: when session_id: is set, the set of active skills at the end of one run is persisted in the session store. The next agent_loop call on the same session rehydrates them before iteration-0 matching runs, so sticky re-entry stays hot without re-matching from a cold prompt.
- JSONL seeding: agent_session_seed_from_jsonl(path, opts?) creates a new session from an llm_transcript.jsonl sidecar. It imports exact prompt-visible message events or older full request snapshots, optionally checks provider / model, and supports truncate_to_last plus drop_tool_calls for oversized histories. Provider-response-only sidecars require validate: false because they lack user and tool-result turns.
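To make the matching and lifecycle options concrete, the sketch below turns on per-iteration reassessment and pins a session so active skills persist between calls. The skill body, the session id value, and the prompt strings are hypothetical; skill_match, skills, and session_id are the options described above.

```harn
// Hypothetical skill used only for this illustration.
skill triage {
  description "Triage incoming bug reports"
  when_to_use "User asks to triage, label, or prioritize issues"
  prompt "Classify each report before proposing a fix."
}

let result = agent_loop(
  "Triage the new bug reports.",
  "You are a support engineer.",
  {
    skills: triage,
    // sticky: false re-runs matching before every iteration after the first.
    skill_match: {strategy: "metadata", top_n: 1, sticky: false},
    // Active skills are persisted under this session and rehydrated next call.
    session_id: "triage-session-1",
  }
)
log(result.status)
```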
Scoped tools
The effective tool allow-list is the union of allowed_tools across all
active skills; any tool outside that union is filtered out of both the
contract prompt and the native tool schemas the provider sees.
Runtime-internal tools like __harn_tool_search are never filtered:
scoping gates the user-declared surface, not the runtime's own
scaffolding.
Frontmatter honoured by the runtime
| Field | Type | Effect |
|---|---|---|
| description | string | Primary ranking signal for metadata matching |
| when_to_use | string | Secondary ranking signal |
| paths | list<string> | Glob patterns for paths: auto-trigger |
| allowed_tools | list<string> | Whitelist applied to the tool surface on activation |
| prompt | string | Body woven into the active-skill system-prompt block |
| disable-model-invocation | bool | When true, the matcher skips the skill entirely |
| user-invocable | bool | Placeholder for host UI (not consumed by the runtime today) |
| mcp | list<string> | MCP servers the skill wants booted (consumed by host integrations) |
| on_activate / on_deactivate | fn | Closures invoked on transition |
Example
skill ship {
description "Ship a production release"
when_to_use "User says ship/release/deploy"
paths ["infra/**", "Dockerfile"]
allowed_tools ["deploy_service"]
prompt "Follow the deploy runbook. One command at a time."
}
let result = agent_loop(
"Ship the new release to production",
"You are a staff deploy engineer.",
{
provider: "anthropic",
tools: tools(),
skills: ship,
working_files: ["infra/terraform/cluster.tf"],
}
)
The loop emits one skill_matched event per match pass (including
zero-candidate passes so replayers see the boundary), one
skill_activated per activated skill, and one skill_scope_tools
event per activation whose allowed_tools narrowed the surface. When
tool_surface_narrowing removes unused tools between turns, the loop
also emits skill_narrow with removed_tools, remaining_tools, the
narrowing reason, policy details, removed-tool details, and kept-tool
details.
The default narrowing policy is safe by class: only tools classified as
read_only are prunable. Tools classified as mutating, approval,
session_control, progress, or result_polling remain visible even after
long discovery windows, and host/custom tools with missing annotations are
classified as unknown and kept. Host surfaces should annotate each tool with
annotations.side_effect_level (none, read_only, workspace_write,
process_exec, or network) plus a kind such as read, search, edit,
or execute. Use tool_surface_narrowing: {mode: "aggressive"} only when a
session intentionally wants usage-only pruning across all classes; callers can
still override prune_classes, keep_classes, unknown_tool_policy, and
hard_keep for narrower policies.
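A possible opt-in to aggressive narrowing might look like the sketch below. It assumes hard_keep takes a list of tool names, which this section does not spell out, and repo_tools() stands in for whatever registry the caller already has.

```harn
let result = agent_loop(
  "Ship the new release to production",
  "You are a staff deploy engineer.",
  {
    // repo_tools() is a placeholder for the caller's tool registry.
    tools: repo_tools(),
    tool_surface_narrowing: {
      // Usage-only pruning across all tool classes.
      mode: "aggressive",
      // Assumed shape: names that must never be pruned.
      hard_keep: ["deploy_service"],
    },
  }
)
log(result.status)
```

With the default policy, none of this is needed: only read_only tools are prunable and unannotated tools are kept.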
Delegated workers
For long-running or parallel orchestration, Harn exposes a worker/task lifecycle directly in the runtime.
let worker = spawn_agent({
name: "research-pass",
task: "Draft a summary",
node: {
kind: "subagent",
mode: "llm",
model_policy: {provider: "mock"},
output_contract: {output_kinds: ["summary"]}
}
})
let done = wait_agent(worker)
log(done.status)
spawn_agent(...) accepts either:
- a graph plus optional artifacts and options, which runs a typed workflow in the background, or
- a node plus optional artifacts and transcript, which runs a single delegated stage and preserves transcript continuity across send_input(...)
Worker configs may also include policy to narrow the delegated worker to a
subset of the parent's current execution ceiling, or a top-level
tools: ["name", ...] shorthand:
let worker = spawn_agent({
task: "Read project files only",
tools: ["read", "search"],
node: {
kind: "subagent",
mode: "llm",
model_policy: {provider: "mock"},
tools: repo_tools()
}
})
If neither is provided, the worker inherits the current execution policy as-is.
If either is provided, Harn intersects the requested worker scope with the
parent ceiling before the worker starts or is resumed. Permission denials are
returned to the agent loop as structured tool results:
{error: "permission_denied", tool, reason}.
Worker options.resume_when accepts the shared ResumeConditions shape used by
self-parking agents: optional trigger, timeout, and on_event fields.
parse_resume_conditions(...) validates that shape without spawning a worker;
trigger is checked by the same std/triggers trigger-spec parser used by
trigger_register(...), while invalid fields raise HARN-SUS-002 with the
failing field path.
Worker lifecycle builtins:
| Function | Description |
|---|---|
| spawn_agent(config) | Start a worker from a workflow graph or delegated stage |
| sub_agent_request(task, options?) | Build the normalized child-agent request used by sub_agent_run |
| sub_agent_run(task, options?) | Run an isolated child agent loop and return a single clean result envelope to the parent |
| agent_lifecycle_tools(registry?, options?) | Add model-facing lifecycle tools to a registry |
| send_input(handle, task) | Re-run a completed worker with a new task, carrying transcript/artifacts forward when applicable |
| suspend_agent(worker, reason?, options?) | Cooperatively suspend a worker, persist a resumable snapshot, and return status: "suspended" with suspension metadata |
| resume_agent(worker_or_snapshot, resume_input?, continue_transcript?) | Resume a suspended worker, optionally with new input; set continue_transcript=false to resume from the prior summary plus new input only |
| agent_stop(worker, options?) | Stop a worker. {graceful: true} returns a normalized handoff artifact plus recursively folded child handoffs before emitting WorkerStopped; omitted or false preserves hard cancel |
| parse_resume_conditions(conditions?) | Validate trigger, timeout, and on_event resume conditions for self-park and spawn_agent({options: {resume_when}}) |
| agent_await_resumption(reason, conditions?) | Normalize the lifecycle-tool request used by agent_loop and daemon idle; agent_loop performs the actual suspension when the model calls the tool |
| wait_agent(handle_or_list) | Wait for one worker or a list of workers to finish |
| close_agent(handle) | Cancel a worker and mark it terminal |
| list_agents() | Return summaries for all known workers in the current runtime |
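The suspend and resume builtins in the table could be combined roughly as follows. This is a sketch under the signatures listed above; the reason and input strings are hypothetical, and it assumes the original handle stays valid after resume_agent.

```harn
let worker = spawn_agent({
  name: "research-pass",
  task: "Draft a summary",
  node: {
    kind: "subagent",
    mode: "llm",
    model_policy: {provider: "mock"},
  },
})

// Park the worker; the call is documented to return status: "suspended".
let parked = suspend_agent(worker, "waiting for review")
log(parked.status)

// Resume with new input, continuing the existing transcript.
resume_agent(worker, "Review approved; finish the summary.", true)

// Assumption: the same handle can be waited on after resuming.
let done = wait_agent(worker)
log(done.status)
```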
Agent Lifecycle Tools
agent_loop(...) automatically exposes agent_await_resumption as a model tool.
When an agent is running as a worker, that tool is structural: the loop validates
optional conditions with parse_resume_conditions(...), calls the same suspend
path as suspend_agent(...), returns status: "suspended" to the parent, and
does not dispatch the tool as an ordinary handler result.
Top-level loops use the same result shape. If a root agent_loop(...) parks,
Harn persists a resumable worker snapshot and returns
{status: "suspended", handle, reason, initiator: "self", ...} to the direct
caller. The CLI can cold-restore that snapshot with:
harn run --resume .harn/workers/worker_...json
Parent-side lifecycle control is opt-in. Pass subagents: true or
subagent_tools: true in agent_loop options, or call
agent_lifecycle_tools(registry, {subagents: true}), to add
subagent_pause(handle, reason),
subagent_resume(handle, input?, continue_transcript? = true), and
subagent_stop(handle, graceful? = true, reason?). Graceful stop returns
{status: "stopped", handoff, children, handoffs, worker} for parent takeover;
graceful: false keeps the old hard-cancel behavior.
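A minimal sketch of the opt-in, plus handling for a loop that parks itself, is shown below. The prompt strings are placeholders; the suspended result fields are the ones listed earlier in this section.

```harn
let task = "Coordinate the research workers and report back."
let system = "You are an orchestration agent."

// subagents: true adds subagent_pause, subagent_resume, and subagent_stop
// to the model-facing tool surface.
let result = agent_loop(task, system, {
  loop_until_done: true,
  subagents: true,
})

// A root loop that parked itself returns a suspended result with a handle.
if result.status == "suspended" {
  log(result.reason)
  log(result.initiator)
}
```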
sub_agent_run
Use sub_agent_run(...) when you want a full child agent_loop with its own
session and narrowed capability scope, but you do not want the child transcript
to spill into the parent conversation history. sub_agent_request(...) exposes
the Harn-authored request normalization when callers need to inspect the tool
selection and child options before execution.
let result = sub_agent_run("Find the config entrypoints.", {
provider: "mock",
tools: repo_tools(),
allowed_tools: ["search", "read"],
token_budget: 1200,
returns: {
schema: {
type: "object",
properties: {
paths: {type: "array", items: {type: "string"}}
},
required: ["paths"]
}
}
})
if result.ok {
  log(result.data.paths)
} else {
  log(result.error.category)
}
The parent transcript only records the outer tool call and tool result. The
child keeps its own session and transcript, linked by session_id / parent
lineage metadata.
Pending parent system_reminder events are filtered into the child handoff
before the child loop starts. propagate: "all" reminders continue through
descendant sub-agents, propagate: "session" reaches direct children only, and
propagate: "none" remains local to the parent. Inherited reminders appear in
the child transcript with source: "inherited" and originating_agent_id.
sub_agent_run(...) returns an envelope with:
- ok
- summary
- artifacts
- evidence_added
- tokens_used
- budget_exceeded
- session_id
- transcript
- data when the child requests JSON mode or returns.schema succeeds
- error: {category, message, tool?} when the child fails or a narrowed tool policy rejects a call
agent_loop(...), sub_agent_run(...), and spawn_agent(...) accept
approval_policy for declarative allow/ask/deny gating before a tool runs.
Use approval_policy.rules for typed matching over tool name/kind,
side-effect level, declared paths, commands, URLs/domains/methods, MCP identity,
agent/persona/mode, capability operation, and repeated-call counts. Deny wins
over ask, ask wins over allow, and unmatched tools are approved. Active
approval policies deny sensitive filenames such as .env and private keys by
default, and declared host-absolute paths outside the workspace require an
explicit external_roots allowance. Ask decisions call
session/request_permission; the host request and the transcript event both
carry a policyDecision receipt with matched rule and rationale.
agent_loop(...), sub_agent_run(...), and spawn_agent(...) also accept a
permissions dict for per-agent dynamic policy. allow and deny entries can
be tool-name glob lists, argument pattern lists, or Harn predicates over the tool
args. For path-bearing tools, std/tools.path_scope(...) returns a matcher that
checks configured path argument keys against the active session
workspace_anchor; mounted roots can be filtered by mount_modes (for example,
["extend"] for writable roots). on_escalation receives a
PermissionRequest and may return {grant: "once"}, {grant: "session"},
true, or false. Permission decisions are recorded as PermissionGrant,
PermissionDeny, and PermissionEscalation transcript events, while parent
policy ceilings still intersect with child declarations.
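One plausible permissions dict is sketched below. It uses only the tool-name glob form of allow and deny and an on_escalation closure; the glob patterns, prompt strings, and repo_tools() are hypothetical, and the exact precedence between allow, deny, and escalation is not restated here.

```harn
let task = "Inspect the deployment config and report problems."
let system = "You are a read-only auditor."

let result = agent_loop(task, system, {
  // repo_tools() is a placeholder for the caller's tool registry.
  tools: repo_tools(),
  permissions: {
    // Tool-name globs (hypothetical patterns).
    allow: ["read*", "search"],
    deny: ["deploy_*"],
    // Grant a single use whenever the loop escalates a request.
    on_escalation: { request -> {grant: "once"} },
  },
})
log(result.status)
```

Parent policy ceilings still apply, so a child cannot use this dict to widen what the parent allows.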
Set background: true to get a normal worker handle back instead of waiting
inline. The resulting worker uses mode: "sub_agent" and can be managed with
wait_agent(...), send_input(...), and close_agent(...).
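A short sketch of the background form, reusing the earlier sub_agent_run example, follows. It assumes the returned handle works with wait_agent exactly like a spawn_agent handle, as the paragraph above suggests.

```harn
// Returns a worker handle immediately instead of the result envelope.
let handle = sub_agent_run("Find the config entrypoints.", {
  provider: "mock",
  tools: repo_tools(),
  allowed_tools: ["search", "read"],
  background: true,
})

// Block until the child finishes, then inspect its status.
let done = wait_agent(handle)
log(done.status)
```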
Background handles retain the original structured request plus a normalized
provenance object, so parent pipelines can recover child questions, actions,
workflow stages, and verification steps directly from the handle/result.
Workers can persist state and child run paths between sessions. Use carry
inside spawn_agent(...) when you want continuation to reset transcript state,
drop carried artifacts, or disable workflow resume against the previous child
run record. Worker configs may also include execution to pin delegated work
to an explicit cwd/env overlay or a managed git worktree.
carry.transcript_mode is explicit and accepts:
- inherit (default): pass the completed worker transcript into the next send_input(...) / trigger cycle.
- fork: start the next cycle from a copy of the completed transcript with a fresh transcript id and metadata.parent_transcript_id pointing at the source transcript.
- reset: start the next cycle with no carried transcript.
- compact: compact the completed worker transcript before it is persisted and inherited by the next cycle.
Worker result artifacts are parent-facing summaries. Their data.payload
omits bulky nested transcript and artifacts fields by default while keeping
the worker request, provenance, execution profile, result text/status, and
produced artifact ids available for routing and audit.
let worker = spawn_agent({
task: "Run the repo-local verification pass",
graph: some_graph,
carry: {transcript_mode: "compact", artifact_mode: "inherit"},
execution: {
worktree: {
repo: ".",
branch: "worker/research-pass",
cleanup: "preserve"
}
}
})