Composable tool middleware
Harn's agent_loop exposes two composable seams for tool calls —
mirrors of the llm_caller seam at the model boundary, but operating
on tool execution. They let harness authors transform tools and the
calls against them without forking individual tool definitions or
patching the runtime.
| Seam | Stage | Hook | Use cases |
|---|---|---|---|
| Schema-time | Before the model sees a registry | tools_use_middleware(registry, transform) | Augment input schemas (force a reason arg, inject a dry_run flag), drop tools, rewrite descriptions. |
| Execution-time | Around every tool dispatch | agent_loop({tool_caller: caller}) | Audit logs, consent prompts, dry-run preview, redaction, idempotency, rate-limit, telemetry. |
Both seams compose. The execution-time seam wraps every dispatch
regardless of executor (harn, host_bridge, mcp_server,
provider_native), so a single with_audit_log reaches script tools,
host-bridge tools (e.g. burin-code), and MCP-served tools uniformly.
The user-facing module is std/llm/tool_middleware.
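To make the schema-time seam concrete, here is a minimal Python model (not Harn) of a transform that injects a `reason` parameter into every tool's input schema. The registry shape (a list of dicts with `name` and `parameters`), the helper names `inject_param`, `use_middleware`, and `require_reason`, and the "return None to drop a tool" rule are all assumptions made for illustration; they are not the actual `tools_use_middleware` implementation.

```python
import copy

def inject_param(tool, name, spec, required=True):
    """Return a copy of `tool` with `name` added to its JSON-schema parameters.

    Additive: a parameter that is already declared is left untouched.
    """
    tool = copy.deepcopy(tool)
    schema = tool.setdefault("parameters", {"type": "object"})
    props = schema.setdefault("properties", {})
    if name in props:
        return tool
    props[name] = spec
    if required:
        schema.setdefault("required", []).append(name)
    return tool

def use_middleware(registry, transform):
    """Apply `transform` to every tool; a None return drops the tool (assumed rule)."""
    transformed = (transform(tool) for tool in registry)
    return [tool for tool in transformed if tool is not None]

def require_reason(tool):
    spec = {"type": "string", "description": "Why this call is needed"}
    return inject_param(tool, "reason", spec)

registry = [
    {
        "name": "read_file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {"name": "ping"},  # tool with no declared parameters
]
decorated = use_middleware(registry, require_reason)
twice = use_middleware(decorated, require_reason)  # second pass changes nothing
```

The transform works on copies, so the original registry stays usable for callers that do not want the extra field.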
Why this exists
A motivating use case: force every tool call to provide a reason
parameter explaining why it's invoking the tool. The harness
benefits in three ways:
- User-facing summary. Hosts render "Searched codebase to find rate limiter middleware" instead of a generic "ran 3 tools" counter.
- Better model reasoning. Forcing the model to articulate intent often improves quality on complex agentic tasks.
- Audit trail. Every tool call carries a structured "why" that downstream observability tools (Langfuse, Datadog, Honeycomb) can index.
The same primitive — wrap tool calls — supports consent, dry-run, rate-limit, redaction, idempotency, and telemetry middleware. Every useful middleware someone writes becomes a building block others can stack.
Caller contract
The execution-time middleware closure has signature:
fn(call, next) -> result_dict
// call = {
//   tool_name: string,
//   tool_args: dict,
//   call_id: string,
//   declared_executor: "harn" | "host_bridge" | "mcp_server" |
//                      "provider_native" | nil,
//   schema: dict | nil,       // input parameters JSON schema
//   description: string,      // tool description
//   turn: {iteration: int, session_id: string},
// }
// next: fn(call) -> result_dict  // bottom of the stack runs
//                                // the runtime default dispatch
Each layer can:
- Inspect / observe the inbound call and outbound result.
- Mutate args: call next(call + {tool_args: rewritten}).
- Short-circuit: return a result dict without calling next.
- Augment audit data: attach an audit key to the result dict; the runtime fans it out as a tool_call_audit AgentEvent.
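The capabilities listed above can be modelled outside Harn. The following is a minimal Python sketch of the `fn(call, next)` contract, not Harn code: `default_dispatch`, `uppercase_query`, `block_deletes`, `tag_audit`, and the tool names are hypothetical, and the result dicts carry only a subset of the documented fields.

```python
def default_dispatch(call):
    """Hypothetical bottom of the stack: pretend to run the tool."""
    return {
        "ok": True,
        "status": "ok",
        "tool_name": call["tool_name"],
        "tool_call_id": call["call_id"],
        "arguments": call["tool_args"],
        "result": f"ran {call['tool_name']}",
    }

def uppercase_query(call, next_caller):
    """Mutate args: pass a rewritten copy of the call downstream."""
    query = call["tool_args"].get("query", "")
    rewritten = {**call["tool_args"], "query": query.upper()}
    return next_caller({**call, "tool_args": rewritten})

def block_deletes(call, next_caller):
    """Short-circuit: return a result dict without calling next."""
    if call["tool_name"] == "delete_file":
        return {
            "ok": False,
            "status": "policy_blocked",
            "tool_name": call["tool_name"],
            "tool_call_id": call["call_id"],
            "arguments": call["tool_args"],
            "error": "deletes are disabled in this sketch",
        }
    return next_caller(call)

def tag_audit(call, next_caller):
    """Observe the outbound result and attach audit data to it."""
    result = next_caller(call)
    summary = f"{call['tool_name']} -> {result['status']}"
    return {**result, "audit": {**result.get("audit", {}), "summary": summary}}

def run(call):
    # Nested by hand: tag_audit is outermost, default_dispatch innermost.
    inner = lambda c: uppercase_query(c, default_dispatch)
    middle = lambda c: block_deletes(c, inner)
    return tag_audit(call, middle)

search = run({"tool_name": "search_files", "tool_args": {"query": "rate"}, "call_id": "c1"})
blocked = run({"tool_name": "delete_file", "tool_args": {"path": "a.txt"}, "call_id": "c2"})
```

Because `tag_audit` sits outside `block_deletes`, it still sees the short-circuited call and records its status.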
The result dict mirrors the dispatch result shape:
{
  ok, status, tool_name, tool_call_id, arguments,
  result, rendered_result, observation,
  error, error_category, executor, approval, execution_duration_ms,
  audit?,
}
The audit field convention
Middleware-attached audit metadata is intentionally free-form JSON, but the field names align with prevailing specs where they exist:
| Field | Source | Meaning |
|---|---|---|
| summary? | ACP title, OpenAI Responses summary_text | User-facing one-liner ("Searched codebase…") |
| description? | OTel gen_ai.tool.description, LangChain | Longer free-form rationale |
| kind? | ACP ToolCall.kind | One of read/edit/delete/move/search/execute/think/fetch/other |
| hints? | MCP tool annotations | {read_only?, destructive?, idempotent?, open_world?} |
| consent? | (coined; ACP/MCP keep this off the call object) | {decision, decided_by, decided_at} |
| scope? | (coined; mirrors PersonaRuntimeBinding.stages) | {stage, allowed_tools?, side_effect_level?}; the scoped capability surface this call ran under |
| layers? | (coined) | [{name, status, started_at, ended_at, error?}] per-layer audit log |
| metadata? | A2A metadata, LangChain | Free-form extension slot |
These names are conventions, not requirements — middleware authors are free to invent new keys. Use the conventional names where they fit so that bridge-out / OTel exporters / ACP gateway adapters work cleanly.
Reserved status values
When a layer short-circuits, prefer one of these status values so
composition stays predictable:
"ok", "tool_not_found", "schema_violation", "consent_denied",
"policy_blocked", "scope_violation", "executor_error",
"redacted", "dry_run", "rate_limited", "exception",
"tool_middleware_exception".
Bundled middleware
std/llm/tool_middleware ships the following middleware. Each is
~10-30 lines of Harn — they're examples as much as they're useful in
their own right.
with_required_reason(opts?) -> {schema_transform, caller}
The originating use case. Returns a paired schema decorator + execution
caller. By default it forces every tool call to provide a non-empty
reason (or a custom-named field), strips it before delegating to
next, and surfaces it on audit.summary.
Options: field (default "reason"), description, strip (bool,
default true), audit_key (default "summary"), min_length (default
1), on_missing ("reject" (default) or "fill_blank"), and
schema_required (bool, default true).
Set schema_required: false with on_missing: "fill_blank" when a host
wants the audit field advertised but cannot trust provider-native tool
schema enforcement. This keeps the real tool arguments intact even when
the model omits the synthetic reason field, while still recording
"(no reason given)" in the audit summary.
let mw = with_required_reason({schema_required: false})
let registry = tools_use_middleware(my_registry, mw.schema_transform)
agent_loop(task, system, {tools: registry, tool_caller: mw.caller})
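schema_required: false keeps runtime validation compatible with the
default strip: true, because the schema no longer marks the field as
required once the middleware has stripped it. With the default
on_missing: "reject", the middleware itself still rejects calls that
omit the reason before the tool handler runs.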
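The check / strip / surface sequence described above can be sketched in Python. This is a model under stated assumptions, not the bundled Harn middleware: the `schema_violation` status on rejection, the error text, and the `echo_tool` handler are hypothetical, and only the `field`, `strip`, `on_missing`, and `min_length` options are modelled.

```python
def with_required_reason(field="reason", strip=True, on_missing="reject", min_length=1):
    def caller(call, next_caller):
        args = dict(call["tool_args"])
        reason = args.get(field)
        valid = isinstance(reason, str) and len(reason.strip()) >= min_length
        if not valid:
            if on_missing == "reject":
                return {
                    "ok": False,
                    "status": "schema_violation",  # assumed status for this sketch
                    "tool_name": call["tool_name"],
                    "error": f"tool call must include a non-empty '{field}'",
                }
            reason = "(no reason given)"  # on_missing == "fill_blank"
        if strip:
            args.pop(field, None)  # the handler never sees the synthetic field
        result = next_caller({**call, "tool_args": args})
        return {**result, "audit": {**result.get("audit", {}), "summary": reason}}
    return caller

def echo_tool(call):
    """Hypothetical tool handler that reports the args it received."""
    return {"ok": True, "status": "ok", "tool_name": call["tool_name"],
            "result": call["tool_args"]}

strict = with_required_reason()
lenient = with_required_reason(on_missing="fill_blank")
call_with = {"tool_name": "search_files",
             "tool_args": {"query": "limiter", "reason": "find rate limiter"}}
call_without = {"tool_name": "search_files", "tool_args": {"query": "limiter"}}

accepted = strict(call_with, echo_tool)
rejected = strict(call_without, echo_tool)
filled = lenient(call_without, echo_tool)
```

In both accepting paths the handler receives only `{"query": "limiter"}`; the reason lives on `audit.summary` instead.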
with_audit_log(sink_or_options) -> caller
Builds one typed ToolCallReceipt per tool call after the call
completes. Receipts include the required-reason summary, status,
executor, timing, model/provider, emitted batch position (emit_order),
audit metadata, and SHA-256 hashes of canonicalized args/results
instead of raw payloads.
sink_or_options accepts a callable sink, "local", "cloud",
"both", or {sink, redact}. With "local", receipts append to
.harn/receipts/<session_id>.jsonl; with "cloud", receipts mirror through the
host event bridge; "both" writes locally first, then mirrors the same
receipt. redact is a list of argument keys removed before args_hash
is computed. In agent_loop({prefetch_next_turn: true}), local and
callable sinks flush in the background after the tool result has been
recorded; cloud receipt mirroring still remains attached to the tool
result event.
When the local sink is active, the middleware attaches a file://
receipt_uri to result.audit and to the typed receipt's embedded
audit dict so portal-style hosts can deep-link directly to the
persisted JSONL line.
let caller = compose_tool_callers([
  with_audit_log({sink: "both", redact: ["token", "content"]}),
  with_required_reason({schema_required: false}).caller,
])
For explicit file routing, local_receipt_sink(session_id) returns the
same append-only JSONL sink used by sink: "local".
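To illustrate why receipts can store hashes instead of raw payloads, here is a small Python sketch of redact-then-hash. The canonical form used here (JSON with sorted keys and compact separators) is an assumption, as are the `canonical_hash` and `build_receipt` helpers and the `result_hash` field name; the real ToolCallReceipt layout and canonicalization may differ.

```python
import hashlib
import json

def canonical_hash(value):
    """SHA-256 over JSON with sorted keys and compact separators (assumed canonical form)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_receipt(call, result, redact=()):
    """Assemble a receipt-like dict that carries hashes, not raw payloads."""
    kept_args = {k: v for k, v in call["tool_args"].items() if k not in redact}
    return {
        "tool_name": call["tool_name"],
        "tool_call_id": call["call_id"],
        "status": result["status"],
        "summary": result.get("audit", {}).get("summary"),
        "args_hash": canonical_hash(kept_args),
        "result_hash": canonical_hash(result.get("result")),
    }

result = {"status": "ok", "result": "2 matches", "audit": {"summary": "Searched codebase"}}
first = build_receipt(
    {"tool_name": "search_files", "call_id": "c1",
     "tool_args": {"query": "limiter", "token": "s3cret-one"}},
    result, redact=["token"])
second = build_receipt(
    {"tool_name": "search_files", "call_id": "c2",
     "tool_args": {"token": "s3cret-two", "query": "limiter"}},
    result, redact=["token"])
```

Removing redacted keys before hashing means two calls that differ only in a secret produce the same `args_hash`, so the hash cannot be used to confirm a guessed secret.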
with_consent(prompt_fn) -> caller
prompt_fn(call) -> bool | dict is consulted before each tool call.
Denied calls short-circuit with consent_denied; approved calls
proceed and record the decision in audit.consent. Pair with the host
UX (e.g. burin-code's approval modal) for destructive tools.
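A consent gate of this shape can be modelled in a few lines of Python. This is a sketch, not the bundled middleware: the `"denied"` decision value, attaching `audit.consent` to denied results, and the `prompt` policy are assumptions for illustration.

```python
def with_consent(prompt_fn):
    def caller(call, next_caller):
        answer = prompt_fn(call)
        if isinstance(answer, bool):
            # Normalize a bare bool into the dict form.
            answer = {"decision": "approved" if answer else "denied"}
        if answer.get("decision") != "approved":
            return {
                "ok": False,
                "status": "consent_denied",
                "tool_name": call["tool_name"],
                "error": "call was not approved",
                "audit": {"consent": answer},
            }
        result = next_caller(call)
        return {**result, "audit": {**result.get("audit", {}), "consent": answer}}
    return caller

def prompt(call):
    """Hypothetical policy: auto-approve reads, deny everything else."""
    if call["tool_name"].startswith("read_"):
        return {"decision": "approved", "decided_by": "auto"}
    return False

dispatched = []

def dispatch(call):
    dispatched.append(call["tool_name"])
    return {"ok": True, "status": "ok", "tool_name": call["tool_name"]}

gate = with_consent(prompt)
read = gate({"tool_name": "read_file", "tool_args": {}}, dispatch)
delete = gate({"tool_name": "delete_file", "tool_args": {}}, dispatch)
```

The denied call never reaches `dispatch`, which is the property that makes the gate safe to place in front of destructive tools.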
with_scoped_executor(opts) -> caller
Narrows the active CapabilityPolicy for the duration of one tool
dispatch. Wraps the downstream chain in with_execution_policy(...)
so the runtime's existing enforce_current_policy_for_tool machinery
(capability ceilings, side-effect ceiling, tool-arg constraints) sees
the scoped policy as the top of the stack — the scoped policy is
intersected with the ambient policy, so a stage can only tighten the
surface, never widen it. A preemptive tool-name check short-circuits
with status: "scope_violation" so the audit chain captures the stage
label even when the wrapped dispatch never runs.
Compose this outside binder / consent layers so it can reject before either does expensive work:
let caller = compose_tool_callers([
  with_audit_log({...}),
  with_scoped_executor({stage: "research", allowed_tools: ["search_files", "read_file"]}),
  with_consent(prompt),
  default_tool_caller(),
])
Options:
- stage (default "scoped"): surfaced on audit.scope.stage and in layer/error messages.
- allowed_tools (list of tool names; empty / missing skips the preemptive check and any tool-surface restriction).
- side_effect_level ("none" / "read_only" / "workspace_write" / "process_exec" / "network"; tightens the ambient ceiling).
- capabilities (capability → operation allowlist; same shape as CapabilityPolicy.capabilities).
- on_violation ("reject" (default) → short-circuit with scope_violation and a typed receipt; "raise" → throw so the agent loop's try { ... } can react).
Companion to per-stage persona declarations (PersonaRuntimeBinding.stages):
the persona runtime auto-installs stage policies at step boundaries,
while this middleware lets standalone tool callers narrow the surface
without declaring a full persona manifest.
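The "tighten, never widen" rule can be illustrated with a small Python model. It is a sketch under assumptions: the side-effect levels are treated as a total order in the sequence listed above, policies are plain dicts, and `intersect`, `tool_set`, and this `with_scoped_executor` are hypothetical stand-ins rather than the runtime's CapabilityPolicy machinery.

```python
# Assumed ordering, least to most permissive, following the order in the options list.
LEVELS = ["none", "read_only", "workspace_write", "process_exec", "network"]

def tool_set(policy):
    tools = policy.get("allowed_tools")
    return set(tools) if tools else None  # None means "no tool restriction"

def intersect(ambient, scoped):
    """Combine two policies so the result is never wider than either input."""
    a, s = tool_set(ambient), tool_set(scoped)
    if a is None:
        tools = s
    elif s is None:
        tools = a
    else:
        tools = a & s
    level = min(
        ambient.get("side_effect_level", LEVELS[-1]),
        scoped.get("side_effect_level", LEVELS[-1]),
        key=LEVELS.index,
    )
    return {"allowed_tools": tools, "side_effect_level": level}

def with_scoped_executor(stage, allowed_tools=None):
    def caller(call, next_caller):
        # Preemptive tool-name check: reject before the wrapped dispatch runs.
        if allowed_tools and call["tool_name"] not in allowed_tools:
            return {"ok": False, "status": "scope_violation",
                    "tool_name": call["tool_name"],
                    "audit": {"scope": {"stage": stage,
                                        "allowed_tools": list(allowed_tools)}}}
        result = next_caller(call)
        return {**result, "audit": {**result.get("audit", {}), "scope": {"stage": stage}}}
    return caller

ambient = {"allowed_tools": ["read_file", "write_file"], "side_effect_level": "workspace_write"}
scoped = {"allowed_tools": ["search_files", "read_file"], "side_effect_level": "network"}
effective = intersect(ambient, scoped)

research = with_scoped_executor("research", ["search_files", "read_file"])
blocked = research({"tool_name": "write_file", "tool_args": {}},
                   lambda call: {"ok": True, "status": "ok"})
```

Here the scoped policy asks for `network`, but the ambient ceiling of `workspace_write` wins, and only `read_file` survives the tool intersection.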
with_dry_run(opts?) -> caller
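For the tools it applies to, never invokes next: it short-circuits
with a synthetic OK result tagged status: "dry_run". Useful for
previewing a tool sequence without side effects. Options: only (tool
names to dry-run; all others dispatch normally) and except (tool names
exempt from the dry run, which dispatch normally).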
This is the userspace seam for crystallization shadow runs: when
shadow_replay_bundle (orchestration/crystallize) re-executes a
captured workflow to confirm a candidate's side-effect signature
hasn't drifted, wrap the tool caller in with_dry_run({except: [...known-pure-tools]}) so destructive ops are neutralized while
read-only ones still surface real results.
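The only / except selection can be sketched as follows. This is a Python model, not the bundled middleware: the parameter is spelled `except_` because `except` is a Python keyword, and the synthetic `observation` text is invented for illustration.

```python
def with_dry_run(only=None, except_=None):
    def applies(tool_name):
        if only is not None and tool_name not in only:
            return False
        if except_ is not None and tool_name in except_:
            return False
        return True

    def caller(call, next_caller):
        if not applies(call["tool_name"]):
            return next_caller(call)  # exempt tools run for real
        return {
            "ok": True,
            "status": "dry_run",
            "tool_name": call["tool_name"],
            "arguments": call["tool_args"],
            "result": None,
            "observation": f"[dry run] {call['tool_name']} was not executed",
        }
    return caller

executed = []

def dispatch(call):
    executed.append(call["tool_name"])
    return {"ok": True, "status": "ok", "tool_name": call["tool_name"], "result": "real"}

shadow = with_dry_run(except_=["read_file"])  # read_file is treated as known-pure
read = shadow({"tool_name": "read_file", "tool_args": {"path": "a.txt"}}, dispatch)
write = shadow({"tool_name": "write_file", "tool_args": {"path": "a.txt"}}, dispatch)
```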
with_redaction(redactor) -> caller
Applied twice: once on inbound args, once on outbound result.
redactor({phase: "in"|"out", tool_name, args, result}) -> {args?, result?, redacted_fields?}. Records redacted-field names in
audit.metadata.
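The two-pass behaviour can be modelled in Python as below. This is a sketch under assumptions: the `redacted_fields` key under `audit.metadata`, the `mask_tokens` redactor, and the `leaky_tool` handler are hypothetical.

```python
def with_redaction(redactor):
    def caller(call, next_caller):
        name = call["tool_name"]
        # Pass 1: inbound args.
        inbound = redactor({"phase": "in", "tool_name": name,
                            "args": call["tool_args"], "result": None})
        args = inbound.get("args", call["tool_args"])
        result = next_caller({**call, "tool_args": args})
        # Pass 2: outbound result.
        outbound = redactor({"phase": "out", "tool_name": name,
                             "args": args, "result": result.get("result")})
        out = {**result, "result": outbound.get("result", result.get("result"))}
        fields = inbound.get("redacted_fields", []) + outbound.get("redacted_fields", [])
        if fields:
            audit = dict(result.get("audit", {}))
            audit["metadata"] = {**audit.get("metadata", {}), "redacted_fields": fields}
            out["audit"] = audit
        return out
    return caller

def mask_tokens(event):
    """Hypothetical redactor: masks a `token` arg and any token echoed in a string result."""
    if event["phase"] == "in" and "token" in event["args"]:
        return {"args": {**event["args"], "token": "***"}, "redacted_fields": ["args.token"]}
    if event["phase"] == "out" and isinstance(event["result"], str) and "tok_" in event["result"]:
        return {"result": "[redacted]", "redacted_fields": ["result"]}
    return {}

seen_args = []

def leaky_tool(call):
    seen_args.append(call["tool_args"])
    return {"ok": True, "status": "ok", "tool_name": call["tool_name"],
            "result": "issued tok_123"}

out = with_redaction(mask_tokens)(
    {"tool_name": "login", "tool_args": {"user": "a", "token": "tok_999"}}, leaky_tool)
```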
with_idempotency(key_fn, opts?) -> caller
Caches successful tool results keyed by key_fn(call) -> string,
backed by std/cache so the cache outlives the caller closure. Repeat
queries within the TTL reuse cached results.
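A minimal Python model of TTL-keyed result caching is shown below. The `ttl_seconds`, `store`, and `clock` parameters are assumptions (the text above does not name the TTL option), and a plain dict stands in for std/cache.

```python
import time

def with_idempotency(key_fn, ttl_seconds=60.0, store=None, clock=time.monotonic):
    cache = {} if store is None else store  # pass a shared dict to outlive this closure
    def caller(call, next_caller):
        key, now = key_fn(call), clock()
        hit = cache.get(key)
        if hit is not None and now - hit["at"] < ttl_seconds:
            return hit["result"]
        result = next_caller(call)
        if result.get("ok"):  # only successful results are cached
            cache[key] = {"at": now, "result": result}
        return result
    return caller

fake_now = [0.0]  # controllable clock so the example is deterministic
fetched = []

def fetch(call):
    fetched.append(call["tool_args"]["url"])
    return {"ok": True, "status": "ok", "result": f"body #{len(fetched)}"}

cached = with_idempotency(
    lambda c: f"{c['tool_name']}:{c['tool_args']['url']}",
    ttl_seconds=10,
    clock=lambda: fake_now[0],
)
call = {"tool_name": "fetch", "tool_args": {"url": "https://example.com"}}
first = cached(call, fetch)
second = cached(call, fetch)   # within the TTL: served from cache
fake_now[0] = 11.0
third = cached(call, fetch)    # TTL elapsed: dispatched again
```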
with_rate_limit(opts) -> caller
Caps the total number of tool calls processed by this caller. Once
max_calls is hit, further calls short-circuit with rate_limited.
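The counting behaviour can be sketched in Python with a lock-protected counter, which plays the role a shared atomic counter would play in Harn. The error text is invented for illustration, and whether rejected calls also count toward the total is an assumption of this model.

```python
import threading

def with_rate_limit(max_calls):
    lock = threading.Lock()
    seen = 0
    def caller(call, next_caller):
        nonlocal seen
        with lock:  # keep the count correct under concurrent dispatch
            seen += 1
            over_budget = seen > max_calls
        if over_budget:
            return {
                "ok": False,
                "status": "rate_limited",
                "tool_name": call["tool_name"],
                "error": f"tool budget of {max_calls} calls exhausted",
            }
        return next_caller(call)
    return caller

def dispatch(call):
    return {"ok": True, "status": "ok", "tool_name": call["tool_name"]}

limited = with_rate_limit(2)
statuses = [
    limited({"tool_name": "search_files", "tool_args": {}}, dispatch)["status"]
    for _ in range(3)
]
```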
with_telemetry(sink_or_opts) -> caller
Emits a standardized tool-call span for every dispatch and fans it out
to one or more sinks. Accepts a callable fn(span), a built-in name
("langfuse", "otel", "stderr", "noop"), or a config dict with
sink: / sinks:. The span shape mirrors gen_ai.tool.* attributes
and exposes timing, executor, args hash, layered child spans, and
dispatched / result / scope-violation events. Full schema and built-in
sink reference: Tool-call spans.
with_summary(format_fn) -> caller
Generates a user-facing one-liner via format_fn(call, result) -> string,
populating audit.summary (the ACP/OpenAI convention slot).
with_handoff_artifact(opts?) -> caller
When a tool's result carries a handoff payload (under __handoff /
handoff by default, or a custom-detected key), normalizes it through
handoff(...) and surfaces the typed record on audit.handoff. The
optional sink(record, call) callback fires once per emitted handoff
for side-effecting persistence. Pairs with std/handoffs for the
typed handoff schema. If the payload includes policy_override, the
normalized audit record preserves it at audit.handoff.policy_override
so downstream handoff dispatch can run the target under that replacement
execution policy.
Options: sink (callback), detect (custom locator),
keys (extra result keys to inspect, default ["__handoff", "handoff"]),
source (override source_persona if the tool didn't set one),
strict (throw on malformed payloads, default false).
with_timeout(opts) -> caller
Caps wall-clock time per tool call. Calls inside the budget pass
through with audit.layers[…].status == "ok". Calls that breach the
budget surface error_category: "timeout" and status: "timeout" on
the layer log. The middleware does not cancel the in-flight dispatch
(hard cancellation belongs in agent_loop({deadline_ms})); it
observes the breach so upstream layers can react.
Options: max_ms (required, non-negative int), per_tool
({tool_name: override_max_ms}), message (override error message).
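The observe-without-cancel behaviour can be modelled as below. This Python sketch makes several assumptions: an injectable `clock` for deterministic timing, marking the result `ok: False` on a breach, and the default error text. The wrapped dispatch always runs to completion, as described above.

```python
import time

def with_timeout(max_ms, per_tool=None, message=None, clock=time.monotonic):
    overrides = per_tool or {}
    def caller(call, next_caller):
        budget = overrides.get(call["tool_name"], max_ms)
        started = clock()
        result = next_caller(call)  # not cancelled: the layer only measures
        elapsed_ms = (clock() - started) * 1000
        breached = elapsed_ms > budget
        audit = dict(result.get("audit", {}))
        layer = {"name": "with_timeout", "status": "timeout" if breached else "ok"}
        audit["layers"] = audit.get("layers", []) + [layer]
        out = {**result, "audit": audit}
        if breached:
            out["ok"] = False  # assumption: a breach marks the call as failed
            out["error_category"] = "timeout"
            out["error"] = message or f"{call['tool_name']} exceeded {budget} ms"
        return out
    return caller

ticks = iter([0.0, 0.05, 1.0, 1.5])  # fake clock readings: 50 ms call, then 500 ms call

def dispatch(call):
    return {"ok": True, "status": "ok", "tool_name": call["tool_name"], "result": "done"}

timed = with_timeout(100, clock=lambda: next(ticks))
fast = timed({"tool_name": "read_file", "tool_args": {}}, dispatch)
slow = timed({"tool_name": "run_tests", "tool_args": {}}, dispatch)
```

Note that `slow` still carries the tool's real result; the layer only annotates it so upstream layers can decide what to do.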
Composing
compose_tool_callers([outer, ..., inner]) returns one caller that
wraps right-to-left, so the leftmost wrapper is the outermost: it sees
each call first and each result last. This mirrors compose in
std/llm/handlers.
let caller = compose_tool_callers([
  with_audit_log({sink: "both", redact: ["token", "content"]}),
  with_consent(prompt),
  with_redaction(redactor),
  with_required_reason({schema_required: false}).caller,
])
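The wrapping order can be checked with a small Python model of composition. It is a sketch, not the Harn implementation: here the bottom-of-stack dispatch is passed explicitly as `base`, whereas in Harn the runtime supplies the default dispatch, and `tracer` is a hypothetical layer that only records when it runs.

```python
from functools import reduce

def compose_tool_callers(layers, base):
    """Fold layers right-to-left so layers[0] ends up outermost."""
    def wrap(next_caller, layer):
        return lambda call: layer(call, next_caller)
    return reduce(wrap, reversed(layers), base)

trace = []

def tracer(name):
    def layer(call, next_caller):
        trace.append(f"{name}:in")
        result = next_caller(call)
        trace.append(f"{name}:out")
        return result
    return layer

def base(call):
    trace.append("dispatch")
    return {"ok": True, "status": "ok"}

caller = compose_tool_callers([tracer("audit"), tracer("consent"), tracer("reason")], base)
result = caller({"tool_name": "search_files", "tool_args": {}})
```

After the call, `trace` reads audit, consent, reason on the way in and the reverse on the way out, which is why audit-style layers belong at the left of the list.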
Captain recipe — full governance stack
The persona platform's captains (merge_captain, review_captain,
oncall_captain) all want the same substrate: every tool call yields
a structured audit record, destructive ops gate on consent, the loop
caps at a tool budget, and handoff payloads surface as typed records
on the receipts ledger. One stack covers all of it.
import {
  compose_tool_callers,
  with_audit_log,
  with_consent,
  with_dry_run,
  with_handoff_artifact,
  with_idempotency,
  with_rate_limit,
  with_redaction,
  with_required_reason,
  with_summary,
  with_telemetry,
} from "std/llm/tool_middleware"

let reason_mw = with_required_reason({schema_required: false})

let captain_tool_caller = compose_tool_callers([
  with_audit_log({sink: "both", redact: ["token", "content"]}), // typed tool receipts
  with_telemetry({sink: "langfuse", project: "harn-dev"}),      // tool-call spans
  with_summary({ call, _r -> describe(call) }),                 // user-facing one-liner
  with_consent(persona.autonomy_policy),                        // act_with_approval gate
  reason_mw.caller,                                             // require `reason` arg
  with_redaction(unified_redaction_policy),                     // strip secrets
  with_handoff_artifact({sink: handoff_emitter}),               // typed handoff records
  with_idempotency(per_tool_idempotency_keyer),
  with_rate_limit({max_calls: persona.tool_budget}),
  with_dry_run({only: persona.shadow_tools}),                   // crystallization shadow runs
])

let registry = tools_use_middleware(my_registry, reason_mw.schema_transform)

agent_loop(task, system, {
  tools: registry,
  tool_caller: captain_tool_caller,
})
Order matters: the leftmost (outermost) wrappers see every call, including those short-circuited by inner layers, so put audit and telemetry first. The required-reason / consent / redaction layers go in the middle, and the rate-limit / dry-run gates are innermost so the audit log sees what the runtime actually attempted.
Gotchas
- Closures capture by value. Don't try to share a free-form dict across calls of a stateful middleware: the captured reference is frozen. Use atomic(0) for integer counters or std/cache for richer state. See the existing std/llm/handlers::with_budget for the standard pattern.
- Short-circuiting must produce a complete result dict. The downstream agent_session_record_tool_results expects the standard shape (tool_name, ok or success or status, observation or rendered_result or output or content). Use __tool_mw_short_circuit patterns or the bundled middleware as a template.
- Parallel dispatch ordering. agent_loop({max_concurrent_tools: N}) can run sibling tool calls from one planner turn concurrently, with or without middleware. The transcript still records tool results in planner-emitted order, and audited receipts carry emit_order for consumers that store events in completion order.
- Schema decorators should be additive. tool_inject_param leaves an existing parameter untouched if it's already declared, so layered middleware (e.g. multiple injects of the same field) is idempotent.
- The tool_call_audit AgentEvent is fired only when middleware sets result.audit. No middleware → no event. This keeps the wire stream clean for hosts that don't subscribe.
- with_required_reason({strip: true}) + schema_transform + agent_loop. The runtime's validate_tool_args runs after the middleware strips reason, so combining the schema decorator (which marks reason required by default) with strip: true will reject every call with a "missing required parameter: reason" error. Either:
  - Use strip: true without schema_transform: the model is told about reason via the system prompt or live-call instructions, and the natural tool schemas don't list it (the middleware strips it before the handler runs). This is the pattern used by the bundled conformance tests.
  - Or use schema_transform with strip: false: reason flows through to the handler, which is responsible for ignoring it.
  - Or keep both and pass schema_required: false, so the schema advertises reason without marking it required.
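The strip-versus-required interaction from the last gotcha can be reproduced with a toy validator. This Python sketch is illustrative only: `validate_tool_args` here is a hypothetical stand-in that checks nothing but required keys, and `schema_with_reason(required=False)` models the `schema_required: false` option.

```python
def validate_tool_args(schema, args):
    """Hypothetical stand-in for the runtime check: report missing required params."""
    return [f"missing required parameter: {name}"
            for name in schema.get("required", []) if name not in args]

def strip_reason(args):
    """What strip: true does before the runtime validates."""
    return {k: v for k, v in args.items() if k != "reason"}

def schema_with_reason(required):
    schema = {
        "type": "object",
        "properties": {"path": {"type": "string"}, "reason": {"type": "string"}},
        "required": ["path"],
    }
    if required:
        schema["required"].append("reason")
    return schema

model_args = {"path": "src/main.rs", "reason": "inspect entry point"}

# Pitfall: schema marks reason required, but it is stripped before validation.
pitfall = validate_tool_args(schema_with_reason(True), strip_reason(model_args))
# Schema advertises reason without requiring it, and strip stays on.
relaxed = validate_tool_args(schema_with_reason(False), strip_reason(model_args))
# Schema requires reason and the middleware leaves it in place (strip: false).
passthrough = validate_tool_args(schema_with_reason(True), model_args)
```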
Wire format
Each middleware-attached audit blob is also emitted as a
tool_call_audit AgentEvent so live ACP/A2A consumers can render
chips alongside the standard tool_call_update stream:
{
  "type": "tool_call_audit",
  "session_id": "…",
  "tool_call_id": "…",
  "tool_name": "search_files",
  "audit": {
    "summary": "Searched codebase to find rate limiter",
    "kind": "search",
    "consent": {"decision": "approved", "decided_by": "auto"},
    "layers": [
      {"name": "with_required_reason", "status": "ok"},
      {"name": "with_consent", "status": "approved"}
    ]
  }
}
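The mapping from a result dict to this event can be sketched in Python. The `audit_event` helper and the session id are hypothetical; the sketch only shows the documented rule that an event exists when, and only when, a result carries audit data.

```python
import json

def audit_event(session_id, result):
    """Build a tool_call_audit event, or None when no middleware set result.audit."""
    audit = result.get("audit")
    if not audit:
        return None
    return {
        "type": "tool_call_audit",
        "session_id": session_id,
        "tool_call_id": result["tool_call_id"],
        "tool_name": result["tool_name"],
        "audit": audit,
    }

audited = {
    "ok": True, "status": "ok",
    "tool_call_id": "call_1", "tool_name": "search_files",
    "audit": {"summary": "Searched codebase to find rate limiter", "kind": "search"},
}
plain = {"ok": True, "status": "ok", "tool_call_id": "call_2", "tool_name": "read_file"}

event = audit_event("session_demo", audited)
no_event = audit_event("session_demo", plain)
wire = json.dumps(event)  # what a live consumer would receive
```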
See also
- docs/src/stdlib/llm-handlers.md: the parallel seam at the model boundary.
- docs/llm/harn-quickref.md "Composable tool middleware" section: the autoloaded one-pager.
- crates/harn-stdlib/src/stdlib/llm/tool_middleware.harn: the source, with full per-function comments.
- conformance/tests/scenarios/tool_middleware_*.harn: executable examples covering the primitives, with_required_reason, with_consent, with_dry_run, and the agent_loop integration.