LLM tools

Typed tools for agent loops

agent_loop(...) does not need a bespoke host tool for every deterministic operation. The fastest path is usually to wrap pure stdlib logic in a typed tool and let the model call that tool directly.

This keeps the tool contract explicit:

inputs are typed in the tool schema
outputs are structured and replayable
the implementation stays deterministic because it is ordinary Harn stdlib

Pattern

Build a registry with tool_define(...), give each tool a precise input and output shape, and keep the handler body purely stdlib. For repeated declarative specs, import tool_define_many(...) or tool_registry_from(...) from std/tools. Import the producer-owned ToolRegistry alias when a package accepts or forwards registries through a typed public API — do not widen to dict / any or duplicate the {_type: "tool_registry", tools: ...} shape.

import { AgentLoopOptions, agent_options } from "std/agent/options"
import { ToolRegistry } from "std/tools"

/** Package export: accept and return the stdlib registry type. */
pub fn with_search_tool(registry: ToolRegistry) -> ToolRegistry {
  return tool_define(registry, "search", "Search the corpus", {
    parameters: {query: {type: "string"}},
    handler: { args -> "hits for " + args.query },
  })
}

fn agent_opts_from_package() -> AgentLoopOptions {
  const tools: ToolRegistry = with_search_tool(tool_registry())
  return agent_options({provider: "mock", tools: tools})
}

import "std/vision"

fn deterministic_tools() {
  let tools = tool_registry()

  tools = tool_define(tools, "math::calc", "Deterministic arithmetic", {
    parameters: {
      lhs: {type: "number"},
      rhs: {type: "number"},
      op: {type: "string", enum: ["add", "sub", "mul", "div"]},
    },
    returns: {type: "number"},
    handler: { args ->
      if args.op == "add" { return args.lhs + args.rhs }
      if args.op == "sub" { return args.lhs - args.rhs }
      if args.op == "mul" { return args.lhs * args.rhs }
      if args.op == "div" { return args.lhs / args.rhs }
      throw "unsupported op"
    },
  })

  tools = tool_define(tools, "regex::match", "Regex search over text", {
    parameters: {
      pattern: {type: "string"},
      text: {type: "string"},
    },
    returns: {type: "array", items: {type: "string"}},
    handler: { args -> return regex_match(args.pattern, args.text) ?? [] },
  })

  tools = tool_define(tools, "strings::count_char", "Count a single character", {
    parameters: {
      text: {type: "string"},
      char: {type: "string", minLength: 1, maxLength: 1},
    },
    returns: {type: "integer"},
    handler: { args ->
      require len(args.char) == 1, "char must be exactly one character"
      return split(args.text, args.char).count() - 1
    },
  })

  tools = tool_define(tools, "crypto::sha256", "Hash text as lowercase hex", {
    parameters: {
      text: {type: "string"},
    },
    returns: {type: "string"},
    handler: { args -> return sha256(args.text) },
  })

  tools = tool_define(tools, "vision::ocr", "Read text from an image", {
    parameters: {
      image: {
        description: "Path string or image dict accepted by std/vision.ocr",
      },
      options: {
        type: "object",
        properties: {
          language: {type: "string"},
        },
      },
    },
    returns: {
      type: "object",
      properties: {
        _type: {type: "string"},
        text: {type: "string"},
        blocks: {type: "array"},
        lines: {type: "array"},
        tokens: {type: "array"},
        source: {type: "object"},
        backend: {type: "object"},
        stats: {type: "object"},
      },
    },
    handler: { args -> return ocr(args.image, args.options) },
  })

  return tools
}

Structured producer facts

Tool handlers normally return the text shown to the model. If middleware or another lifecycle consumer also needs machine-readable facts, return an explicit handler-result envelope:

import { agent_tool_handler_result } from "std/agent/tool_lifecycle"

fn main() {
  const handler = { _args ->
    return agent_tool_handler_result(
      "Edited src/main.harn",
      {mutation_status: "applied", diagnostics_error_count: 0},
    )
  }
}

The dispatcher keeps the complete envelope on result. It uses only text for rendered_result and the model-visible observation, so changing the prose does not change the structured facts. Unmarked dict returns retain their historical display-string behavior.

With a registry such as deterministic_tools() from the Pattern section in hand, pass it to agent_loop(...):

const result = agent_loop(
  "Read the screenshot, hash the extracted order id, and summarize the UI state.",
  "Use deterministic tools first. Prefer pure stdlib tools over free-form reasoning when possible.",
  {
    loop_until_done: true,
    tools: deterministic_tools(),
    max_iterations: 12,
  }
)

log(result.text)

Why this works

math::calc, regex::match, strings::count_char, and crypto::sha256 stay fully deterministic because they are just stdlib code.
vision::ocr returns StructuredText with token, line, and block structure so the model can branch on fields in the tool result.
The current OCR backend shells out to tesseract, but the runtime keeps the backend pluggable, runs it through the process sandbox under an active handler policy, and records OCR metadata plus structured output on audit.vision_ocr when an event log is active.

Guidance

Reach for typed stdlib tools before inventing a new MCP server or host bridge surface.
Keep tool names product-facing and stable even if the handler body is simple.
Return structured dicts or lists with concrete schemas so the model (and later pipeline steps) can branch on fields instead of scraping prose strings.

Host-backed tool helpers

Harnesses that need filesystem, git, search, or command execution tools can use the packaged hostlib wrappers instead of hand-rolling tool registries. Import std/agent/host_tools and choose the narrowest helper:

import { agent_command_tools, agent_host_tools } from "std/agent/host_tools"

const tools = agent_command_tools(nil, {
  cwd: repo_root,
  max_inline_bytes: 12000,
  command_behavior: {
    background_after_ms: 750,
    progress_interval_ms: 3000,
    progress_max_inline_bytes: 1600,
  },
  allow_argv_prefixes: [
    ["git", "status"],
    ["git", "log"],
    ["git", "diff"],
    ["sed", "-n"],
  ],
})

agent_loop(task, system, {
  tools: tools,
  tool_format: "native",
  require_successful_tools: ["run_command", ["read_command_output", "read_command_output_tail"]],
})
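The allow_argv_prefixes option above reads as a prefix allowlist. The following Python sketch models that check outside of Harn. It assumes a bare string entry stands for a one-element prefix and a list entry must match the leading argv elements exactly; the function name argv_allowed is hypothetical and this is not Harn's implementation.

```python
from typing import Sequence, Union

Prefix = Union[str, Sequence[str]]


def argv_allowed(argv: Sequence[str], allow_prefixes: Sequence[Prefix]) -> bool:
    """Return True when argv starts with at least one allowed prefix."""
    for prefix in allow_prefixes:
        # Assumption: a bare string is shorthand for a one-element prefix.
        parts = [prefix] if isinstance(prefix, str) else list(prefix)
        if parts and list(argv[:len(parts)]) == parts:
            return True
    return False


# The same allowlist as the agent_command_tools(...) example above.
ALLOW = [["git", "status"], ["git", "log"], ["git", "diff"], ["sed", "-n"]]

print(argv_allowed(["git", "status", "--short"], ALLOW))  # allowed prefix
print(argv_allowed(["git", "push"], ALLOW))               # not on the list
```

Under this model, ["git"] alone is rejected because it is shorter than every listed git prefix, which is the conservative reading of "starts with".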

For git-only workflows, prefer std/git.git_tools(...) when the model should see individual local git operations instead of a generic command runner or the coarse git_inspect operation switch:

import { git_tools } from "std/git"

const tools = git_tools(nil, {
  repo: repo_root,
  enabled_tools: ["git_status", "git_log", "git_switch"],
  names: {git_log: "release_git_log"},
})

git_tools(...) also accepts defer_loading, namespace, and tool_config, so a large git surface can use the same Tool Vault path as any other registry. When small/local models struggle to choose among many individual git schemas, use the compact toolbox instead:

const tools = git_toolbox_tools(nil, {
  repo: repo_root,
  include_mutations: true,
})

That registry exposes find_git_tool and run_git_tool. The finder uses a deterministic stdlib lexical scorer over a curated git-operation catalog; callers can still plug richer search globally through tool_search.strategy when they want BM25, hybrid ranking, embeddings, or an LLM reranker for deferred tools.

agent_command_tools(...) installs:

run_command — argv-first process execution through hostlib_tools_run_command; can return an immediate progress snapshot while the command continues in the background. Foreground commands are tied to the invoking scope: on scope cancellation, deadline expiry, or VM drop the child's whole process group is SIGTERMed and, after a short grace, SIGKILLed (Unix; best-effort direct-child kill on Windows). Use background: true for work that must outlive the scope — background handles are reaped only by cancel_handle or session-end cleanup (see Long-running tools)
read_command_output — range-read stdout, stderr, or combined artifacts by command_id, handle_id, or artifact path
read_command_output_tail — read the last bytes of command output without requiring the model to calculate offsets

When run_command runs a producer | tail/wc/grep/sort/… shell pipeline (a shell-mode command, or argv: ["sh", "-c", …]), the trailing consume-all filter is applied by the shell before Harn captures any output, so the producer's full output would normally be lost — recovering it means re-running a potentially slow command. The command runner detects this shape and transparently rewrites it to producer | tee '<capture>' 2>/dev/null | filter: the model still sees exactly the filtered output it asked for, while the producer's complete output is preserved to a temp file. A post-run output_capture field then points the model at that file (read it with read_command_output/read_file instead of re-running). The rewrite is deliberately conservative — it leaves early-terminating filters (head, grep -q/-m) and anything it cannot parse safely (command substitution, subshells, backgrounded jobs, …) untouched — and is on by default; set preserve_filtered_output: false to disable it.

agent_read_tools(...) installs root-scoped read_file, read_file_tail, list_directory, get_file_outline, search_files, and read-only git_inspect. agent_host_tools(...) composes both groups.

Default mutation tools

agent_edit_tools(...) ships the canonical write/edit/delete toolset every embedder otherwise hand-rolls, wrapping the same hostlib filesystem primitives with the same root resolution and path-scope enforcement as agent_read_tools:

write_file(path, content) — create or overwrite a UTF-8 file, creating missing parent directories.
edit_file(path, old_string, new_string, replace_all?) — exact-string replacement. Errors if old_string is absent, or if it occurs more than once unless replace_all is true, so the model never silently edits the wrong span (edit_file composes read → replace → write in Harn — there is no hostlib string-replacement builtin).
create_directory(path) — mkdir -p; succeeds if the directory exists.
delete_path(path, recursive?) — refuses a non-empty directory unless recursive is true; deleting a missing path is a no-op.
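The edit_file contract above (fail when old_string is absent, fail when it is ambiguous unless replace_all is set) can be modeled in a few lines. This is a Python sketch of the replacement step only, not the Harn handler; the function name edit_text and the empty-string guard are assumptions added for illustration.

```python
def edit_text(content: str, old_string: str, new_string: str,
              replace_all: bool = False) -> str:
    """Exact-string replacement that refuses to guess which span to edit."""
    if old_string == "":
        # Assumption: an empty needle is rejected; the source does not say.
        raise ValueError("old_string must not be empty")
    occurrences = content.count(old_string)  # non-overlapping matches
    if occurrences == 0:
        raise ValueError("old_string not found")
    if occurrences > 1 and not replace_all:
        raise ValueError(
            f"old_string occurs {occurrences} times; pass replace_all=True"
        )
    # At this point there is one match, or the caller asked for all of them.
    return content.replace(old_string, new_string)


print(edit_text("let x = 1\nlet y = 2\n", "let y = 2", "let y = 3"))
```

Raising on ambiguity pushes the model to include more surrounding context in old_string, which is what keeps it from silently editing the wrong span.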

Mutation is opt-in: agent_edit_tools(...) is not folded into agent_host_tools(...). Compose it explicitly over the read/command surface when a loop should be allowed to change the workspace:

import { agent_edit_tools, agent_host_tools } from "std/agent/host_tools"

const tools = agent_edit_tools(agent_host_tools(nil, {root: repo_root}), {root: repo_root})

Every edit tool is annotated honestly as mutating (kind: edit/delete, side_effect_level: workspace_write), so the read-only stance (agent_loop({read_only_stance: {...}})) hides them until the loop earns write access, and tool-surface narrowing classifies them correctly. Because they are plain named tools, the std/llm/tool_middleware seams (with_consent, with_audit_log, with_dry_run, with_idempotency, …) wrap them without any per-tool support:

import { agent_edit_tools } from "std/agent/host_tools"
import { compose_tool_callers, default_tool_caller, with_audit_log, with_consent } from "std/llm/tool_middleware"

agent_loop(task, system, {
  tools: agent_edit_tools(nil, {root: repo_root}),
  tool_format: "native",
  tool_caller: compose_tool_callers([
    with_consent(host_approval_fn),
    with_audit_log({sink: "local"}),
    default_tool_caller(),
  ]),
})

agent_edit_tools(...) accepts the same root/cwd, names, descriptions, enabled_tools/disabled_tools (group key edit), annotations, tool_config, namespace, defer_loading, and output_format options as the other host-tool helpers.

Use std/command for deterministic script-side harness work, such as "run this named step, retry according to this policy, keep a normalized step record, and hand a compact failure reference to a recovery agent." Use std/agent/host_tools.agent_command_tools(...) when the command runner itself should be exposed as model-facing tools inside an agent loop. Both layers share hostlib_tools_run_command and hostlib_tools_read_command_output, so command artifacts, IDs, output paths, and range readers behave the same.

For JSON-emitting CLIs, keep parsing in the script-side command layer instead of open-coding process execution plus json_parse:

import { command_json } from "std/command"

const repo = command_json(["gh", "api", "repos/burin-labs/harn"], {
  capture: {max_inline_bytes: 65536},
})

The helpers are deliberately configurable so harness authors can keep their script surface product-specific without duplicating implementation details:

const tools = agent_host_tools(nil, {
  root: repo_root,
  cwd: repo_root,
  max_inline_bytes: 12000,
  search_max_matches: 50,
  search_max_line_bytes: 1024,
  exclude_globs: [".harn-runs/**", "logs/**"],
  names: {
    run_command: "release_run",
    read_command_output_tail: "release_log_tail",
  },
  descriptions: {
    run_command: "Run one bounded release inspection command.",
  },
  enabled_tools: ["read", "command"],
  disabled_tools: ["git_inspect"],
  annotations: {
    run_command: {workflow_phase: "release_audit"},
  },
  namespace: "release",
  defer_loading: true,
  output_format: "json",
  command_policy: { request, _args ->
    if request.mode == "argv" && request.argv[0] == "git" {
      return true
    }
    return "only git inspection commands are allowed in this harness"
  },
})

Useful options:

Option	Meaning
`root`, `cwd`	Scope read/search/git paths plus command working directories. Out-of-root absolute command `cwd` or git `repo` paths become rejected tool results instead of being executed.
`names`	Rename logical tools while preserving correct result-reader links.
`descriptions`	Override model-facing descriptions per logical tool.
`enabled_tools`, `disabled_tools`	Include or exclude tools by logical key (`run_command`) or group (`read`, `command`).
`max_inline_bytes`	Cap inline command output. If the model asks for a larger inline capture, the helper clamps it to this ceiling so full output still flows through command-output artifact readers.
`command_behavior` / `run_command_behavior`	Defaults for every generated `run_command` request. `background_after_ms` gives the command a short foreground startup window, then returns `{status: "running", stdout, stderr, output_path, ...}` while progress events continue in the session. `progress_interval_ms` seeds the feedback cadence: it is the delay before the first progress snapshot and then doubles after each one up to an internal ceiling (default 30000ms), so a long-running command emits frequent early progress that thins out over time. `progress_max_inline_bytes` caps snippets.
`search_max_matches` / `max_search_matches`	Default `search_files.max_matches` when the model omits one; callers can still request a different cap per call.
`search_max_line_bytes` / `max_search_line_bytes`	Default `search_files.max_line_bytes` when the model omits one; long matched lines are clipped around the hit so generated/minified files cannot dominate the next turn.
`exclude_globs` / `search_exclude_globs`	Baseline exclusions for `search_files` using root-relative globs. Tool-call `exclude_globs` are merged with these defaults, not substituted for them.
`allow_argv_prefixes`	Deny `run_command` calls unless `argv` starts with a listed string or list prefix.
`preserve_filtered_output`	Default `true`. For a `producer | tail/wc/grep/…` pipeline, tee the producer's full output to a temp file so the model can read it via `output_capture` instead of re-running a slow command. Conservative (skips `head`, `grep -q`/`-m`, command substitution, subshells, …). Set `false` to disable.
`command_policy` / `allow_command`	Closure hook for custom command approval, denial, or rewrite.
`annotations`	Merge extra annotations into generated tool definitions.
`tool_config`	Advanced raw `tool_define(...)` config overrides per logical tool or `"*"`.
`namespace`, `defer_loading`	Apply Tool Vault metadata to generated tools.
`output_format`, `result_formatter`	Return JSON strings by default, or customize rendering.
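The progress cadence described in the `command_behavior` row (first snapshot after `progress_interval_ms`, then doubling up to a ceiling) is easier to see as numbers. The Python sketch below only models that schedule; the function name progress_delays and the clamping of an oversized seed are assumptions, and the 30000 ms ceiling is the default quoted in the table.

```python
from typing import List


def progress_delays(progress_interval_ms: int,
                    ceiling_ms: int = 30_000,
                    count: int = 6) -> List[int]:
    """Return the delays (ms) before each of the first `count` snapshots."""
    delays = []
    delay = progress_interval_ms
    for _ in range(count):
        # Assumption: a seed above the ceiling is clamped to the ceiling.
        delays.append(min(delay, ceiling_ms))
        delay = min(delay * 2, ceiling_ms)
    return delays


print(progress_delays(3000))  # [3000, 6000, 12000, 24000, 30000, 30000]
```

With the 3000 ms seed from the earlier example, a command emits four snapshots in its first 45 seconds and then settles at one every 30 seconds.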

Tool vault

Harn's Tool Vault is the progressive-tool-disclosure primitive: tool definitions that stay out of the model's context until they're surfaced by a search call. This keeps context cheap for agents with hundreds of tools (coding agents, MCP-heavy setups) without requiring the integrator to hand-filter tools per turn.

Per-tool flag: `defer_loading`

Any tool registered via tool_define (or the tool { … } language form) can opt out of eager loading:

let registry = tool_registry()
registry = tool_define(registry, "deploy", "Deploy to production", {
  parameters: {env: {type: "string", enum: ["staging", "prod"]}},
  defer_loading: true,
  handler: { args -> exec("deploy", args.env) },
})

Deferred tools never appear in the model's context unless a tool-search call surfaces them. Their schemas are still sent to the provider, so prompt caching stays warm on Anthropic: the schemas live in the API prefix but not in the model's context.

Call-level option: `tool_search`

Turning progressive disclosure on is one option away:

const r = llm_call(prompt, sys, {
  provider: "anthropic",
  model: "claude-opus-4-7",
  tools: registry,
  tool_search: "bm25",
})

Accepted shapes:

Shape	Meaning
`tool_search: true`	Default: `bm25` variant, mode `auto`.
`tool_search: "bm25"`	Natural-language queries.
`tool_search: "regex"`	Python-regex queries.
`tool_search: "hybrid"`	Client-mode BM25 plus field-weighted ranking.
`tool_search: false`	Explicit off (same as omitting).
`tool_search: {variant, mode, strategy, always_loaded, budget_tokens, name, include_stub_listing}`	Explicit dict form.

mode options:

"auto" (default) — use native if the provider supports it, otherwise fall back to the client-executed path (no error).
"native" — force the provider's native mechanism. Errors if unsupported.
"client" — force the client-executed path even on providers with native support. Useful for A/B-ing strategies or pinning behavior across heterogeneous provider fleets.

Provider support

Provider	Native `tool_search`	Variants / modes
Anthropic Claude Opus/Sonnet 4.0+, Haiku 4.5+	✓	`bm25`, `regex`
Anthropic 3.x or earlier 4.x Haiku	✗ (uses client fallback)	—
OpenAI Responses API — GPT 5.4+	✓	`hosted` (default), `client`
OpenAI pre-5.4 (`gpt-4o`, `gpt-4.1`, …)	✗	client fallback works today
OpenRouter / Together / Groq / DeepSeek / Fireworks / HuggingFace / local	✓ when routed model matches `gpt-5.4+` upstream	hosted forwarded; escape hatch below for proxies
Gemini, Ollama, mock (default model)	✗	client fallback works today

The OpenAI native path emits a flat {"type": "tool_search", "mode": "hosted"} meta-tool at the front of the tools array, alongside defer_loading: true on the wrapper of each user tool. The server runs the search and replies with tool_search_call / tool_search_output entries that Harn parses into the same transcript event shape as the Anthropic path (replays are indistinguishable across providers).

Namespace grouping

OpenAI's tool_search can group deferred tools into namespaces; pass namespace: "<label>" on tool_define(...) to tag a tool. Harn collects the distinct set into the meta-tool's namespaces field. Anthropic ignores the label — harmless passthrough for replay fidelity.

tool_define(registry, "deploy_api", "Deploy the API", {
  parameters: {env: {type: "string", enum: ["staging", "prod"]}},
  defer_loading: true,
  namespace: "ops",
  handler: { args -> exec("deploy", "api", args.env) },
})

Escape hatch for proxied OpenAI-compat endpoints

Self-hosted routers and enterprise gateways sometimes advertise a model ID Harn cannot parse (my-internal-gpt-clone-v2) yet forward the OpenAI Responses payload unchanged. Opt into the hosted path with:

llm_call(prompt, sys, {
  provider: "openrouter",
  model: "my-custom/gpt-forward",
  tools: registry,
  tool_search: {mode: "native"},
  openrouter: {force_native_tool_search: true},
})

The override is keyed by the provider name (the same dict you'd use for any provider-specific knob).

For the authoritative capability table, harn.toml override schema, and packaged provider adapter config, see LLM providers.

Client-executed fallback

On providers without native defer_loading, Harn falls back to an in-VM execution path. The fallback is identical to the native path from a script's point of view: same option surface, same transcript events, same promotion behavior across turns. Internally, Harn injects a synthetic tool called __harn_tool_search — when the model calls it, the loop runs the configured strategy against the deferred-tool index, promotes the matching tools into the next turn's schema list, and emits the same tool_search_query / tool_search_result transcript events as native mode (tagged mode: "client" in metadata so replays can distinguish paths).

Strategies (client mode only):

`strategy`	Runs in	Notes
`"bm25"` (default)	VM	Tokenized BM25 over `name + description + param text`. Matches `open_file` from query `open file`.
`"regex"`	VM	Case-insensitive Rust-regex over the same corpus. No backreferences, no lookaround.
`"hybrid"`	VM	Reciprocal-rank fusion over BM25 and field-weighted name/description/parameter matches. Useful for coding-agent tools where exact names like `run_command` should beat broad prose matches.
closure	Harn	Called as `strategy(query, deferred_tools, state)`; return a list of tool names, a JSON string, or `{tool_names: [...]}`.
`{handler: closure, name?: string}`	Harn	Named custom searcher with the same return shapes as a closure. Use this to wire embeddings, LLM rerankers, trigram indexes, project-specific ontologies, or host calls from ordinary Harn code.
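The `"hybrid"` row names reciprocal-rank fusion as the way two rankings are combined. The Python sketch below shows only that fusion step, given two already-ranked lists of tool names. The constant k = 60 is the conventional value for this technique and is an assumption here, not a confirmed Harn setting; the input rankings are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name so output is stable.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical rankings for the query "run command".
bm25_ranking = ["read_command_output", "run_command", "search_files"]
field_ranking = ["run_command", "read_command_output_tail"]

fused = reciprocal_rank_fusion([bm25_ranking, field_ranking])
print(fused[0])  # run_command
```

A tool that places well in both lists outranks one that tops a single list, which is why an exact name match can beat a broad prose match.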

Extra client-mode knobs:

budget_tokens: N — soft cap on the total token footprint of promoted tool schemas. Oldest-first eviction when exceeded. Omit to keep every promoted schema for the life of the call.
name: "find_tool" — override the synthetic tool's name. Handy when a skill's vocabulary suggests a more natural verb (discover, lookup, …).
always_loaded: ["read_file", "run"] — pin tool names to the eager set even if defer_loading: true is set on their registry entries.
include_stub_listing: true — append a short list of deferred tool names + one-line descriptions to the tool-contract prompt so the model can eyeball what's available without a search call. Off by default to match Anthropic's native ergonomic.
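The budget_tokens knob above describes a soft cap with oldest-first eviction. A small Python model of that policy follows. It assumes the caller supplies a token count per schema, that re-promoting a tool refreshes its age, and that the newest promotion is never evicted (which is one reading of "soft"); the class name PromotedTools is hypothetical and none of this is taken from Harn's source.

```python
from collections import OrderedDict
from typing import List, Optional


class PromotedTools:
    """Track promoted tool schemas under an optional soft token budget."""

    def __init__(self, budget_tokens: Optional[int] = None) -> None:
        self.budget_tokens = budget_tokens
        self._tools: "OrderedDict[str, int]" = OrderedDict()

    def promote(self, name: str, schema_tokens: int) -> List[str]:
        """Promote a tool; return the names evicted to fit the budget."""
        self._tools.pop(name, None)  # assumption: re-promotion refreshes age
        self._tools[name] = schema_tokens
        evicted: List[str] = []
        if self.budget_tokens is None:
            return evicted  # no budget: keep every promoted schema
        # Evict oldest entries first, but always keep the newest promotion.
        while sum(self._tools.values()) > self.budget_tokens and len(self._tools) > 1:
            oldest, _ = self._tools.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def names(self) -> List[str]:
        return list(self._tools)


vault = PromotedTools(budget_tokens=100)
vault.promote("git_status", 40)
vault.promote("git_log", 40)
print(vault.promote("git_switch", 40))  # ['git_status']
```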

Pre-flight validation

At least one user tool must be non-deferred. Harn errors before the API call is made, matching Anthropic's documented 400.
defer_loading must be a bool — typos like defer_loading: "yes" error at tool_define time rather than silently falling back to the "no defer" default.
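Both rules above are cheap to check before any network call. The Python sketch below folds them into one hypothetical preflight function over plain dicts for brevity. In Harn the type check happens at tool_define time and the non-deferred check before the API call, so treat this as a model of the rules rather than of where they run.

```python
from typing import Any, Dict, List


def preflight(tools: List[Dict[str, Any]]) -> None:
    """Raise before any provider call if the deferred-tool setup is invalid."""
    for tool in tools:
        flag = tool.get("defer_loading", False)
        # Checked against bool explicitly so strings like "yes" are rejected.
        if not isinstance(flag, bool):
            raise TypeError(
                f"{tool['name']}: defer_loading must be a bool, got {flag!r}"
            )
    if tools and all(tool.get("defer_loading", False) for tool in tools):
        raise ValueError("at least one tool must be non-deferred")


preflight([
    {"name": "read_file"},
    {"name": "deploy", "defer_loading": True},
])  # passes: read_file is eagerly loaded
```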

Transcript events

Every tool-search round-trip emits structured events in the run record (client-mode events carry mode: "client" in their metadata):

tool_search_query — the search tool's invocation (input query, search-tool id).
tool_search_result — the references returned by the server (which deferred tools got promoted on this turn).

These are stable shapes; replay and eval tooling can reconstruct which tools were available on each turn without re-running the call.

Provider-hosted tools

Use normal Harn tools when Harn should execute the operation locally, including local MCP servers, approval hooks, mutation-session audit, and deterministic tool receipts. Use OpenAI Responses provider_tools only when the provider should execute a hosted tool or remote MCP connector:

const result = llm_call("Find the current policy and summarize it.", nil, {
  provider: "openai",
  model: "gpt-5.4",
  api_mode: "responses",
  provider_tools: [
    {type: "web_search"},
    {type: "mcp", server_label: "docs", server_url: "https://mcp.example.com", require_approval: "always"},
  ],
})

Provider-hosted calls are not dispatchable Harn tool calls. Harn records them as provider_tool_call blocks with executor: "provider_native", provider_tool_id, call_id, provider_tool_type, tool_kind, and the raw provider_metadata. Remote MCP approval and execution are mediated by OpenAI according to the provider tool config; choose Harn MCP tools instead when Harn must own each approval and tool receipt.

Standalone OpenAI Responses compaction uses the same mode with compact: true. Harn posts the request to /responses/compact and records returned opaque compaction items as private blocks rather than rewriting the Harn transcript implicitly.

MCP server tools

Use mcp_servers when an agent should use an MCP server's tool catalog without manually calling mcp_connect, mcp_list_tools, and mcp_call.

const result = agent_loop(
  "Summarize the latest open issue and draft a reply.",
  "You are a concise triage assistant.",
  {
    provider: "openai",
    model: "gpt-5.4",
    mcp_servers: [
      {name: "github", transport: "http", url: "http://localhost:3030/mcp"},
      {name: "local_fs", transport: "stdio", command: ["mcp-filesystem", "/tmp/project"]},
    ],
    max_iterations: 8,
  },
)

Discovered tools are always prefixed with the server name, for example github__search_issues or local_fs__read_file. The prefix makes collisions deterministic when two servers both export a tool named search or when a server tool would otherwise overlap a local Harn tool. The actual MCP tools/call request still uses the original unprefixed MCP tool name.
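The naming rule above amounts to a lookup table from the exposed name to the server and its original tool name. The Python sketch below models that mapping with made-up catalogs; the function name prefix_mcp_tools is hypothetical and the double-underscore separator is taken from the examples in the text.

```python
from typing import Dict, List, Tuple


def prefix_mcp_tools(catalogs: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Map each exposed name to (server, original MCP tool name)."""
    exposed: Dict[str, Tuple[str, str]] = {}
    for server, tool_names in catalogs.items():
        for tool in tool_names:
            # The model sees the prefixed name; tools/call uses the original.
            exposed[f"{server}__{tool}"] = (server, tool)
    return exposed


# Hypothetical catalogs: both servers export a tool named "search".
routes = prefix_mcp_tools({
    "github": ["search", "search_issues"],
    "local_fs": ["search", "read_file"],
})

print(routes["github__search"])    # ('github', 'search')
print(routes["local_fs__search"])  # ('local_fs', 'search')
```

Storing the pair instead of re-parsing the exposed name avoids ambiguity when a server or tool name itself contains a double underscore.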