Preset tool hooks
preset_run_command(...) is the shipped wrapper for the
catalogue-driven "command faux-pas" library (epic
#1884). It turns a
versionable corpus of shell-mistake rules — find . -name instead of
rg --files, cargo build without --target-dir, git push --force
to main, rm -rf /, etc. — into a single (args) -> result handler
you can drop straight into agent_loop's tool registry. Rewrites are
applied with audit metadata, denies short-circuit the dispatch, and the
optional LLM classifier handles ad-hoc commands no deterministic rule
covered.
This page is the surface reference. For copy-paste recipes per language
stack see Cookbook: tool hooks. To add a
rule to the shipped catalogues see Contributing preset
hooks. The
std/tool_hooks module pairs with the lifecycle-hook
surface documented under
Hooks (tool, persona, session lifecycle);
they coexist — register_tool_hook runs at the dispatcher around every
tool, while preset_run_command is the in-tool wrapper specifically
for run_command-shaped tools that take a shell command string.
The quickref companion (LLM-friendly) lives in the "Catalogue-driven
run_command hooks" section of
docs/llm/harn-quickref.md.
Cross-references: system reminders (the
one-turn reminder injected after each rewrite), audit
receipts (the tool_rewrite / tool_denied /
tool_rule_warning lifecycle entries), and composable tool
middleware (the broader middleware story
the preset wrapper composes under).
Quick start
```harn
import { preset_run_command } from "std/tool_hooks"

pipeline default(task) {
  let run_command = preset_run_command({
    stacks: ["rust", "python"],
    inner: { args -> shell(args.command) },
  })
  agent_loop(task, {tools: {tools: [{name: "run_command", handler: run_command}]}})
}
```
That's the whole footprint. Omit registry and the wrapper auto-seeds
from tool_hooks_seed_registry(stacks), which always includes the
universal deny catalogue plus one per opted-in stack. The default mode
(tool_hooks_mode_rewrite_with_audit) silently fixes the command and
records what it did; agents see the rewritten command in their next
turn via a one-shot tool_rewritten system reminder.
Configuration reference
preset_run_command(config) accepts a single optional dict. All keys
have defaults so preset_run_command() is valid and returns a wrapper
that runs the universal catalogue against any command before forwarding.
| Key | Default | Purpose |
|---|---|---|
| stacks | [] | Opt-in stack list (["rust", "python", "typescript"\|"ts", "swift", "sql", "harn"]). Drives both registry auto-seed and tool_hooks_filter. Catalogues with no stack field are always active; per-rule applies_to filters further. |
| registry | tool_hooks_seed_registry(stacks) | Explicit tool_hooks_registry() value. Pass your own for tests, vendor patches, or to drop a shipped catalogue. |
| custom_rules | [] | List of tool_rule(...) values matched before the registry, regardless of stack scoping. Use for harness-level overrides that must fire unconditionally. |
| mode | tool_hooks_mode_rewrite_with_audit | Match-dispatch callable shaped (rule, args, inner) -> result. Three shipped modes (see below) cover the v1 contract; custom modes can call tool_hooks_emit_audit and tool_hooks_inject_reminder directly. |
| inner | nil | Underlying executor shaped (args) -> result. Optional: when omitted, the wrapper returns decision envelopes so the caller drives the executor itself (useful for previews and tests). |
| llm_classifier | nil | Opt-in TH-05 classifier config. See LLM classifier below. |
Recognized args shapes
The wrapper accepts whichever shape your run_command tool already
hands to its handler:
- String: "cargo build --release".
- Dict with command / cmd / run: {command: "...", cwd: "..."}. Unknown keys are preserved on the way through to inner; rewrites swap command in place.
Both shapes reduce to the same command string, which is what the classifier and every rule predicate see.
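As a rough illustration of the two shapes, the sketch below calls one wrapper both ways. It leaves inner unset so each call returns a decision envelope instead of executing anything. The cwd value is a placeholder path, and the envelope contents depend on which shipped rules happen to match.

```harn
import { preset_run_command } from "std/tool_hooks"

// No inner configured: the wrapper only reports decisions, nothing is executed.
let preview = preset_run_command({stacks: ["rust"]})

// String shape.
let from_string = preview("cargo build --release")

// Dict shape; "/tmp/demo" is a placeholder working directory.
let from_dict = preview({command: "cargo build --release", cwd: "/tmp/demo"})
```

Since both calls carry the same command text, they would be expected to match the same rules.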
Modes
Each mode is a (rule, args, inner) -> result callable, so modes are
interchangeable. The wrapper applies the one configured mode to every
rule match for the whole session; classifier verdicts are the exception
and dispatch via the verdict's own mode regardless. All three
shipped modes return the same envelope shape so audit consumers can
render them uniformly.
tool_hooks_mode_rewrite_with_audit (default)
Invokes the rule's rewrite callable, forwards the rewritten command
to inner, and tags the result with audit metadata. Falls back to the
original command when the rule has no rewriter (treat that case as a
"warning that fired but had nothing to fix").
Side effects:
- tool_hooks_emit_audit("tool_rewrite", payload): lifecycle audit entry observable via lifecycle_audit_log_take().
- tool_hooks_inject_reminder({tags: ["tool_rewritten"], body, ttl_turns: 1}): queues a one-turn system_reminder so the next agent turn sees the corrected shape. When no agent session is active (headless pipelines, unit tests) the reminder still produces a tool_hooks.reminder_injected audit entry so conformance can observe the side effect.
tool_hooks_mode_deny_with_explanation
Refuses to dispatch the command. Never invokes inner. Records a
tool_denied lifecycle audit entry and returns
{action: "deny", ...envelope} so the caller can surface a
tool_error / system_reminder to the agent. Use for irreversible or
policy-violating commands (git push --force to main, root-adjacent
rm -rf).
tool_hooks_mode_passthrough_only_audit
Runs inner with the original command unchanged and tags the result
with the matched rule so audit consumers see which rule would have
fired. Records a tool_rule_warning lifecycle audit entry. Use for
informational rules during a rollout, or for migrating from a more
permissive mode to a stricter one without breaking existing flows.
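A minimal sketch of an observe-only rollout, assuming tool_hooks_mode_passthrough_only_audit is exported from std/tool_hooks in the same way the deny mode is imported elsewhere on this page:

```harn
import { preset_run_command, tool_hooks_mode_passthrough_only_audit } from "std/tool_hooks"

// Observe-only: commands run unchanged, and each rule match is recorded
// as a tool_rule_warning audit entry.
let observe_only = preset_run_command({
  stacks: ["python"],
  mode: tool_hooks_mode_passthrough_only_audit,
  inner: { args -> shell(args.command) },
})
```

Once the audit log shows the rules firing where expected, tightening the rollout should only require swapping the mode value.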
Custom modes
A mode is just a Harn closure. The wrapper passes the matched rule
record (whichever tool_hooks_match produced), the original args,
and the configured inner. Custom modes typically combine
tool_hooks_emit_audit(kind, payload) and
tool_hooks_inject_reminder({tags, body, ttl_turns, ...}) for
observability, then return an envelope shaped the same as the shipped
modes:
```harn
import { tool_hooks_emit_audit, tool_hooks_inject_reminder } from "std/tool_hooks"

let warn_then_run = { rule, args, inner ->
  // Accept both the string and the dict args shape.
  let cmd = type_of(args) == "string" ? args : to_string(args?.command ?? "")
  tool_hooks_emit_audit("custom.tool_warn", {rule_id: rule.rule_id, command: cmd})
  tool_hooks_inject_reminder({
    tags: ["custom_warning"],
    body: "rule " + rule.rule_id + " advisory: " + rule.explanation,
    ttl_turns: 2,
  })
  // Without an executor, return the decision only.
  if inner == nil { return {action: "warn", rule_id: rule.rule_id, command: cmd} }
  let result = inner(args)
  return {action: "warn", rule_id: rule.rule_id, command: cmd, result: result}
}

let wrapper = preset_run_command({stacks: ["rust"], mode: warn_then_run})
```
Decision envelope
Every shipped mode returns the same shape. Audit dashboards and
replay tooling should treat unknown action strings as advisory
extensions:
```
{
  action: "rewrite" | "deny" | "passthrough",
  command: <string>,            // what would run (rewritten or original)
  original_command: <string>,   // what the agent submitted
  rule_id: <string>,
  catalogue_id: <string>,
  severity: "error" | "warning" | "info",
  explanation: <string>,
  references: [<string>, ...],  // doc / RFC / post-mortem links
  result?: <inner's return>,    // omitted when inner == nil or action == "deny"
}
```
The no-match passthrough case (no rule fired, no classifier, inner
present) is the bare return value of inner(args). When inner == nil
the wrapper returns
{action: "passthrough", command, original_command} so call-sites can
still tell "we considered it, nothing matched" apart from "we
rewrote".
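To make the envelope concrete, here is a hedged sketch of a caller that branches on action. The describe helper is hypothetical, and the wrapper is built without inner so every call returns an envelope instead of a bare executor result.

```harn
import { preset_run_command } from "std/tool_hooks"

let preview = preset_run_command({stacks: ["rust"]})

// Hypothetical helper: summarize a decision envelope as one line.
let describe = { command ->
  let decision = preview(command)
  if decision.action == "deny" {
    return "denied by " + decision.rule_id + ": " + decision.explanation
  }
  if decision.action == "rewrite" {
    return "rewritten to: " + decision.command
  }
  // "passthrough" and any unknown action are treated as advisory.
  return "no change: " + decision.command
}
```

Falling through to the last branch for unrecognized actions follows the advice above about treating unknown action strings as advisory extensions.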
Custom rules and registries
custom_rules and registry are independent surfaces. Custom rules
are matched first regardless of stack scoping, so they're the right
place for harness-level overrides:
```harn
import { preset_run_command, tool_hooks_mode_deny_with_explanation } from "std/tool_hooks"

let no_curl_pipe_sh = tool_rule({
  id: "harness.curl_pipe_sh",
  pattern: "curl[^|]+\\|\\s*(?:sudo\\s+)?(?:ba)?sh\\b",
  applies_to: [],
  severity: "error",
  explanation: "Pipe-to-shell installs run unreviewed remote code. Download, inspect, then run.",
  references: ["https://www.idontplaydarts.com/2016/04/detecting-curl-pipe-bash-server-side/"],
})

let wrapper = preset_run_command({
  stacks: ["rust"],
  custom_rules: [no_curl_pipe_sh],
  mode: tool_hooks_mode_deny_with_explanation,
  inner: { args -> shell(args.command) },
})
```
To compose a registry by hand (e.g. drop a shipped catalogue and substitute your own), build one from scratch:
```harn
import { preset_run_command, tool_hooks_catalogue_universal } from "std/tool_hooks_catalogues"
import { tool_hooks_register, tool_hooks_registry } from "std/tool_hooks"

let vendor_rust = catalogue({
  id: "vendor/rust",
  stack: "rust",
  rules: [/* ... vendor-pinned rules ... */],
})

let registry = tool_hooks_register(
  tool_hooks_register(tool_hooks_registry(), tool_hooks_catalogue_universal()),
  vendor_rust,
)

let wrapper = preset_run_command({stacks: ["rust"], registry: registry})
```
The registered catalogues' stack field cooperates with
tool_hooks_filter: stackless catalogues survive every filter (used
for the universal deny set), and stack-tagged catalogues only fire
when the caller opts in via stacks.
LLM classifier
The optional classifier (TH-05, #1898)
catches ad-hoc commands no deterministic rule matched. It's strictly
opt-in — leaving llm_classifier: nil preserves TH-02 passthrough
semantics byte-for-byte.
```harn
import { preset_run_command } from "std/tool_hooks"

let wrapper = preset_run_command({
  stacks: ["rust", "python"],
  llm_classifier: {
    model: "haiku",
    provider: "anthropic",
    threshold: 0.8,
    cache: {ttl_seconds: 3600},
  },
  inner: { args -> shell(args.command) },
})
```
The classifier calls a small model with a meta-prompt that asks for a
single-line JSON verdict shaped
{kind: "rewrite"|"deny"|"allow", confidence, rewritten?, explanation?, references?},
then:
- confidence >= threshold and kind == "rewrite": dispatches via tool_hooks_mode_rewrite_with_audit with the model's rewritten substitution.
- confidence >= threshold and kind == "deny": dispatches via tool_hooks_mode_deny_with_explanation.
- kind == "allow" or sub-threshold: passes through to inner, audited so observers can verify "we asked, the model wasn't sure".
- Transport error or non-JSON response: degrades to passthrough, still audited.
Every invocation emits a tool_hook_classifier_verdict audit entry
(kind, confidence, scope, cache_hit, action, plus
normalized_command and error when present) so observers can trace
which decisions were model-driven without re-running the model.
Classifier config keys
| Key | Default | Notes |
|---|---|---|
| model | (required) | Non-empty string. Pass a small/cheap model; every classifier miss is one extra llm_call_safe. |
| threshold | 0.8 | Float in [0, 1]. Verdicts below this fall through to inner. |
| meta_prompt | shipped default | Override only if you need to extend the verdict schema; the default is the safety reviewer prompt validated by conformance. |
| provider | active default | Pin a provider explicitly to avoid silent provider-switch surprises. |
| cache.ttl_ms / cache.ttl_seconds | 0 (no cache) | Cache scope is per-config (or per id if supplied) so independent wrappers don't share verdicts. |
| llm_options | {} | Forwarded verbatim. Use to pin temperature, seed, retries, etc. |
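A sketch that exercises the remaining keys from the table, under assumptions: the temperature key inside llm_options is a guess at an option name (the table only says options are forwarded verbatim), and the threshold and TTL values are arbitrary.

```harn
import { preset_run_command } from "std/tool_hooks"

let strict_wrapper = preset_run_command({
  stacks: ["rust"],
  llm_classifier: {
    model: "haiku",
    provider: "anthropic",            // pinned to avoid provider-switch surprises
    threshold: 0.9,                   // stricter than the 0.8 default
    cache: {ttl_ms: 600000},          // 10 minutes, expressed in milliseconds
    llm_options: {temperature: 0},    // assumed option name, forwarded verbatim
  },
  inner: { args -> shell(args.command) },
})
```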
Trust contract
The classifier sends the raw command text + the meta prompt to the
configured model. Callers must redact secrets in command arguments
the same way run_command already requires for the underlying shell.
Classifier calls count against the session's normal LLM cost telemetry
(peek_total_cost, autonomy budget, etc.) — keep the model small.
ToolRule schema
Every rule passes through tool_rule(config), which validates the
shape at construction time so misconfigurations fail loudly rather
than at first dispatch.
| Field | Type | Required | Purpose |
|---|---|---|---|
| id | string | yes | <stack>.<tool>.<short> convention so audit consumers can group by stack and tool. Must be unique within a catalogue. |
| pattern | string regex or (command, context) -> bool callable | yes | Match predicate. Regex strings are compiled lazily; callables let you express predicates that would otherwise need lookaround (the Rust regex crate has no lookaround support). |
| applies_to | list<string> | no | Per-rule stack scoping. An empty list matches under every stacks opt-in (use for universal rules). |
| severity | "error" \| "warning" \| "info" | no, default "warning" | Drives audit-side rendering and the shipped modes' kind selection. |
| explanation | string | no | Single-sentence rationale the agent can paraphrase back to the user. |
| references | list<string> | no | Links to upstream docs, RFCs, or post-mortems. Surfaced in the audit envelope. |
| priority | int | no, default 0 | Sort key during the linear sweep. Higher fires first. Universal deny rules use 1000; stack-specific rules use 0–100. |
| rewrite | (command, context) -> string \| dict \| nil | no | Optional rewriter. Return nil or the same string to skip (rule still fires); return a new string to substitute. Dict form is reserved for future structured arg overrides. |
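As an illustration of the schema, the following sketch defines a rule with a regex pattern and a rewriter. The rule id and wording are hypothetical and not part of the shipped catalogues; tool_rule is called without an import, as in the other examples on this page.

```harn
// Hypothetical rule: turn a bare `cargo fmt` into a non-mutating check.
let fmt_check_demo = tool_rule({
  id: "rust.cargo.fmt_check_demo",
  pattern: "^\\s*cargo\\s+fmt\\s*$",
  applies_to: ["rust"],
  severity: "info",
  explanation: "Bare cargo fmt rewrites files in place; --check only reports differences.",
  priority: 10,
  // Returning a new string substitutes the command.
  rewrite: { command, context -> "cargo fmt --check" },
})
```

Because the pattern is anchored at both ends, invocations that already pass arguments such as --check should not match again after the rewrite.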
Catalogue schema
Every catalogue passes through catalogue(config):
| Field | Type | Required | Purpose |
|---|---|---|---|
| id | string | yes | <source>/<stack> convention (e.g. harn-canon/rust). |
| rules | list<tool_rule> | yes | Rule corpus. Validated for unique ids within the catalogue. |
| stack | string | no | Opt-in scoping key. Omit for stackless universal catalogues. |
| version | string | no | Semver tag for the catalogue, e.g. "0.1.0". |
| source | string | no | Provenance tag. The shipped catalogues use "harn-canon". |
| priority | int | no, default 0 | Sort key applied before per-rule priority. Universal catalogues use 100 so deny rules sweep first. |
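A minimal catalogue sketch using every field from the table. The "acme" source, the catalogue id, and the single rule are made up for illustration.

```harn
// Hypothetical team catalogue; opt in with stacks: ["rust"].
let acme_rust = catalogue({
  id: "acme/rust",
  stack: "rust",
  version: "0.1.0",
  source: "acme",
  priority: 10,
  rules: [
    tool_rule({
      id: "rust.cargo.install_demo",
      pattern: "^\\s*cargo\\s+install\\b",
      severity: "warning",
      explanation: "cargo install changes the global toolchain; prefer a pinned project-local tool.",
    }),
  ],
})
```

With priority 10 this catalogue would sort below the universal catalogue (100), so the shipped deny rules still sweep first.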
Registry
tool_hooks_registry() returns an empty registry value;
tool_hooks_register(registry, catalogue) appends; tool_hooks_filter(registry, stacks)
narrows by stack list (stackless catalogues always survive);
tool_hooks_unregister(registry, catalogue_id) removes a catalogue by
id; tool_hooks_list(registry) enumerates ids; and
tool_hooks_match(registry, command, {stacks?}) is the underlying
linear-sweep matcher the wrapper uses.
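The registry functions can be strung together roughly as follows. This sketch assumes all of them are exported from std/tool_hooks, that each returns a new value and leaves its input registry untouched, and that tool_hooks_match returns the matched rule record; none of those details are spelled out on this page. The acme/sql catalogue is hypothetical.

```harn
import { tool_hooks_filter, tool_hooks_list, tool_hooks_match, tool_hooks_register, tool_hooks_registry, tool_hooks_unregister } from "std/tool_hooks"
import { tool_hooks_catalogue_universal } from "std/tool_hooks_catalogues"

// Hypothetical stack-tagged catalogue with a single rule.
let acme_sql = catalogue({
  id: "acme/sql",
  stack: "sql",
  rules: [tool_rule({id: "sql.drop.table_demo", pattern: "(?i)\\bdrop\\s+table\\b"})],
})

let base = tool_hooks_register(tool_hooks_registry(), tool_hooks_catalogue_universal())
let full = tool_hooks_register(base, acme_sql)

let ids = tool_hooks_list(full)                      // catalogue ids in the registry
let rust_view = tool_hooks_filter(full, ["rust"])    // stackless universal survives, acme/sql does not
let trimmed = tool_hooks_unregister(full, "acme/sql")
let hit = tool_hooks_match(full, "DROP TABLE users", {stacks: ["sql"]})
```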
Shipped catalogues
tool_hooks_seed_registry(stacks) composes the universal catalogue
plus one per opted-in stack. Unknown stacks are silently skipped so
older scripts opting into a future stack don't break.
| Catalogue | Stack | Rules |
|---|---|---|
| harn-canon/universal | (stackless) | universal.git_push_force_main (deny git push --force to main / master unless HARN_TOOL_HOOKS_ALLOW_FORCE_PUSH=1); universal.rm_rf_root_adjacent (deny rm -rf against /, ~, .., $HOME, ${HOME}, *). |
| harn-canon/rust | rust | target_dir_conflict (rewrite cargo build\|test\|check\|run to add --target-dir target-shared); test_no_capture_default (append -- --nocapture to cargo test); clippy_full_workspace (append --workspace to cargo clippy); fmt_check_vs_apply (warn on bare cargo fmt). |
| harn-canon/python | python | find_versus_rg (rewrite find . -name '*.py' to rg --files -t py); pip_install_no_user (warn on system-wide pip install); pytest_no_capture (append -s to bare pytest). |
| harn-canon/typescript | typescript (alias ts) | find_versus_rg (rewrite find . -name '*.ts\|tsx\|js\|jsx\|mjs\|cjs' to rg --files -t ts); npm_install_force_resolution (warn on npm install --force / --legacy-peer-deps). |
| harn-canon/swift | swift | swift_build_clean_cache_misuse (warn on swift build --clean / swift package clean). |
| harn-canon/sql | sql | select_star_warning (warn on unbounded SELECT * FROM ...). |
| harn-canon/harn | harn | cargo_run_quiet (rewrite cargo run --bin harn to cargo --quiet run --bin harn). |
The full source is in
crates/harn-stdlib/src/stdlib/stdlib_tool_hooks_catalogues.harn.
Each catalogue's HarnDoc lists its rules, severity, and rationale.
Observability
Every shipped mode and the classifier emit lifecycle audit entries that flow through the standard transcript machinery:
| Kind | Emitter | Payload (subset) |
|---|---|---|
| tool_rewrite | tool_hooks_mode_rewrite_with_audit | rule_id, catalogue_id, severity, original_command, command, explanation |
| tool_denied | tool_hooks_mode_deny_with_explanation | rule_id, catalogue_id, severity, command, explanation |
| tool_rule_warning | tool_hooks_mode_passthrough_only_audit | rule_id, catalogue_id, severity, command, explanation |
| tool_hook_classifier_verdict | LLM classifier | scope, model, kind, confidence, threshold, cache_hit, action, normalized_command, error? |
| tool_hooks.reminder_injected | Any reminder-injecting mode | The reminder spec passed to tool_hooks_inject_reminder |
lifecycle_audit_log_take() drains the in-memory buffer so unit tests
can assert on the exact sequence; in a live session, the events are
also written to the transcript next to hook_call / hook_returned
events from the broader hooks
surface. The tool_rewritten system
reminder threads back through the system reminders
machinery (ttl_turns: 1 by default, tags: ["tool_rewritten"]).
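For unit tests, draining the buffer might look like the sketch below. It assumes lifecycle_audit_log_take() is callable without an import and returns the buffered entries as a list; the shape of each entry is not specified here, so the sketch stops at collecting them.

```harn
import { preset_run_command } from "std/tool_hooks"

// Headless preview wrapper: no inner, no agent session.
let preview = preset_run_command({stacks: ["rust"]})
let decision = preview("cargo build")

// If the shipped target_dir_conflict rule matched, the buffer would be
// expected to hold a tool_rewrite entry followed by a
// tool_hooks.reminder_injected entry.
let entries = lifecycle_audit_log_take()
```

Because the call drains the buffer, taking it once per test case keeps assertions from seeing entries left over from earlier commands.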
Composing with the broader hooks surface
preset_run_command is an in-tool wrapper specifically for
run_command-shaped tools. The general
register_tool_hook
surface fires PreToolUse / PostToolUse around every tool
dispatch in the agent loop. The two compose freely:
```harn
import { preset_run_command } from "std/tool_hooks"

pipeline default(task) {
  register_tool_hook({pattern: "*", max_output: 4000})
  register_tool_hook({pattern: "exec_*", deny: "exec_* is gated"})
  let run_command = preset_run_command({
    stacks: ["rust"],
    inner: { args -> shell(args.command) },
  })
  agent_loop(task, {tools: {tools: [{name: "run_command", handler: run_command}]}})
}
```
Execution order for a run_command call here:
1. register_tool_hook PreToolUse fires (can deny or rewrite args).
2. The handler runs: that's where preset_run_command matches the command, picks a mode, optionally rewrites, and forwards to inner.
3. register_tool_hook PostToolUse fires (can rewrite or truncate the result via max_output).
Both layers write to the transcript, so replay tools see every decision in order.
Tracking issues
- Epic: #1884 — Preset run-command hook library.
- TH-01..TH-07 status: all shipped (see epic for individual issue numbers). TH-07 (#2022) closed out the conformance coverage; this page is TH-08 (#1901).
- Cross-references: combinators (#1860), system reminders (#1815).