LLM and agents
Harn has built-in support for calling language models, streaming responses, running loop-until-done agents, and delegating work to child agents. This page is the map; the detailed references now live in focused pages.
Start here
| Topic | Use it for |
|---|---|
| llm_call | Single model requests, structured JSON output, completions, budgets, and mock responses |
| LLM reranking | Pairwise candidate ranking and token-logprob self-certainty |
| agent_loop | Loop-until-done agents, profiles, daemon loops, skills, and delegated workers |
| Simulated users | Agentic or scripted users for clarification-question eval harnesses |
| Tools | Typed tools, Tool Vault progressive disclosure, and MCP server tools |
| LLM ensemble helpers | Deterministic search helpers such as tree_of_thoughts(...) |
| Streaming | llm_stream, llm_stream_call, partial deltas, transcripts, workflow sessions, and token usage summaries |
| Providers | Provider setup, API details, local servers, enterprise cloud providers, and capability overrides |
Providers
Harn ships with built-in configs for Anthropic, OpenAI, OpenRouter, Ollama,
HuggingFace, Bedrock, Azure OpenAI, Vertex AI, and local OpenAI-compatible
servers. Most scripts choose a provider with the provider option, the
HARN_LLM_PROVIDER environment variable, or a model name that Harn can infer.
See LLM providers for API keys, local model setup, enterprise provider notes, Ollama runtime environment variables, and the capability matrix.
Capability matrix + harn.toml overrides
Provider capability rules, project overrides, and packaged provider adapters now live in LLM providers.
llm_call
Use llm_call(prompt, system?, options?) for a single model turn. It returns a
canonical dict with text, visible_text, model, provider, token usage,
structured data when JSON mode is enabled, tool calls, thinking blocks, and a
transcript.
let result = llm_call("Translate to French: Hello, world", nil, {
  provider: "openai",
  model: "gpt-4o",
  max_tokens: 1024,
})
log(result.text)
For schema-validated JSON, use llm_call_structured(...) or its safe/result
envelope variants. See LLM calls for the full options and
return-value tables.
llm_call_structured
Use llm_call_structured(prompt, schema, options?) for schema-validated JSON
responses. Safe and diagnostic-envelope variants are documented in
LLM calls.
llm_completion
Use llm_completion(prefix, suffix?, system?, options?) for text continuation
and fill-in-the-middle generation. It shares provider, model, budget, and usage
semantics with llm_call.
Tool Vault
Tool Vault is Harn's progressive-tool-disclosure primitive. Tools marked
defer_loading: true stay out of the prompt-visible tool surface until native
or client-executed tool_search promotes them. Use it for large tool registries
and MCP-heavy agents.
Typed tool patterns, Tool Vault options, provider support, and MCP server tool prefixing are covered in LLM tools.
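The deferred-loading idea can be modelled in a few lines. The sketch below is a language-neutral illustration in Python, not Harn's implementation: `Tool`, `ToolRegistry`, `visible`, and the substring-based `tool_search` are hypothetical names chosen for this example. Only the `defer_loading` flag and the notion of a search step that promotes tools come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    defer_loading: bool = False  # mirrors the defer_loading flag described above


@dataclass
class ToolRegistry:
    """Hypothetical registry used only for illustration."""
    tools: list
    promoted: set = field(default_factory=set)

    def visible(self):
        # Eager tools and already-promoted deferred tools reach the prompt.
        return [t.name for t in self.tools
                if not t.defer_loading or t.name in self.promoted]

    def tool_search(self, query):
        # Naive substring match standing in for a real search; hits are promoted.
        q = query.lower()
        hits = [t.name for t in self.tools
                if t.defer_loading and q in t.description.lower()]
        self.promoted.update(hits)
        return hits


registry = ToolRegistry([
    Tool("read_file", "Read a file from disk"),
    Tool("jira_create", "Create a Jira ticket", defer_loading=True),
    Tool("slack_post", "Post a Slack message", defer_loading=True),
])
```

In this model the prompt starts with only `read_file` visible. A search for "jira" promotes `jira_create` while `slack_post` stays hidden, which is the property that keeps large registries from inflating every prompt.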
agent_loop
Use agent_loop(prompt, system?, options?) when an agent should keep working
across turns. Persistent loops continue until the model emits the completion
sentinel, a budget or iteration limit is reached, daemon state idles, or a tool
policy fails.
let result = agent_loop(
  "Write a function that sorts a list, then write tests for it.",
  "You are a senior engineer.",
  {loop_until_done: true, profile: "tool_using"}
)
log(result.status)
log(result.llm.iterations)
The result is namespaced as llm, tools, trace, task_ledger, and
transcript. Profiles preload common loop budgets for tool-using, researcher,
verifier, and completer loops. See Agent loops.
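The stop conditions for a persistent loop can be pictured with a small model. This is a conceptual sketch in Python, not Harn's runtime: `run_until_done`, the `<done>` sentinel string, and the status names are assumptions made for illustration. It covers only the sentinel, budget, and iteration-limit exits and leaves out daemon idling and tool-policy failures.

```python
DONE_SENTINEL = "<done>"  # hypothetical marker; Harn's real sentinel may differ


def run_until_done(step, max_iterations, token_budget):
    """Call step(i) once per turn until a stop condition fires.

    step returns (model_text, tokens_used) for that turn.
    """
    spent = 0
    for i in range(max_iterations):
        text, tokens = step(i)
        spent += tokens
        if DONE_SENTINEL in text:
            return {"status": "done", "iterations": i + 1, "tokens": spent}
        if spent >= token_budget:
            return {"status": "budget_exhausted", "iterations": i + 1, "tokens": spent}
    return {"status": "max_iterations", "iterations": max_iterations, "tokens": spent}


def scripted_model(i):
    # Fixture standing in for a model: three turns, 100 tokens each.
    replies = ["drafting sort()", "writing tests", "all green <done>"]
    return replies[i], 100
```

With a generous budget the scripted model finishes on its third turn. Lowering `token_budget` to 200 or `max_iterations` to 2 makes the loop exit early with a different status, which is why callers typically inspect the status before trusting the output.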
Simulated users
std/agent/user provides agentic_user(...), scripted_user(...), and
user_tools(...) for eval harnesses that need a model or fixture to answer the
agent's clarification questions. See
simulated users for eval harnesses.
Interactive chat loops
std/agent/chat provides agent_chat_loop(...) for operator-message /
model-turn harnesses. It preserves one agent_loop session across turns, routes
shared slash commands through agent_chat_route_input(...), and supports a
terminal wait_for_user tool. See
interactive chat loops.
Daemon stdlib wrappers
Use daemon_spawn, daemon_trigger, daemon_snapshot, daemon_stop, and
daemon_resume when you want first-class daemon handles instead of wiring daemon
options on agent_loop directly.
Skills lifecycle
Skills bundle metadata, a system-prompt fragment, scoped tools, and lifecycle
hooks into a typed unit. Pass a skill registry to agent_loop with the
skills: option to match, activate, scope, and optionally deactivate skills
across turns. See Agent loops.
Streaming responses
llm_stream returns a channel of raw response chunks. llm_stream_call returns
a first-class Stream of structured chunks {delta, visible_delta, partial, role, finish_reason} and cancels the background request when the stream is
dropped. Both accept the same provider, model, and generation options as
llm_call. See Streaming and transcripts.
Delegated workers
For long-running or parallel orchestration, spawn_agent, sub_agent_run,
wait_agent, send_input, resume_agent, and close_agent expose child-agent
lifecycle directly in the runtime. See
delegated workers.
Transcript management
Transcripts carry context across calls, forks, repairs, resumptions, and workflow
sessions. Use transcript_render_visible, transcript_render_full,
transcript_events, transcript_summarize, and transcript_compact when host
apps need stable rendering and replay boundaries. See
Streaming and transcripts and
Transcript projection for policies that derive
a clean model-visible prefix without destroying audit lineage.
Workflow runtime
For multi-stage orchestration, prefer workflow graphs and workflow_execute
over product-side loop wiring. This keeps orchestration structure, transcript
policy, context policy, artifacts, and retries inside Harn. See
workflow runtime notes.
Cost tracking
llm_call, agent_loop, and workflow sessions expose normalized token usage.
Use llm_cost, llm_session_cost, llm_budget, llm_budget_remaining,
tiktoken_count_tokens, std/llm/budget, and per-call budget envelopes to
estimate and enforce spend before provider requests leave the process. See
LLM calls.
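Enforcing spend before a request leaves the process amounts to a pre-flight check against an estimate. The following Python sketch illustrates that pattern under stated assumptions: `Budget`, `reserve`, `BudgetExceeded`, and the per-1k-token price are hypothetical and do not reflect the signatures of `llm_budget` or `llm_cost`.

```python
class BudgetExceeded(Exception):
    """Raised when an estimated call would overrun the remaining budget."""


class Budget:
    """Hypothetical spend guard; names and prices are illustrative only."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def remaining(self):
        return self.limit_usd - self.spent_usd

    def reserve(self, est_tokens, usd_per_1k_tokens):
        # Estimate the cost up front and refuse before any request is sent.
        cost = est_tokens / 1000 * usd_per_1k_tokens
        if cost > self.remaining():
            raise BudgetExceeded(
                f"need ${cost:.4f}, have ${self.remaining():.4f}"
            )
        self.spent_usd += cost
        return cost
```

A rejected reservation leaves the recorded spend unchanged, so a caller can retry with a smaller prompt or a cheaper model without corrupting the running total.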
Provider API details
Provider-specific endpoint, auth, readiness, and local-server notes are in LLM providers.
Testing with mock LLM responses
The mock provider and llm_mock(...) queue deterministic text, tool-call, and
error responses without API keys. See
mock LLM responses.
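A queue of canned responses is the core of this kind of test double. The Python sketch below shows the general pattern only. `MockLLM`, `enqueue`, and `call` are hypothetical names, and the response shape is an assumption rather than the format `llm_mock(...)` uses.

```python
from collections import deque


class MockLLM:
    """Hypothetical stand-in for a mock provider; not Harn's llm_mock API."""

    def __init__(self):
        self._queue = deque()
        self.prompts = []  # records every prompt for later assertions

    def enqueue(self, *, text=None, tool_call=None, error=None):
        self._queue.append({"text": text, "tool_call": tool_call, "error": error})

    def call(self, prompt):
        self.prompts.append(prompt)
        if not self._queue:
            raise RuntimeError("mock queue is empty")
        response = self._queue.popleft()
        if response["error"] is not None:
            # Queued errors surface as exceptions, like a failed provider call.
            raise RuntimeError(response["error"])
        return response
```

Responses come back in the order they were queued, so a test can script a text reply, then a tool call, then a failure, and assert on both the results and the recorded prompts without any API key.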