LLM calls and agent loops
Harn has built-in support for calling language models and running persistent agent loops. No libraries or SDKs needed.
Providers
Harn ships with built-in configs for Anthropic, OpenAI, OpenRouter, Ollama, HuggingFace, and a local OpenAI-compatible server. Set the appropriate environment variable to authenticate or point Harn at a local endpoint:
| Provider | Environment variable | Default model |
|---|---|---|
| Anthropic (default) | ANTHROPIC_API_KEY | claude-sonnet-4-20250514 |
| OpenAI | OPENAI_API_KEY | gpt-4o |
| OpenRouter | OPENROUTER_API_KEY | anthropic/claude-sonnet-4-20250514 |
| HuggingFace | HF_TOKEN or HUGGINGFACE_API_KEY | explicit model |
| Ollama | OLLAMA_HOST (optional) | llama3.2 |
| Local server | LOCAL_LLM_BASE_URL | LOCAL_LLM_MODEL or explicit model |
Ollama runs locally and doesn’t require an API key. The default host is
http://localhost:11434.
For a generic OpenAI-compatible local server, set LOCAL_LLM_BASE_URL to
something like http://192.168.86.250:8000 and either pass
{provider: "local", model: "qwen2.5-coder-32b"} or set
LOCAL_LLM_MODEL=qwen2.5-coder-32b.
llm_call
Make a single LLM request. Harn normalizes provider responses into a canonical dict so product code does not need to parse provider-native message shapes.
```
let result = llm_call("What is 2 + 2?")
println(result.text)
```
With a system message:
```
let result = llm_call(
    "Explain quicksort",
    "You are a computer science teacher. Be concise."
)
println(result.text)
```
With options:
```
let result = llm_call(
    "Translate to French: Hello, world",
    "You are a translator.",
    {
        provider: "openai",
        model: "gpt-4o",
        max_tokens: 1024
    }
)
println(result.text)
```
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | yes | The user message |
| system | string | no | System message for the model |
| options | dict | no | Provider, model, and generation settings |
Return value
llm_call always returns a dict:
| Field | Type | Description |
|---|---|---|
| text | string | The text content of the response |
| visible_text | string | Human-visible assistant output |
| model | string | The model used |
| provider | string | Canonical provider identifier |
| input_tokens | int | Input/prompt token count |
| output_tokens | int | Output/completion token count |
| cache_read_tokens | int | Prompt tokens served from provider-side cache when supported |
| cache_write_tokens | int | Prompt tokens written into provider-side cache when supported |
| data | any | Parsed JSON (when response_format: "json") |
| tool_calls | list | Tool calls (when the model uses tools) |
| thinking | string | Reasoning trace (when thinking is enabled) |
| private_reasoning | string | Provider reasoning metadata kept separate from visible text |
| blocks | list | Canonical structured content blocks across providers |
| stop_reason | string | "end_turn", "max_tokens", "tool_use", or "stop_sequence" |
| transcript | dict | Transcript carrying message history, events, summary, metadata, and id |
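Under the assumption that the usage fields above are present on every call, a cost-tracking sketch (the `str(...)` string-conversion helper is assumed, not documented above):

```
let r = llm_call("Summarize the release notes", nil, {max_tokens: 512})
// Token accounting for cost tracking; str(...) is assumed here.
println(r.provider + "/" + r.model)
println("in=" + str(r.input_tokens) + " out=" + str(r.output_tokens))
if r.stop_reason == "max_tokens" {
    // The response was truncated; consider raising max_tokens.
    println("truncated")
}
```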
Options dict
| Key | Type | Default | Description |
|---|---|---|---|
| provider | string | "anthropic" | Any configured provider. Built-in names include "anthropic", "openai", "openrouter", "huggingface", "ollama", and "local" |
| model | string | varies by provider | Model identifier |
| max_tokens | int | 16384 | Maximum tokens in the response |
| temperature | float | provider default | Sampling temperature (0.0-2.0) |
| top_p | float | nil | Nucleus sampling |
| top_k | int | nil | Top-K sampling (Anthropic/Ollama only) |
| stop | list | nil | Stop sequences |
| seed | int | nil | Reproducibility seed (OpenAI/Ollama) |
| frequency_penalty | float | nil | Frequency penalty (OpenAI only) |
| presence_penalty | float | nil | Presence penalty (OpenAI only) |
| response_format | string | "text" | "text" or "json" |
| schema | dict | nil | JSON Schema, OpenAPI Schema Object, or canonical Harn schema dict for structured output |
| thinking | bool/dict | nil | Enable provider reasoning: true or {budget_tokens: N}. Anthropic maps this to thinking/adaptive thinking, OpenRouter maps it to reasoning, and Ollama maps it to think |
| tools | list | nil | Tool definitions |
| tool_choice | string/dict | "auto" | "auto", "none", "required", or {name: "tool"} |
| tool_search | bool/string/dict | nil | Progressive tool disclosure. See Tool Vault |
| cache | bool | false | Enable prompt caching (Anthropic) |
| stream | bool | true | Use streaming SSE transport. Set false for synchronous request/response. Env: HARN_LLM_STREAM |
| timeout | int | 120 | Request timeout in seconds |
| messages | list | nil | Full message list (overrides prompt) |
| transcript | dict | nil | Continue from a previous transcript; the prompt is appended as the next user turn |
| model_tier | string | nil | Resolve a configured tier alias such as "small", "mid", or "frontier" |
Provider-specific overrides can be passed as sub-dicts:
```
let result = llm_call("hello", nil, {
    provider: "ollama",
    ollama: {num_ctx: 32768}
})
```
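Structured output combines the `response_format` and `schema` options above; a sketch (the prompt and field names are illustrative):

```
let r = llm_call("List three prime numbers as JSON", nil, {
    response_format: "json",
    schema: {
        type: "object",
        properties: {primes: {type: "array", items: {type: "integer"}}},
        required: ["primes"]
    }
})
// Parsed JSON lands on r.data when response_format is "json".
println(r.data.primes)
```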
Tool Vault
Harn’s Tool Vault is the progressive-tool-disclosure primitive: tool definitions that stay out of the model’s context until they’re surfaced by a search call. This keeps context cheap for agents with hundreds of tools (coding agents, MCP-heavy setups) without requiring the integrator to hand-filter tools per turn.
Per-tool flag: defer_loading
Any tool registered via tool_define (or the tool { … } language
form) can opt out of eager loading:
```
var registry = tool_registry()
registry = tool_define(registry, "deploy", "Deploy to production", {
    parameters: {env: {type: "string"}},
    defer_loading: true,
    handler: { args -> shell("deploy " + args.env) },
})
```
Deferred tools never appear in the model's context unless a tool-search call surfaces them. They are still sent to the provider, so prompt caching stays warm on Anthropic: the schemas live in the API prefix but not in the model's context.
Call-level option: tool_search
Turning progressive disclosure on is one option away:
```
let r = llm_call(prompt, sys, {
    provider: "anthropic",
    model: "claude-opus-4-7",
    tools: registry,
    tool_search: "bm25",
})
```
Accepted shapes:
| Shape | Meaning |
|---|---|
| tool_search: true | Default: bm25 variant, mode auto |
| tool_search: "bm25" | Natural-language queries |
| tool_search: "regex" | Python-regex queries |
| tool_search: false | Explicit off (same as omitting) |
| tool_search: {variant, mode, strategy, always_loaded, budget_tokens, name, include_stub_listing} | Explicit dict form |
mode options:
- `"auto"` (default) — use native if the provider supports it, otherwise fall back to the client-executed path (no error).
- `"native"` — force the provider's native mechanism. Errors if unsupported.
- `"client"` — force the client-executed path even on providers with native support. Useful for A/B-ing strategies or pinning behavior across heterogeneous provider fleets.
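A sketch of the explicit dict form, combining the keys above (the specific values are illustrative):

```
let r = llm_call(prompt, sys, {
    tools: registry,
    tool_search: {
        variant: "bm25",               // search strategy
        mode: "auto",                  // native when supported, else client fallback
        always_loaded: ["read_file"],  // pinned to the eager set
        budget_tokens: 4000,           // soft cap on promoted schema footprint
    },
})
```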
Provider support
| Provider | Native tool_search | Variants / modes |
|---|---|---|
| Anthropic Claude Opus/Sonnet 4.0+, Haiku 4.5+ | ✓ | bm25, regex |
| Anthropic 3.x or earlier 4.x Haiku | ✗ (uses client fallback) | — |
| OpenAI Responses API — GPT 5.4+ | ✓ | hosted (default), client |
| OpenAI pre-5.4 (gpt-4o, gpt-4.1, …) | ✗ | client fallback works today |
| OpenRouter / Together / Groq / DeepSeek / Fireworks / HuggingFace / local | ✓ when routed model matches gpt-5.4+ upstream | hosted forwarded; escape hatch below for proxies |
| Gemini, Ollama, mock (default model) | ✗ | client fallback works today |
The OpenAI native path (harn#71) emits a flat {"type": "tool_search", "mode": "hosted"} meta-tool at the front of the tools array, alongside
defer_loading: true on the wrapper of each user tool. The server runs
the search and replies with tool_search_call / tool_search_output
entries that Harn parses into the same transcript event shape as the
Anthropic path (replays are indistinguishable across providers).
Namespace grouping
OpenAI’s tool_search can group deferred tools into namespaces; pass
namespace: "<label>" on tool_define(...) to tag a tool. Harn collects
the distinct set into the meta-tool’s namespaces field. Anthropic
ignores the label — harmless passthrough for replay fidelity.
```
tool_define(registry, "deploy_api", "Deploy the API", {
    parameters: {env: {type: "string"}},
    defer_loading: true,
    namespace: "ops",
    handler: { args -> shell("deploy api " + args.env) },
})
```
Escape hatch for proxied OpenAI-compat endpoints
Self-hosted routers and enterprise gateways sometimes advertise a model
ID Harn cannot parse (my-internal-gpt-clone-v2) yet forward the OpenAI
Responses payload unchanged. Opt into the hosted path with:
```
llm_call(prompt, sys, {
    provider: "openrouter",
    model: "my-custom/gpt-forward",
    tools: registry,
    tool_search: {mode: "native"},
    openrouter: {force_native_tool_search: true},
})
```
The override is keyed by the provider name (the same dict you’d use for any provider-specific knob).
Capability matrix + harn.toml overrides
The provider support table above is not hard-coded: it’s the output
of a shipped data file (crates/harn-vm/src/llm/capabilities.toml)
matched against the (provider, model) pair at call time. Scripts
can query the effective capability surface without carrying
vendor-specific knowledge:
```
let caps = provider_capabilities("anthropic", "claude-opus-4-7")
// {
//   native_tools: true, defer_loading: true,
//   tool_search: ["bm25", "regex"], max_tools: 10000,
//   prompt_caching: true, thinking: true,
// }
if "bm25" in caps.tool_search {
    llm_call(prompt, sys, {
        tools: registry,
        tool_search: "bm25",
    })
}
```
Projects override or extend the shipped table in harn.toml — useful
for flagging a proxied OpenAI-compat endpoint as supporting
tool_search ahead of a Harn release that knows about it natively:
```toml
# harn.toml
[[capabilities.provider.my-proxy]]
model_match = "*"
native_tools = true
defer_loading = true
tool_search = ["hosted"]
prompt_caching = true

# Shadow the built-in Anthropic rule to force client-executed
# fallback on every Opus call (e.g. while a regional outage is
# active):
[[capabilities.provider.anthropic]]
model_match = "claude-opus-*"
native_tools = true
defer_loading = false
tool_search = []
prompt_caching = true
thinking = true
```
Each [[capabilities.provider.<name>]] entry accepts these fields:
| Field | Type | Purpose |
|---|---|---|
| model_match | glob string | Required. Matched against the lowercased model ID. Leading/trailing * or a single middle * supported |
| version_min | [major, minor] | Narrows the match to a parseable version (Anthropic/OpenAI extractors). Rules where version_min is set but the model ID won't parse are skipped |
| native_tools | bool | Whether the provider accepts a native tool-call wire shape |
| defer_loading | bool | Whether defer_loading: true on tool definitions is honored server-side |
| tool_search | list of strings | Native tool_search variants, preferred first. Anthropic: ["bm25", "regex"]. OpenAI: ["hosted", "client"]. Empty = no native support (client fallback only) |
| max_tools | int | Cap on tool count. harn lint will warn if a registry exceeds the smallest cap any active provider advertises |
| prompt_caching | bool | cache_control blocks honored |
| thinking | bool | Extended or adaptive thinking available |
First match wins. User rules for a given provider are consulted before the shipped rules — so the order inside the TOML file matters (place more specific patterns above wildcards).
[provider_family] declares sibling providers that inherit rules
from a canonical family. The shipped table routes OpenRouter,
Together, Groq, DeepSeek, Fireworks, HuggingFace, and local vLLM to
[[provider.openai]] by default.
Two programmatic helpers mirror the harn.toml path for cases where
editing the manifest is awkward:
- `provider_capabilities_install(toml_src)` — install overrides from a TOML string (same layout as `capabilities.toml`, without the `capabilities.` prefix: just `[[provider.<name>]]`). Useful when a script detects a proxied endpoint at runtime.
- `provider_capabilities_clear()` — revert to shipped defaults.
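A runtime-override sketch that checks the effective capability surface first (assuming C-style string concatenation and `\n` escapes; the provider and model names are illustrative):

```
let caps = provider_capabilities("my-proxy", "my-internal-gpt-clone-v2")
if caps.tool_search == [] {
    // Flag the proxy as supporting hosted tool_search for this run only.
    provider_capabilities_install(
        "[[provider.my-proxy]]\n" +
        "model_match = \"*\"\n" +
        "native_tools = true\n" +
        "tool_search = [\"hosted\"]\n"
    )
}
```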
Packaged provider adapters via [llm]
Projects and installed packages can also contribute provider definitions,
aliases, inference rules, and model defaults directly from harn.toml
under [llm]. The schema matches providers.toml, but the merge is
scoped to the current run:
```toml
[llm.providers.my_proxy]
base_url = "https://llm.example.com/v1"
chat_endpoint = "/chat/completions"
completion_endpoint = "/completions"
auth_style = "bearer"
auth_env = "MY_PROXY_API_KEY"

[llm.aliases]
my-fast = { id = "vendor/model-fast", provider = "my_proxy" }
```
Load order is:
- built-in defaults
- `HARN_PROVIDERS_CONFIG` when set, otherwise `~/.config/harn/providers.toml`
- installed package `[llm]` tables from `.harn/packages/*/harn.toml`
- the root project's `[llm]` table
That gives packages a stable, declarative way to ship provider adapters and model aliases without editing Rust-side registration code.
Client-executed fallback
On providers without native defer_loading, Harn falls back to an
in-VM execution path (landed in harn#70).
The fallback is identical to the native path from a script’s point of
view: same option surface, same transcript events, same promotion
behavior across turns. Internally, Harn injects a synthetic tool
called __harn_tool_search — when the model calls it, the loop runs
the configured strategy against the deferred-tool index, promotes the
matching tools into the next turn’s schema list, and emits the
same tool_search_query / tool_search_result transcript events as
native mode (tagged mode: "client" in metadata so replays can
distinguish paths).
Strategies (client mode only):
| strategy | Runs in | Notes |
|---|---|---|
| "bm25" (default) | VM | Tokenized BM25 over name + description + param text. Matches open_file from the query "open file" |
| "regex" | VM | Case-insensitive Rust-regex over the same corpus. No backreferences, no lookaround |
| "semantic" | Host (bridge) | Delegated to the host via tool_search/query so integrators can wire embeddings without Harn pulling in ML crates |
| "host" | Host (bridge) | Pure host-side; the VM round-trips the query and promotes whatever the host returns |
Extra client-mode knobs:
- `budget_tokens: N` — soft cap on the total token footprint of promoted tool schemas. Oldest-first eviction when exceeded. Omit to keep every promoted schema for the life of the call.
- `name: "find_tool"` — override the synthetic tool's name. Handy when a skill's vocabulary suggests a more natural verb (`discover`, `lookup`, …).
- `always_loaded: ["read_file", "run"]` — pin tool names to the eager set even if `defer_loading: true` is set on their registry entries.
- `include_stub_listing: true` — append a short list of deferred tool names and one-line descriptions to the tool-contract prompt so the model can eyeball what's available without a search call. Off by default to match Anthropic's native ergonomics.
Pre-flight validation
- At least one user tool must be non-deferred. Harn errors before the API call is made, matching Anthropic's documented 400.
- `defer_loading` must be a bool — typos like `defer_loading: "yes"` error at `tool_define` time rather than silently falling back to the "no defer" default.
Transcript events
Every native tool-search round-trip emits structured events in the run record:
- `tool_search_query` — the search tool's invocation (input query, search-tool id).
- `tool_search_result` — the references returned by the server (which deferred tools got promoted on this turn).
These are stable shapes; replay / eval can reconstruct which tools were available when without re-running the call.
llm_completion
Use llm_completion for text continuation and fill-in-the-middle generation.
It lives at the same abstraction level as llm_call.
```
let result = llm_completion("let total = ", ";", nil, {
    provider: "ollama",
    model_tier: "small"
})
println(result.text)
```
agent_loop
Run an agent that keeps working until it’s done. The agent maintains
conversation history across turns and loops until it outputs the
##DONE## sentinel. Returns a dict with canonical visible text,
tool usage, transcript state, and any deferred queued human messages.
```
let result = agent_loop(
    "Write a function that sorts a list, then write tests for it.",
    "You are a senior engineer.",
    {persistent: true}
)

println(result.text)       // the accumulated output
println(result.status)     // "done", "stuck", "budget_exhausted", "idle", "watchdog", or "failed"
println(result.iterations) // number of LLM round-trips
```
How it works
- Sends the prompt to the model
- Reads the response
- If `persistent: true`:
  - Checks if the response contains `##DONE##`
  - If yes, stops and returns the accumulated output
  - If no, sends a nudge message asking the agent to continue
  - Repeats until done or limits are hit
- If `persistent: false` (default): returns after the first response
agent_loop return value
agent_loop returns a dict with the following fields:
| Field | Type | Description |
|---|---|---|
| status | string | Terminal state: "done" (natural completion), "stuck" (exceeded max_nudges consecutive text-only turns), "budget_exhausted" (hit max_iterations without any explicit break), "idle" (daemon yielded with no remaining wake source), "watchdog" (daemon idle-wait tripped the idle_watchdog_attempts limit), or "failed" (require_successful_tools not satisfied) |
| text | string | Accumulated text output from all iterations |
| visible_text | string | Human-visible accumulated output |
| iterations | int | Number of LLM round-trips |
| duration_ms | int | Total wall-clock time in milliseconds |
| tools_used | list | Names of tools that were called |
| rejected_tools | list | Tools rejected by policy/host ceiling |
| deferred_user_messages | list | Queued human messages deferred until agent yield/completion |
| daemon_state | string | Final daemon lifecycle state; mirrors status for daemon loops |
| daemon_snapshot_path | string or nil | Persisted snapshot path when daemon persistence is enabled |
| transcript | dict | Transcript of the full conversation state |
agent_loop options
Same as llm_call, plus additional options:
| Key | Type | Default | Description |
|---|---|---|---|
| persistent | bool | false | Keep looping until ##DONE## |
| max_iterations | int | 50 | Maximum number of LLM round-trips |
| max_nudges | int | 3 | Max consecutive text-only responses before stopping |
| nudge | string | see below | Custom message to send when nudging the agent |
| tool_retries | int | 0 | Number of retry attempts for failed tool calls |
| tool_backoff_ms | int | 1000 | Base backoff delay in ms for tool retries (doubles each attempt) |
| policy | dict | nil | Capability ceiling applied to this agent loop |
| daemon | bool | false | Idle instead of terminating after text-only turns |
| persist_path | string | nil | Persist daemon snapshots to this path on idle/finalize |
| resume_path | string | nil | Restore daemon state from a previously persisted snapshot |
| wake_interval_ms | int | nil | Fixed timer wake interval for daemon loops |
| watch_paths | list/string | nil | Files to poll for mtime changes while idle |
| consolidate_on_idle | bool | false | Run transcript auto-compaction before persisting an idle daemon snapshot |
| idle_watchdog_attempts | int | nil (disabled) | Max consecutive idle-wait ticks that may return no wake reason before the daemon terminates with status = "watchdog". Guards against a misconfigured daemon (e.g. bridge never signals, no timer, no watch paths) hanging the session silently |
| context_callback | closure | nil | Per-turn hook that can rewrite prompt-visible messages and/or the effective system prompt before the next LLM call |
| context_filter | closure | nil | Alias for context_callback |
| post_turn_callback | closure | nil | Hook called after each tool turn. Receives turn metadata and may inject a message, request an immediate stage stop, or both |
| turn_policy | dict | nil | Turn-shape policy for action stages. Supports require_action_or_yield: bool, allow_done_sentinel: bool (default true; set to false in workflow-owned action stages so nudges stop advertising the done sentinel), and max_prose_chars: int |
| stop_after_successful_tools | list<string> | nil | Stop after a tool-calling turn whose successful results include one of these tool names. Useful for workflow-owned verify loops such as ["edit", "scaffold"] |
| require_successful_tools | list<string> | nil | Mark the loop status = "failed" unless at least one of these tool names succeeds at some point during the interaction. Keeps action stages honest when every attempted effect was rejected or errored |
| loop_detect_warn | int | 2 | Consecutive identical tool calls before appending a redirection hint |
| loop_detect_block | int | 3 | Consecutive identical tool calls before replacing the result with a hard redirect |
| loop_detect_skip | int | 4 | Consecutive identical tool calls before skipping execution entirely |
| skills | skill_registry or list | nil | Skill registry exposed to the match-and-activate lifecycle phase. See Skills lifecycle |
| skill_match | dict | {strategy: "metadata", top_n: 1, sticky: true} | Match configuration — strategy ("metadata", "host", or "embedding"), top_n, sticky |
| working_files | list/string | [] | Paths that feed paths: glob auto-trigger in the metadata matcher and ride along as a hint to host-delegated matchers |
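The verify-loop options compose; a sketch of a workflow-owned edit stage (the tool names and `repo_tools()` registry are illustrative):

```
let result = agent_loop(task, "You are an editor.", {
    tools: repo_tools(),
    turn_policy: {require_action_or_yield: true, allow_done_sentinel: false},
    stop_after_successful_tools: ["edit"],  // stop as soon as an edit lands
    require_successful_tools: ["edit"],     // otherwise status = "failed"
    max_iterations: 10,
})
if result.status == "failed" {
    println("no successful edit in this stage")
}
```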
When daemon: true, the loop transitions active -> idle -> active instead of
terminating on a text-only turn. Idle daemons can be woken by queued human
messages, agent/resume bridge notifications, wake_interval_ms, or watched
file changes from watch_paths.
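A daemon-mode sketch wired to the wake sources above (the paths, interval, and limits are illustrative):

```
let result = agent_loop(
    "Watch the build log and summarize new failures.",
    "You are a build monitor.",
    {
        persistent: true,
        daemon: true,
        watch_paths: ["build.log"],   // wake on mtime change
        wake_interval_ms: 60000,      // or on a one-minute timer
        idle_watchdog_attempts: 10,   // terminate if nothing ever wakes us
        persist_path: ".harn/daemons/build-monitor",
    }
)
println(result.status)
```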
Default nudge message:
You have not output ##DONE## yet — the task is not complete. Use your tools to continue working. Only output ##DONE## when the task is fully complete and verified.
When persistent: true, the system prompt is automatically extended with:
IMPORTANT: You MUST keep working until the task is complete. Do NOT stop to explain or summarize — take action. Output ##DONE## only when the task is fully complete and verified.
Daemon stdlib wrappers
When you want a first-class daemon handle instead of wiring agent_loop
options manually, use the daemon builtins:
- `daemon_spawn(config)`
- `daemon_trigger(handle, event)`
- `daemon_snapshot(handle)`
- `daemon_stop(handle)`
- `daemon_resume(path)`
daemon_spawn accepts the same daemon-related options that agent_loop
understands (wake_interval_ms, watch_paths, idle_watchdog_attempts,
etc.) plus event_queue_capacity, which bounds the durable FIFO trigger queue
used by daemon_trigger.
```
let daemon = daemon_spawn({
    name: "reviewer",
    task: "Watch for trigger events and summarize the latest change.",
    system: "You are a careful reviewer.",
    provider: "mock",
    persist_path: ".harn/daemons/reviewer",
    event_queue_capacity: 256,
})

daemon_trigger(daemon, {kind: "file_changed", path: "src/lib.rs"})
let snap = daemon_snapshot(daemon)
println(snap.pending_event_count)
daemon_stop(daemon)

let resumed = daemon_resume(".harn/daemons/reviewer")
```
These wrappers preserve queued trigger events across stop/resume. If a daemon is stopped while a trigger is mid-flight, that trigger is re-queued and replayed on resume instead of being lost.
Context callback
context_callback lets you keep the full recorded transcript for replay and
debugging while showing the model a smaller or rewritten prompt-visible
history on each turn.
The callback receives one argument:
```
{
    iteration: int,
    system: string?,
    messages: list,
    visible_messages: list,
    recorded_messages: list,
    recent_visible_messages: list,
    recent_recorded_messages: list,
    latest_visible_user_message: string?,
    latest_visible_assistant_message: string?,
    latest_recorded_user_message: string?,
    latest_recorded_assistant_message: string?,
    latest_tool_result: string?,
    latest_recorded_tool_result: string?
}
```
It may return:
- `nil` to leave the current prompt-visible context unchanged
- a `list` of messages to use as the next prompt-visible message list
- a `dict` with optional `messages` and `system` fields
Example: hide older assistant messages so the model mostly sees user intent, tool results, and the latest assistant turn.
```
fn hide_old_assistant_turns(ctx) {
    var kept = []
    var latest_assistant = nil
    for msg in ctx.visible_messages {
        if msg?.role == "assistant" {
            latest_assistant = msg
        } else {
            kept = kept + [msg]
        }
    }
    if latest_assistant != nil {
        kept = kept + [latest_assistant]
    }
    return {messages: kept}
}

let result = agent_loop(task, "You are a coding assistant.", {
    persistent: true,
    context_callback: hide_old_assistant_turns
})
```
Post-turn callback
post_turn_callback runs after a tool-calling turn completes. Use it when the
workflow should react to the tool outcomes directly instead of waiting for the
model to emit another message.
The callback receives:
```
{
    tool_names: list,
    tool_results: list,
    successful_tool_names: list,
    tool_count: int,
    iteration: int,
    consecutive_single_tool_turns: int,
    session_tools_used: list,
    session_successful_tools: list,
}
```
Each `tool_results` entry has:

```
{tool_name: string, status: string, rejected: bool}
```
It may return:
- a `string` to inject as the next user-visible message
- a `bool`, where `true` stops the current stage immediately after the turn
- a `dict` with optional `message` and `stop` fields
Example: stop after the first successful write turn, but still allow multiple edits in that same turn.
```
fn stop_after_successful_write(turn) {
    if turn?.successful_tool_names?.contains("edit") {
        return {stop: true}
    }
    return ""
}
```
Example with retry
```
retry 3 {
    let result = agent_loop(
        task,
        "You are a coding assistant.",
        {
            persistent: true,
            max_iterations: 30,
            max_nudges: 5,
            provider: "anthropic",
            model: "claude-sonnet-4-20250514"
        }
    )
    println(result.text)
}
```
Skills lifecycle
Skills bundle metadata, a system-prompt fragment, scoped tools, and
lifecycle hooks into a typed unit. Declare them with the top-level
skill NAME { ... } language form (see the Harn spec)
or the imperative skill_define(...) builtin, then pass the resulting
skill_registry to agent_loop via the skills: option. The agent
loop matches, activates, and (optionally) deactivates skills across
turns automatically.
Matching strategies
skill_match: { strategy: ..., top_n: 1, sticky: true } controls how
the loop picks which skill(s) to activate:
- `"metadata"` (default) — in-VM BM25-ish scoring over `description` + `when_to_use`, combined with glob matching against the `paths:` list. Name-in-prompt mentions count as a strong boost. No host round-trip, so matching is fast and deterministic.
- `"host"` — delegates scoring to the host via the `skill/match` bridge RPC (see bridge-protocol.md). Useful for embedding-based or LLM-driven matchers. A failing RPC falls back to metadata scoring with a warning.
- `"embedding"` — alias for `"host"`; accepted so the language matches Anthropic's canonical terminology.
Activation lifecycle
- Match runs at the head of iteration 0 (always) and, when `sticky: false`, before every subsequent iteration (reassess).
- Activate: the skill's `on_activate` closure (if any) is called, its `prompt` body is woven into the effective system prompt, and `allowed_tools` narrows the tool surface for the next LLM call. Each activation emits `AgentEvent::SkillActivated` plus a `skill_activated` transcript event with the match score and reason.
- Deactivate (only in `sticky: false` mode): when reassess picks a different top-N, the previously-active skill's `on_deactivate` runs and the scoped tool filter is dropped. Emits `AgentEvent::SkillDeactivated` plus a `skill_deactivated` transcript event.
- Session resume: when `session_id:` is set, the set of active skills at the end of one run is persisted in the session store. The next `agent_loop` call on the same session rehydrates them before iteration-0 matching runs, so sticky re-entry stays hot without re-matching from a cold prompt.
Scoped tools
A skill’s allowed_tools list is the union across all active
skills; any tool outside that union is filtered out of both the
contract prompt and the native tool schemas the provider sees.
Runtime-internal tools like __harn_tool_search are never filtered
— scoping gates the user-declared surface, not the runtime’s own
scaffolding.
Frontmatter honoured by the runtime
| Field | Type | Effect |
|---|---|---|
| description | string | Primary ranking signal for metadata matching |
| when_to_use | string | Secondary ranking signal |
| paths | list<string> | Glob patterns for paths: auto-trigger |
| allowed_tools | list<string> | Whitelist applied to the tool surface on activation |
| prompt | string | Body woven into the active-skill system-prompt block |
| disable-model-invocation | bool | When true, the matcher skips the skill entirely |
| user-invocable | bool | Placeholder for host UI (not consumed by the runtime today) |
| mcp | list<string> | MCP servers the skill wants booted (consumed by host integrations) |
| on_activate / on_deactivate | fn | Closures invoked on transition |
Example
```
skill ship {
    description "Ship a production release"
    when_to_use "User says ship/release/deploy"
    paths ["infra/**", "Dockerfile"]
    allowed_tools ["deploy_service"]
    prompt "Follow the deploy runbook. One command at a time."
}

let result = agent_loop(
    "Ship the new release to production",
    "You are a staff deploy engineer.",
    {
        provider: "anthropic",
        tools: tools(),
        skills: ship,
        working_files: ["infra/terraform/cluster.tf"],
    }
)
```
The loop emits one skill_matched event per match pass (including
zero-candidate passes so replayers see the boundary), one
skill_activated per activated skill, and one skill_scope_tools
event per activation whose allowed_tools narrowed the surface.
Streaming responses
llm_stream returns a channel that yields response chunks as they
arrive. Iterate over it with a for loop:
```
let stream = llm_stream("Tell me a story", "You are a storyteller")
for chunk in stream {
    print(chunk)
}
```
llm_stream accepts the same options as llm_call (provider, model,
max_tokens). The channel closes automatically when the response is
complete.
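Chunks can be accumulated into the full text while rendering incrementally; a sketch (the three-argument form with an options dict is assumed to mirror `llm_call`):

```
var full = ""
let stream = llm_stream("Draft a haiku", "You are a poet", {max_tokens: 128})
for chunk in stream {
    print(chunk)         // render incrementally
    full = full + chunk  // keep the complete text
}
```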
Delegated workers
For long-running or parallel orchestration, Harn exposes a worker/task lifecycle directly in the runtime.
```
let worker = spawn_agent({
    name: "research-pass",
    task: "Draft a summary",
    node: {
        kind: "subagent",
        mode: "llm",
        model_policy: {provider: "mock"},
        output_contract: {output_kinds: ["summary"]}
    }
})

let done = wait_agent(worker)
println(done.status)
```
spawn_agent(...) accepts either:
- a `graph` plus optional `artifacts` and `options`, which runs a typed workflow in the background, or
- a `node` plus optional `artifacts` and `transcript`, which runs a single delegated stage and preserves transcript continuity across `send_input(...)`
Worker configs may also include policy to narrow the delegated worker to a
subset of the parent’s current execution ceiling, or a top-level
tools: ["name", ...] shorthand:
```
let worker = spawn_agent({
    task: "Read project files only",
    tools: ["read", "search"],
    node: {
        kind: "subagent",
        mode: "llm",
        model_policy: {provider: "mock"},
        tools: repo_tools()
    }
})
```
If neither is provided, the worker inherits the current execution policy as-is.
If either is provided, Harn intersects the requested worker scope with the
parent ceiling before the worker starts or is resumed. Permission denials are
returned to the agent loop as structured tool results: `{error: "permission_denied", tool, reason}`.
Worker lifecycle builtins:
| Function | Description |
|---|---|
| spawn_agent(config) | Start a worker from a workflow graph or delegated stage |
| sub_agent_run(task, options?) | Run an isolated child agent loop and return a single clean result envelope to the parent |
| send_input(handle, task) | Re-run a completed worker with a new task, carrying transcript/artifacts forward when applicable |
| resume_agent(id_or_snapshot_path) | Restore a persisted worker snapshot and continue it in the current runtime |
| wait_agent(handle_or_list) | Wait for one worker or a list of workers to finish |
| close_agent(handle) | Cancel a worker and mark it terminal |
| list_agents() | Return summaries for all known workers in the current runtime |
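Because wait_agent accepts a list, fan-out is straightforward; a sketch (the module names are illustrative, and a list-shaped return mirroring the input list is assumed):

```
var workers = []
for topic in ["auth", "storage", "api"] {
    workers = workers + [spawn_agent({
        task: "Summarize the " + topic + " module",
        node: {kind: "subagent", mode: "llm", model_policy: {provider: "mock"}},
    })]
}
for done in wait_agent(workers) {
    println(done.status)
}
```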
sub_agent_run
Use sub_agent_run(...) when you want a full child agent_loop with its own
session and narrowed capability scope, but you do not want the child transcript
to spill into the parent conversation history.
```
let result = sub_agent_run("Find the config entrypoints.", {
    provider: "mock",
    tools: repo_tools(),
    allowed_tools: ["search", "read"],
    token_budget: 1200,
    returns: {
        schema: {
            type: "object",
            properties: {
                paths: {type: "array", items: {type: "string"}}
            },
            required: ["paths"]
        }
    }
})

if result.ok {
    println(result.data.paths)
} else {
    println(result.error.category)
}
```
The parent transcript only records the outer tool call and tool result. The
child keeps its own session and transcript, linked by `session_id` / parent
lineage metadata.
`sub_agent_run(...)` returns an envelope with:
- `ok`
- `summary`
- `artifacts`
- `evidence_added`
- `tokens_used`
- `budget_exceeded`
- `session_id`
- `data` when the child requests JSON mode or `returns.schema` succeeds
- `error: {category, message, tool?}` when the child fails or a narrowed tool policy rejects a call
Set `background: true` to get a normal worker handle back instead of waiting
inline. The resulting worker uses `mode: "sub_agent"` and can be resumed with
`wait_agent(...)`, `send_input(...)`, and `close_agent(...)`.
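A sketch of the background form, reusing the options from the inline example above:

```
let handle = sub_agent_run("Find the config entrypoints.", {
  provider: "mock",
  tools: repo_tools(),
  allowed_tools: ["search", "read"],
  background: true
})

// The handle behaves like any other worker
let result = wait_agent(handle)
```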
Background handles retain the original structured request plus a normalized
provenance object, so parent pipelines can recover child questions, actions,
workflow stages, and verification steps directly from the handle/result.
Workers can persist state and child run paths between sessions. Use `carry`
inside `spawn_agent(...)` when you want continuation to reset transcript state,
drop carried artifacts, or disable workflow resume against the previous child
run record. Worker configs may also include `execution` to pin delegated work
to an explicit cwd/env overlay or a managed git worktree:
let worker = spawn_agent({
task: "Run the repo-local verification pass",
graph: some_graph,
execution: {
worktree: {
repo: ".",
branch: "worker/research-pass",
cleanup: "preserve"
}
}
})
Transcript management
Harn includes transcript primitives for carrying context across calls, forks, repairs, and resumptions:
let first = llm_call("Plan the work", nil, {provider: "mock"})
let second = llm_call("Continue", nil, {
provider: "mock",
transcript: first.transcript
})
let compacted = transcript_compact(second.transcript, {
keep_last: 4,
summary: "Planning complete."
})
Use `transcript_summarize()` when you want Harn to create a fresh summary with
an LLM, or `transcript_compact()` when you want the runtime compaction engine
outside the `agent_loop` path.
Transcript helpers also expose the canonical event model:
let visible = transcript_render_visible(result.transcript)
let full = transcript_render_full(result.transcript)
let events = transcript_events(result.transcript)
Use these when a host app needs to render human-visible chat separately from internal execution history.
For chat/session lifecycle, std/agents now exposes a higher-level workflow
session contract on top of raw transcripts and run records:
import "std/agents"
let result = task_run("Write a note", some_flow, {provider: "mock"})
let session = workflow_session(result)
let forked = workflow_session_fork(session)
let archived = workflow_session_archive(forked)
let resumed = workflow_session_resume(archived)
let persisted = workflow_session_persist(result, ".harn-runs/chat.json")
let restored = workflow_session_restore(persisted.run.persisted_path)
Each workflow session also carries a normalized usage summary copied from the
underlying run record when available:
println(session?.usage?.input_tokens)
println(session?.usage?.output_tokens)
println(session?.usage?.total_duration_ms)
println(session?.usage?.call_count)
std/agents also exposes worker helpers for delegated/background orchestration:
`worker_request(worker)`, `worker_result(worker)`, `worker_provenance(worker)`,
`worker_research_questions(worker)`, `worker_action_items(worker)`,
`worker_workflow_stages(worker)`, and `worker_verification_steps(worker)`.
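These helpers read directly from a worker handle. A minimal sketch, assuming `some_graph` is a workflow graph as in the `execution` example earlier and the worker has finished:

```
import "std/agents"

let worker = spawn_agent({task: "Survey open questions in the docs", graph: some_graph})
wait_agent(worker)

println(worker_request(worker))            // original structured request
println(worker_provenance(worker))         // normalized provenance object
println(worker_research_questions(worker)) // child questions recovered from provenance
```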
This is the intended host integration boundary:
- hosts persist chat tabs, titles, and durable asset files
- Harn persists transcript/run-record/session semantics
- hosts should prefer restoring a Harn session or transcript over inventing a parallel hidden memory format
Workflow runtime
For multi-stage orchestration, prefer the workflow runtime over product-side loop wiring. Define a helper that assembles the tools your agents will use:
fn review_tools() {
var tools = tool_registry()
tools = tool_define(tools, "read", "Read a file", {
parameters: {path: {type: "string"}},
returns: {type: "string"},
handler: nil
})
tools = tool_define(tools, "edit", "Edit a file", {
parameters: {path: {type: "string"}},
returns: {type: "string"},
handler: nil
})
tools = tool_define(tools, "run", "Run a command", {
parameters: {command: {type: "string"}},
returns: {type: "string"},
handler: nil
})
return tools
}
let graph = workflow_graph({
name: "review_and_repair",
entry: "act",
nodes: {
act: {kind: "stage", mode: "agent", tools: review_tools()},
verify: {kind: "verify", mode: "agent", tools: tool_select(review_tools(), ["run"])}
},
edges: [{from: "act", to: "verify"}]
})
let run = workflow_execute(
"Fix the failing test and verify the change.",
graph,
[],
{max_steps: 6}
)
This keeps orchestration structure, transcript policy, context policy, artifacts, and retries inside Harn instead of product code.
Cost tracking
Harn provides builtins for estimating and controlling LLM costs:
// Estimate cost for a specific call
let cost = llm_cost("claude-sonnet-4-20250514", 1000, 500)
println("Estimated cost: $${cost}")
// Check cumulative session costs
let session = llm_session_cost()
println("Total: $${session.total_cost}")
println("Calls: ${session.call_count}")
println("Input tokens: ${session.input_tokens}")
println("Output tokens: ${session.output_tokens}")
// Set a budget (LLM calls throw if exceeded)
llm_budget(1.00)
println("Remaining: $${llm_budget_remaining()}")
| Function | Description |
|---|---|
| `llm_cost(model, input_tokens, output_tokens)` | Estimate USD cost from embedded pricing table |
| `llm_session_cost()` | Session totals: `{total_cost, input_tokens, output_tokens, call_count}` |
| `llm_budget(max_cost)` | Set session budget in USD. LLM calls throw if exceeded |
| `llm_budget_remaining()` | Remaining budget (`nil` if no budget set) |
Provider API details
Anthropic
- Endpoint: `https://api.anthropic.com/v1/messages`
- Auth: `x-api-key` header
- API version: `2023-06-01`
- System message sent as a top-level `system` field
OpenAI
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Auth: `Authorization: Bearer <key>`
- System message sent as a message with `role: "system"`
OpenRouter
- Endpoint: `https://openrouter.ai/api/v1/chat/completions`
- Auth: `Authorization: Bearer <key>`
- Same message format as OpenAI
HuggingFace
- Endpoint: `https://router.huggingface.co/v1/chat/completions`
- Auth: `Authorization: Bearer <key>`
- Use `HF_TOKEN` or `HUGGINGFACE_API_KEY`
- Same message format as OpenAI
Ollama
- Endpoint: `<OLLAMA_HOST>/v1/chat/completions`
- Default host: `http://localhost:11434`
- No authentication required
- Same message format as OpenAI
Local OpenAI-compatible server
- Endpoint: `<LOCAL_LLM_BASE_URL>/v1/chat/completions`
- Default host: `http://localhost:8000`
- No authentication required
- Same message format as OpenAI
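Calling a local server from script code looks like this (the model name is an example; pass whatever model your server actually serves):

```
// Assumes LOCAL_LLM_BASE_URL is set, e.g. http://localhost:8000
let result = llm_call("Say hello.", nil, {
  provider: "local",
  model: "qwen2.5-coder-32b"
})
println(result.text)
```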
Testing with mock LLM responses
The mock provider returns deterministic responses without API keys.
Use llm_mock() to queue specific responses — text, tool calls, or both:
// Queue a text response (consumed in FIFO order)
llm_mock({text: "The capital of France is Paris."})
let r = llm_call("What is the capital of France?", nil, {provider: "mock"})
assert_eq(r.text, "The capital of France is Paris.")
// Queue a response with tool calls
llm_mock({
text: "Let me read that file.",
tool_calls: [{name: "read_file", arguments: {path: "src/main.rs"}}],
})
// Pattern-matched mocks (reusable by default, matched in declaration order)
llm_mock({text: "I don't know.", match: "*unknown*"})
llm_mock({text: "step 1", match: "*planner*", consume_match: true})
llm_mock({text: "step 2", match: "*planner*", consume_match: true})
// Inspect what was sent to the mock provider
let calls = llm_mock_calls()
// Each entry: {messages: [...], system: "..." or nil, tools: [...] or nil}
// Clear all mocks and call log between tests
llm_mock_clear()
When no `llm_mock()` responses are queued, the mock provider falls back to
its default deterministic behavior (echoing prompt metadata). This means
existing tests using `provider: "mock"` without `llm_mock()` continue to
work unchanged.
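For instance, with nothing queued the mock still returns a normal result dict:

```
llm_mock_clear()  // ensure no queued responses or call log remain
let r = llm_call("hello", nil, {provider: "mock"})
println(r.text)   // deterministic echo of prompt metadata
```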