LLM streaming and transcripts
Streaming responses
llm_stream returns a channel that yields response chunks as they
arrive. Iterate over it with a for loop:
let stream = llm_stream("Tell me a story", "You are a storyteller")
for chunk in stream {
log(chunk)
}
llm_stream accepts the same options as llm_call (provider, model,
max_tokens). The channel closes automatically when the response is
complete.
llm_stream_call is the script-facing streaming variant of llm_call.
It returns a first-class Stream of chunk dicts instead of a channel of
raw strings:
let chunks = llm_stream_call("Tell me a story", nil, {provider: "openai"})
for chunk in chunks {
log(chunk.visible_delta)
if chunk.partial.contains("REFUSAL") {
break
}
}
Each chunk has {delta, visible_delta, partial, role, finish_reason}.
delta is the provider text delta, visible_delta and partial hide
open internal <think> blocks, and the terminal chunk carries
finish_reason when the provider reports one. Dropping the stream
aborts the background LLM request. The existing stream option on
llm_call and llm_stream_call still only controls provider transport
selection; it does not change llm_call's return type.
Partial deltas and usage
Streaming transports emit text deltas as soon as the provider sends them. Native
tool-call streams also surface partial argument deltas in agent trace events:
raw_input when the bytes parse as JSON, or raw_input_partial while the JSON
object is still incomplete.
Final token usage is recorded after the provider response completes. Read it
from the llm_call / agent_loop result, from llm_usage(), or from the
workflow session usage summary shown below.
Transcript management
Harn includes transcript primitives for carrying context across calls, forks, repairs, and resumptions:
let first = llm_call("Plan the work", nil, {provider: "mock"})
let second = llm_call("Continue", nil, {
provider: "mock",
transcript: first.transcript
})
let compacted = transcript_compact(second.transcript, {
keep_last: 4,
summary: "Planning complete."
})
Use transcript_summarize() when you want Harn to create a fresh summary with
an LLM, or transcript_compact() when you want the runtime compaction engine
outside the agent_loop path. transcript_compact() accepts the same
CompactionPolicy instruction fields as agent-loop auto-compaction, so hosts
can route /compact <instructions> through one audited path.
Transcript helpers also expose the canonical event model:
let visible = transcript_render_visible(result.transcript)
let full = transcript_render_full(result.transcript)
let events = transcript_events(result.transcript)
Use these when a host app needs to render human-visible chat separately from internal execution history.
For chat/session lifecycle, std/agents now exposes a higher-level workflow
session contract on top of raw transcripts and run records:
import "std/agents"
let result = task_run("Write a note", some_flow, {provider: "mock"})
let session = workflow_session(result)
let forked = workflow_session_fork(session)
let archived = workflow_session_archive(forked)
let resumed = workflow_session_resume(archived)
let persisted = workflow_session_persist(result, ".harn-runs/chat.json")
let restored = workflow_session_restore(persisted.run.persisted_path)
Each workflow session also carries a normalized usage summary copied from the
underlying run record when available:
log(session?.usage?.input_tokens)
log(session?.usage?.output_tokens)
log(session?.usage?.total_duration_ms)
log(session?.usage?.call_count)
std/agents also exposes worker helpers for delegated/background orchestration:
worker_request(worker), worker_result(worker), worker_provenance(worker),
worker_research_questions(worker), worker_action_items(worker),
worker_workflow_stages(worker), and worker_verification_steps(worker).
For durable persona handoff, prefer a typed artifact over copying the child or
parent transcript forward. Use handoff(...) to normalize a structured
handoff payload, handoff_artifact(...) to carry it through the workflow
artifact channel, and handoff_context(...) when a receiver needs a prompt-safe
summary of the transferred task/evidence/budget fields. The handoff artifact is
the product; the transcript stays on the source side of the boundary.
This is the intended host integration boundary:
- hosts persist chat tabs, titles, and durable asset files
- Harn persists transcript/run-record/session semantics
- hosts should prefer restoring a Harn session or transcript over inventing a parallel hidden memory format
Workflow runtime
For multi-stage orchestration, prefer the workflow runtime over product-side loop wiring. Define a helper that assembles the tools your agents will use:
fn review_tools() {
var tools = tool_registry()
tools = tool_define(tools, "read", "Read a file", {
parameters: {path: {type: "string"}},
returns: {type: "string"},
handler: nil
})
tools = tool_define(tools, "edit", "Edit a file", {
parameters: {path: {type: "string"}},
returns: {type: "string"},
handler: nil
})
tools = tool_define(tools, "run", "Run a command", {
parameters: {command: {type: "string"}},
returns: {type: "string"},
handler: nil
})
return tools
}
let graph = workflow_graph({
name: "review_and_repair",
entry: "act",
nodes: {
act: {kind: "stage", mode: "agent", tools: review_tools()},
verify: {kind: "verify", mode: "agent", tools: tool_select(review_tools(), ["run"])}
},
edges: [{from: "act", to: "verify"}]
})
let run = workflow_execute(
"Fix the failing test and verify the change.",
graph,
[],
{max_steps: 6}
)
This keeps orchestration structure, transcript policy, context policy, artifacts, and retries inside Harn instead of product code.