Merge Captain persona

The Merge Captain persona is a Harn-native runbook for owning pull-request queues across multiple repositories. It is the recommended starting point for teams that want a deterministic, receipt-emitting merge workflow rather than a shell-driven sweep script.

The full persona package lives at personas/merge_captain/ and ships with policies for harn plus example service and app repos.

What it owns

A canonical 12-state machine over each tracked PR (discovered, draft, waiting_checks, behind, dirty, queued, merge_group_running, failing_ci, local_repair, blocked, merged, closed).
Per-repo policy: merge method, merge-queue toggle, required checks, required review count, blocking labels, optional ready label, per-repo local_verification commands, and downstream bump/release ordering.
Durable per-PR checkpoints written through std/agent_state so the sweep loop survives cancellation and process restart, and so two sweeps cannot race — the second writer hits the conflict_policy: "error" guard.
One signed merge receipt per (sweep, repo, PR) capturing the classification, the action chosen, the evidence the classifier saw, the approval state under the per-repo autopilot policy, the commands actually executed, the checks observed, and the final outcome. A sweep-level summary aggregates by state, action, and repo.
A typed GitHub adapter that dispatches through std/connectors/github (listed in docs/src/modules.md) in live mode and through a deterministic JSON snapshot in fixture mode. Tests, evals, and replay all run on fixtures; nothing shells out to gh.
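To make the state list and the receipt identity concrete, here is a minimal Python sketch. It is illustrative only and not the persona's Harn implementation: the names PrState, parse_state, and receipt_key are hypothetical, and only the twelve state strings and the (sweep, repo, PR) tuple come from the list above.

```python
from enum import Enum


class PrState(str, Enum):
    """The twelve per-PR states named in the list above."""

    DISCOVERED = "discovered"
    DRAFT = "draft"
    WAITING_CHECKS = "waiting_checks"
    BEHIND = "behind"
    DIRTY = "dirty"
    QUEUED = "queued"
    MERGE_GROUP_RUNNING = "merge_group_running"
    FAILING_CI = "failing_ci"
    LOCAL_REPAIR = "local_repair"
    BLOCKED = "blocked"
    MERGED = "merged"
    CLOSED = "closed"


def parse_state(value: str) -> PrState:
    """Reject any string that is not one of the twelve states (hypothetical helper)."""
    try:
        return PrState(value)
    except ValueError:
        raise ValueError(f"unknown PR state: {value!r}") from None


def receipt_key(sweep_id: str, repo: str, pr_number: int) -> tuple:
    """One receipt exists per (sweep, repo, PR), so this tuple identifies it."""
    return (sweep_id, repo, pr_number)
```

Keeping the states in a closed enumeration means a misspelled state in a fixture or checkpoint fails loudly instead of silently creating a thirteenth state.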

Running the sweep

# Default fixture sweep — useful for evals and CI.
harn run personas/merge_captain/manifest.harn

# Persona inspection.
harn persona --manifest personas/merge_captain/harn.toml \
  inspect merge_captain --json

# Smoke eval.
harn eval personas/merge_captain/evals/merge_captain_smoke.json
harn eval personas/merge_captain/harn.eval.toml

# Timeout/budget ladder with machine-readable output.
harn merge-captain ladder personas/merge_captain/harn.eval.toml \
  --report-out .harn-runs/merge-captain-ladder/report.json \
  --format json

# Unit tests for every layer.
harn test personas/merge_captain/tests/states_test.harn
harn test personas/merge_captain/tests/policy_test.harn
harn test personas/merge_captain/tests/classifier_test.harn
harn test personas/merge_captain/tests/receipt_test.harn
harn test personas/merge_captain/tests/scheduler_test.harn

Live mode input

manifest.harn reads its configuration via runtime_pipeline_input() (listed in docs/src/modules.md), so the same entry pipeline serves both the smoke fixture and a production sweep.

{
  "mode": "live",
  "policy_paths": [
    "personas/merge_captain/policies/harn.json",
    "personas/merge_captain/policies/example-service.json",
    "personas/merge_captain/policies/example-app.json"
  ],
  "state_root": "/var/lib/harn/merge-captain/state",
  "session_id": "production",
  "writer_id": "merge_captain@host-1",
  "dry_run": false,
  "sweep_id": "live-2026-04-29T17:30Z"
}

The github connector must be active for live mode. Local verification commands run via the host's process.exec capability; no other I/O leaves the typed connector layer.
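A host that launches live sweeps could assemble this input document programmatically. The Python sketch below is one way to do that, under stated assumptions: build_live_input is a hypothetical helper, the sweep_id pattern is inferred from the single example above rather than documented, and how the JSON is handed to the pipeline is left out because the source does not say.

```python
import json
from datetime import datetime, timezone


def build_live_input(policy_paths, state_root, writer_id, now, dry_run=False):
    """Assemble a live-mode input document with the keys shown above."""
    # Assumption: sweep_id follows the "live-<UTC timestamp to the minute>Z"
    # pattern seen in the example; the source does not specify a format.
    sweep_id = now.astimezone(timezone.utc).strftime("live-%Y-%m-%dT%H:%MZ")
    return {
        "mode": "live",
        "policy_paths": list(policy_paths),
        "state_root": state_root,
        "session_id": "production",
        "writer_id": writer_id,
        "dry_run": dry_run,
        "sweep_id": sweep_id,
    }


if __name__ == "__main__":
    doc = build_live_input(
        ["personas/merge_captain/policies/harn.json"],
        "/var/lib/harn/merge-captain/state",
        "merge_captain@host-1",
        datetime.now(timezone.utc),
    )
    print(json.dumps(doc, indent=2))
```

Passing the clock in as an argument keeps the helper deterministic, which matches the fixture-first approach described earlier.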

Timeout ladders

harn.eval.toml contains a Merge Captain ladder that replays the same green-PR fixture through a Gemma value-model profile and increasing timeout/tool-call budgets. Each route/tier writes an event_log.jsonl, receipt.json, and summary.json; the aggregate report records the first tier that completed correctly and every tier that degraded or looped. The same manifest can be run with harn eval, harn test package --evals, or harn merge-captain ladder, so host surfaces consume one JSON contract.
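The aggregation described above (first tier that completed correctly, plus every tier that degraded or looped) can be sketched in a few lines of Python. The input shape here is hypothetical: the source does not document the fields of summary.json or the aggregate report, so the "tier" and "outcome" keys and the outcome strings are assumptions for illustration.

```python
def summarize_ladder(tiers):
    """Summarize an ordered ladder run.

    `tiers` is a list of {"tier": str, "outcome": str} dicts ordered from the
    smallest budget to the largest. This shape is assumed, not taken from the
    real report format.
    """
    first_ok = next(
        (t["tier"] for t in tiers if t["outcome"] == "completed"), None
    )
    troubled = [
        t["tier"] for t in tiers if t["outcome"] in ("degraded", "looped")
    ]
    return {"first_completed_tier": first_ok, "degraded_or_looped": troubled}
```

If no tier completes, first_completed_tier is None, which a caller can treat as "the ladder never reached a sufficient budget".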

Agent iteration

harn merge-captain iterate is the brute-force loop agents use after a prompt or Harn package change. The manifest declares scenarios and variants. Scenarios can be replay transcripts or mock-playground manifests; variants carry model route, timeout tier, package revision, and prompt-asset revision metadata. The runner writes one shareable directory with copied replay fixtures or materialized playgrounds, per-run transcripts/receipts/summaries, summary.json, and a Markdown table ranking variants by transcript-drift score and cost.

harn merge-captain iterate examples/personas/merge_captain/iterations/smoke.toml
harn merge-captain iterate --diff previous-iteration candidate-iteration
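As a rough picture of the ranking step, the Python sketch below orders variants by drift and then by cost and renders a Markdown table. It is an assumption-laden illustration: the field names variant, drift, and cost_usd are hypothetical, and "lower drift first, ties broken by lower cost" is one plausible reading of the ranking, not a documented rule.

```python
def rank_variants(runs):
    """Render a Markdown ranking table from per-variant results.

    `runs` is a list of {"variant": str, "drift": float, "cost_usd": float}
    dicts (hypothetical shape). Lower drift ranks first; cost breaks ties.
    """
    ordered = sorted(runs, key=lambda r: (r["drift"], r["cost_usd"]))
    lines = [
        "| rank | variant | drift | cost_usd |",
        "| --- | --- | --- | --- |",
    ]
    for rank, run in enumerate(ordered, start=1):
        lines.append(
            f"| {rank} | {run['variant']} | {run['drift']:.2f} | {run['cost_usd']:.4f} |"
        )
    return "\n".join(lines)
```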

Authoring per-repo policy

Policies are JSON files under personas/merge_captain/policies/. Only repo is required; everything else inherits from policy.defaults() in lib/policy.harn.

{
  "repo": "burin-labs/harn",
  "default_branch": "main",
  "merge_method": "squash",
  "merge_queue_enabled": true,
  "required_checks": ["ci / rust", "ci / portal", "ci / conformance"],
  "required_review_count": 1,
  "blocking_labels": ["do-not-merge", "needs-revision", "release-block"],
  "ready_label": null,
  "max_in_flight": 8,
  "local_verification": [
    {"name": "make-test", "command": "make test"},
    {"name": "make-conformance", "command": "make conformance"}
  ],
  "downstream": [{"repo": "example-org/example-app", "action": "fetch_harn_bump"}],
  "bump_after_merge": null,
  "autopilot_states": ["queued", "waiting_checks", "behind"],
  "require_human_for": ["blocked", "local_repair"]
}
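The "only repo is required, everything else inherits" rule amounts to overlaying the policy file on a defaults table. The Python sketch below shows that idea under stated assumptions: the values in DEFAULTS are placeholders (the real ones come from policy.defaults() in lib/policy.harn), resolve_policy is a hypothetical name, the merge is shallow, and rejecting unknown keys is a choice made for this sketch rather than documented behavior.

```python
import copy

# Placeholder defaults for illustration only; the real values are whatever
# policy.defaults() in lib/policy.harn returns.
DEFAULTS = {
    "default_branch": "main",
    "merge_method": "squash",
    "merge_queue_enabled": False,
    "required_checks": [],
    "required_review_count": 1,
    "blocking_labels": [],
    "ready_label": None,
    "max_in_flight": 1,
    "local_verification": [],
    "downstream": [],
    "bump_after_merge": None,
    "autopilot_states": [],
    "require_human_for": [],
}


def resolve_policy(raw):
    """Overlay a policy file (already parsed to a dict) on the defaults."""
    if not isinstance(raw.get("repo"), str) or not raw["repo"]:
        raise ValueError("policy must set 'repo'")
    # Assumption: unknown keys are treated as typos and rejected.
    unknown = sorted(set(raw) - set(DEFAULTS) - {"repo"})
    if unknown:
        raise ValueError(f"unknown policy keys: {unknown}")
    # Deep copies keep list-valued defaults from being shared between policies.
    policy = copy.deepcopy(DEFAULTS)
    policy.update(copy.deepcopy(raw))
    return policy
```

With this shape, a file containing only {"repo": "example-org/example-app"} resolves to a complete policy, and any key it does set replaces the default wholesale.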

autopilot_states lists the states whose mutating actions can fire without human approval. require_human_for lists the states that always escalate, even if their action is technically non-mutating. Both lists are evaluated against the per-PR classification, and the result lands in the receipt's approval_state field (autopilot, dry_run, needs_human, or no_mutation).
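One way to read these rules is as a small decision function. The Python sketch below is an interpretation, not the persona's actual logic: the function name and its mutating and dry_run parameters are hypothetical, and the precedence (require_human_for first, then non-mutating, then dry run, then autopilot, with needs_human as the fallback) is an assumption chosen to be consistent with the paragraph above.

```python
def approval_state(policy, state, mutating, dry_run):
    """Derive an approval_state value for one classified PR (assumed precedence)."""
    # These states always escalate, even when the action is non-mutating.
    if state in policy["require_human_for"]:
        return "needs_human"
    # Nothing would change on GitHub, so no approval is involved.
    if not mutating:
        return "no_mutation"
    # Assumption: a dry-run sweep records the intent without acting.
    if dry_run:
        return "dry_run"
    if state in policy["autopilot_states"]:
        return "autopilot"
    # Assumption: mutating actions outside autopilot_states wait for a human.
    return "needs_human"


if __name__ == "__main__":
    policy = {
        "autopilot_states": ["queued", "waiting_checks", "behind"],
        "require_human_for": ["blocked", "local_repair"],
    }
    print(approval_state(policy, "queued", mutating=True, dry_run=False))
```

Under these assumptions, a queued PR with a mutating action resolves to autopilot, while a blocked PR resolves to needs_human regardless of the action.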

Receipt shape

{
  "_type": "merge_receipt",
  "version": 1,
  "persona": "merge_captain",
  "sweep_id": "...",
  "observed_at": 1761729600000,
  "repo": "burin-labs/harn",
  "pr_number": 1010,
  "head_sha": "...",
  "prior_state": "discovered",
  "classification": {
    "state": "queued",
    "action": "enqueue_merge_queue",
    "reason": "all gates passed; enqueueing on the merge queue"
  },
  "evidence": {
    "classifier": {"merge_method": "squash"},
    "snapshot": {"approvals": 1, "labels": [], "...": "..."}
  },
  "approval_state": "autopilot",
  "commands_run": [{"action": "enqueue_merge_queue", "response": {...}}],
  "checks_observed": [{"name": "ci / rust", "status": "completed", "conclusion": "success"}],
  "final_outcome": "enqueued",
  "policy_summary": {"merge_method": "squash", "merge_queue_enabled": true}
}

Harn always prefers the merge queue when merge_queue_enabled is true; admin-merge bypass is never an action the classifier picks.
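Because every receipt carries its repo and classification, a sweep-level rollup by state, action, and repo is a straightforward count. The Python sketch below illustrates that aggregation over receipts shaped like the example above. The output keys (total, by_state, by_action, by_repo) are hypothetical, since the source does not show the actual summary format.

```python
from collections import Counter


def summarize_receipts(receipts):
    """Count receipts by classification state, chosen action, and repo."""
    return {
        "total": len(receipts),
        "by_state": dict(Counter(r["classification"]["state"] for r in receipts)),
        "by_action": dict(Counter(r["classification"]["action"] for r in receipts)),
        "by_repo": dict(Counter(r["repo"] for r in receipts)),
    }
```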