Provider support recommendations

This page aggregates Harn's provider/model catalog, runtime capability rules, small curated notes, and optional harn eval coding-agent benchmark summaries. Regenerate with make gen-provider-support and verify with make check-provider-support.

No benchmark summary is baked into this checked-in page. To layer local empirical results, run harn providers support --empirical .harn-runs/coding-agent-bench/latest/summary.json.

ProviderEndpoint styleRecommended selectorTool modeNative toolsText toolsStructured outputReasoning knobsCacheUsage confidenceEmpirical
AnthropicAnthropic Messages APIhaikunativeyesyestool_use / xml_taggedenabledyeshighnot_recorded
Azure OpenaiOpenAI-compatible chat completionsazure_openai:gpt-*nativeyesyesnone / native_jsonnonenoprovider_defaultnot_recorded
BedrockAWS Bedrock Conversebedrock:anthropic.claude-*nativeyesyesnone / xml_taggednonenoprovider_defaultnot_recorded
CerebrasOpenAI-compatible chat completionscerebras/gpt-oss-120bnativenoyesnative / native_jsoneffort,reasoning_effortnohighnot_recorded
CohereOpenAI-compatible chat completionscohere:command-a-plus-05-2026nativeyesyesnative / native_jsonadaptivenohighnot_recorded
DashscopeOpenAI-compatible chat completionsdashscope:qwen3.6*nativeyesyesnative / delimiteddisable_directive:/no_think,enablednoprovider_defaultnot_recorded
DeepseekOpenAI-compatible chat completionsdeepseek:deepseek-v4-flashnativeyesyesnative / native_jsonenabledyeshighnot_recorded
FireworksOpenAI-compatible chat completionsfireworks:*qwen3.6*nativeyesyesnative / delimiteddisable_directive:/no_think,enablednoprovider_defaultnot_recorded
Gemini APIGemini generateContentgemini:gemini-2.5-flashnativeyesyesnative / native_jsonadaptive,effort,enabledyesmediumnot_recorded
GroqOpenAI-compatible chat completionsgroq:llama-3.1-8b-instantnativeyesyesnative / native_jsonnonenohighnot_recorded
HuggingfaceOpenAI-compatible chat completionshuggingface:qwen/qwen3.6*nativeyesyesnative / delimiteddisable_directive:/no_think,enablednoprovider_defaultnot_recorded
llama.cpp serverOpenAI-compatible llama-serverllamacpp-qwen3.6-q4nativeyesyesnative / native_jsondisable_directive:/no_think,enablednomediumnot_recorded
OpenAI-compatible local serverOpenAI-compatible chat completionslocal-gemma4textyesyesnative / delimitedenablednolownot_recorded
MinimaxOpenAI-compatible chat completionsminimax:MiniMax-M2.5-highspeednativeyesyesdelimited / delimitedenabledyeshighnot_recorded
Mistral via OpenRouterOpenAI-compatible chat completions through OpenRouteropenrouter:mistralai/mistral-small-2603nativeyesyesnative / native_jsonnoneyesmediumnot_recorded
MLX OpenAI-compatible serverOpenAI-compatible MLX servermlx-qwen3.6-27bnativeyesyesnative / delimiteddisable_directive:/no_think,enablednomediumnot_recorded
OllamaOllama native chat APIdevstral-small-2textnoyesformat_kw / delimitednonenohighnot_recorded
OpenAIOpenAI chat completions / Responses-compatible routesmidnativeyesyesnative / native_jsonnonenohighnot_recorded
OpenrouterOpenAI-compatible chat completionsopenrouter:google/gemini-2.5-flashnativeyesyesnative / native_jsoneffort,enabledyeshighnot_recorded
TgiOpenAI-compatible chat completionstgitextnoyesnone / nonenonenolocal_zero_costnot_recorded
TogetherOpenAI-compatible chat completionstogether:Qwen/Qwen3-Coder-Next-FP8nativeyesyesnative / delimitednonenohighnot_recorded
VertexGemini generateContentvertex:gemini-*nativeyesyesnone / native_jsonnonenoprovider_defaultnot_recorded
VllmOpenAI-compatible chat completionsvllmtextnoyesnone / nonenonenolocal_zero_costnot_recorded
XaiOpenAI-compatible chat completionsxai:grok-build-0.1nativeyesyesnative / native_jsonadaptiveyeshighnot_recorded
ZaiOpenAI-compatible chat completionszai:glm-5nativeyesyesnative / native_jsonenabledyeshighnot_recorded

Anthropic

  • catalog provider: anthropic
  • recommended route: haiku (claude-haiku-4-5-20251001)
  • endpoint style: Anthropic Messages API
  • recommended Harn options:
provider = "anthropic"
model = "haiku"
tool_format = "native"
structured_output_mode = "xml_tagged"

Notes:

  • Native tools, prompt caching, file upload, and XML-oriented scaffolding are first-class in Harn capability data.
  • Claude 4.7 rows use adaptive thinking; older Claude 4 rows use explicit thinking controls where supported.

Caveats:

  • Strict JSON output is modeled as tool-use or XML-tagged output rather than OpenAI-style response_json_schema.

MCP notes:

  • No provider-specific MCP connector is required; Harn exposes MCP tools through the runtime tool registry.

Cerebras

  • catalog provider: cerebras
  • recommended route: cerebras/gpt-oss-120b (gpt-oss-120b)
  • endpoint style: OpenAI-compatible chat completions
  • recommended Harn options:
provider = "cerebras"
model = "gpt-oss-120b"
tool_format = "native"
structured_output_mode = "native_json"

Notes:

  • Harn catalogs Cerebras public serverless rows separately from dedicated-endpoint weights so clients do not present unprovisioned enterprise endpoints as one-click routes.
  • Use the slash-prefixed selector form (cerebras/<model>) when a single string must carry both provider and model identity; Harn strips the prefix before sending the provider-native model id.

Caveats:

  • Preview rows such as zai-glm-4.7 may be discontinued by Cerebras on short notice; pin public production workloads to non-preview rows unless the caller opts into preview behavior.

MCP notes:

  • MCP tools are normalized through Harn tool definitions before they become OpenAI-compatible Cerebras tool schemas.

Gemini API

  • catalog provider: gemini
  • recommended route: gemini:gemini-2.5-flash (gemini-2.5-flash)
  • endpoint style: Gemini generateContent
  • recommended Harn options:
provider = "gemini"
model = "gemini-2.5-flash"
tool_format = "native"
structured_output_mode = "native_json"

Notes:

  • Harn lowers native tools to Gemini function declarations and maps function responses back into the transcript.
  • Gemini response usage maps cached-content token counts when the provider reports them.

Caveats:

  • Harn does not create Gemini context-cache resources yet; cache accounting is therefore observational.

MCP notes:

  • MCP tools are regular Harn runtime tools before they become Gemini function declarations.

llama.cpp server

  • catalog provider: llamacpp
  • recommended route: llamacpp-qwen3.6-q4 (qwen3.6-35b-a3b-ud-q4-k-xl)
  • endpoint style: OpenAI-compatible llama-server
  • recommended Harn options:
provider = "llamacpp"
model = "llamacpp-qwen3.6-q4"
tool_format = "native"
thinking = "off"

Notes:

  • llama.cpp gets its own provider so Harn can model Qwen chat-template and thinking behavior separately from generic local OpenAI-compatible servers.

Caveats:

  • Run both provider readiness and tool probes after changing GGUF, context, KV-cache, or chat-template settings.
  • Harn-managed llama.cpp launch plus one-tool probe passed native and streaming native on 2026-06-05.

Local setup:

  • Run harn models install local-qwen3.6-gguf for the recommended download and launch commands.

OpenAI-compatible local server

  • catalog provider: local
  • recommended route: local-gemma4 (gemma-4-26b-a4b-it)
  • endpoint style: OpenAI-compatible chat completions
  • recommended Harn options:
provider = "local"
model = "local-gemma4"
tool_format = "text"

Notes:

  • Use this generic provider when a local server speaks OpenAI chat completions but does not need a provider-specific quirk profile.

Caveats:

  • Prefer llamacpp or mlx when those runtimes are known, because their capability rows can encode template-specific behavior.

MCP notes:

  • MCP tools are exposed as OpenAI-compatible tool definitions unless the route is configured to prefer Harn text tools.

Local setup:

  • Set LOCAL_LLM_BASE_URL and either LOCAL_LLM_MODEL or an explicit Harn model selector, then run harn provider-ready local.

Mistral via OpenRouter

  • catalog provider: openrouter
  • recommended route: openrouter:mistralai/mistral-small-2603 (mistralai/mistral-small-2603)
  • endpoint style: OpenAI-compatible chat completions through OpenRouter
  • recommended Harn options:
provider = "openrouter"
model = "mistralai/mistral-small-2603"
tool_format = "native"

Notes:

  • Harn catalogs hosted Mistral routes through OpenRouter today, so endpoint and auth behavior are OpenAI-compatible.
  • Use this row for Mistral-family recommendation surfaces until a direct Mistral provider is cataloged.

Caveats:

  • Provider-native behavior depends on the OpenRouter model route; run the coding-agent benchmark before promoting it to a default for critical harnesses.

MCP notes:

  • MCP tools are rendered as OpenAI-compatible tool definitions on this route.

MLX OpenAI-compatible server

  • catalog provider: mlx
  • recommended route: mlx-qwen3.6-27b (unsloth/Qwen3.6-27B-UD-MLX-4bit)
  • endpoint style: OpenAI-compatible MLX server
  • recommended Harn options:
provider = "mlx"
model = "mlx-qwen3.6-27b"
tool_format = "native"

Notes:

  • MLX routes use native tools only after the served identity and tool probe match the cataloged model.

Caveats:

  • mlx-vlm server flags vary by release; launch first, then verify with harn provider-ready mlx.

Local setup:

  • Run harn models install local-qwen3.6-27b for the venv, download, launch, and verification commands.

Ollama

  • catalog provider: ollama
  • recommended route: devstral-small-2 (devstral-small-2:24b)
  • endpoint style: Ollama native chat API
  • recommended Harn options:
provider = "ollama"
model = "devstral-small-2"
tool_format = "text"
thinking = "off"

Notes:

  • Local Ollama model quality varies by template and quantization; Harn defaults known fragile routes to the text-tool contract.
  • Use harn provider-tool-probe receipts to promote aliases from unknown to native/text/disabled on a machine.

Caveats:

  • Some Ollama native tool parsers reject otherwise valid text-mode model output; the capability table records those routes as text-only.

MCP notes:

  • MCP tools are local Harn tools; prefer the text tool contract unless a probe proves native calls work for the installed model.

Local setup:

  • Run harn models install devstral-small-2, then verify with harn provider-ready ollama --model <model>. For local qwen3.x, use the llamacpp provider (e.g. local-qwen3.6) — Ollama's qwen3.5-family tool-call parser 500s on text-tool output.

OpenAI

  • catalog provider: openai
  • recommended route: mid (gpt-4o-mini)
  • endpoint style: OpenAI chat completions / Responses-compatible routes
  • recommended Harn options:
provider = "openai"
model = "mid"
tool_format = "native"
structured_output_mode = "native_json"

Notes:

  • OpenAI-family routes default to native tool calls and native JSON structured output when the model row supports tools.
  • Reasoning models use developer-role instructions and reasoning-summary transcript projection where the capability row declares it.

Caveats:

  • Use explicit reasoning effort only on reasoning rows; non-reasoning chat models should keep thinking disabled.

MCP notes:

  • Hosted MCP behavior is normalized through Harn tool definitions; provider-side hosted tools remain a separate provider feature.