Provider support recommendations
This page aggregates Harn's provider/model catalog, runtime capability rules, small curated notes, and optional harn eval coding-agent benchmark summaries. Regenerate with `make gen-provider-support` and verify with `make check-provider-support`.
No benchmark summary is baked into this checked-in page. To layer local empirical results, run `harn providers support --empirical .harn-runs/coding-agent-bench/latest/summary.json`.
| Provider | Endpoint style | Recommended selector | Tool mode | Native tools | Text tools | Structured output | Reasoning knobs | Cache | Usage confidence | Empirical |
|---|---|---|---|---|---|---|---|---|---|---|
| Anthropic | Anthropic Messages API | haiku | native | yes | yes | tool_use / xml_tagged | enabled | yes | high | not_recorded |
| Azure Openai | OpenAI-compatible chat completions | azure_openai:gpt-* | native | yes | yes | none / native_json | none | no | provider_default | not_recorded |
| Bedrock | AWS Bedrock Converse | bedrock:anthropic.claude-* | native | yes | yes | none / xml_tagged | none | no | provider_default | not_recorded |
| Cerebras | OpenAI-compatible chat completions | cerebras/gpt-oss-120b | native | no | yes | native / native_json | effort,reasoning_effort | no | high | not_recorded |
| Cohere | OpenAI-compatible chat completions | cohere:command-a-plus-05-2026 | native | yes | yes | native / native_json | adaptive | no | high | not_recorded |
| Dashscope | OpenAI-compatible chat completions | dashscope:qwen3.6* | native | yes | yes | native / delimited | disable_directive:/no_think,enabled | no | provider_default | not_recorded |
| Deepseek | OpenAI-compatible chat completions | deepseek:deepseek-v4-flash | native | yes | yes | native / native_json | enabled | yes | high | not_recorded |
| Fireworks | OpenAI-compatible chat completions | fireworks:*qwen3.6* | native | yes | yes | native / delimited | disable_directive:/no_think,enabled | no | provider_default | not_recorded |
| Gemini API | Gemini generateContent | gemini:gemini-2.5-flash | native | yes | yes | native / native_json | adaptive,effort,enabled | yes | medium | not_recorded |
| Groq | OpenAI-compatible chat completions | groq:llama-3.1-8b-instant | native | yes | yes | native / native_json | none | no | high | not_recorded |
| Huggingface | OpenAI-compatible chat completions | huggingface:qwen/qwen3.6* | native | yes | yes | native / delimited | disable_directive:/no_think,enabled | no | provider_default | not_recorded |
| llama.cpp server | OpenAI-compatible llama-server | llamacpp-qwen3.6-q4 | native | yes | yes | native / native_json | disable_directive:/no_think,enabled | no | medium | not_recorded |
| OpenAI-compatible local server | OpenAI-compatible chat completions | local-gemma4 | text | yes | yes | native / delimited | enabled | no | low | not_recorded |
| Minimax | OpenAI-compatible chat completions | minimax:MiniMax-M2.5-highspeed | native | yes | yes | delimited / delimited | enabled | yes | high | not_recorded |
| Mistral via OpenRouter | OpenAI-compatible chat completions through OpenRouter | openrouter:mistralai/mistral-small-2603 | native | yes | yes | native / native_json | none | yes | medium | not_recorded |
| MLX OpenAI-compatible server | OpenAI-compatible MLX server | mlx-qwen3.6-27b | native | yes | yes | native / delimited | disable_directive:/no_think,enabled | no | medium | not_recorded |
| Ollama | Ollama native chat API | devstral-small-2 | text | no | yes | format_kw / delimited | none | no | high | not_recorded |
| OpenAI | OpenAI chat completions / Responses-compatible routes | mid | native | yes | yes | native / native_json | none | no | high | not_recorded |
| Openrouter | OpenAI-compatible chat completions | openrouter:google/gemini-2.5-flash | native | yes | yes | native / native_json | effort,enabled | yes | high | not_recorded |
| Tgi | OpenAI-compatible chat completions | tgi | text | no | yes | none / none | none | no | local_zero_cost | not_recorded |
| Together | OpenAI-compatible chat completions | together:Qwen/Qwen3-Coder-Next-FP8 | native | yes | yes | native / delimited | none | no | high | not_recorded |
| Vertex | Gemini generateContent | vertex:gemini-* | native | yes | yes | none / native_json | none | no | provider_default | not_recorded |
| Vllm | OpenAI-compatible chat completions | vllm | text | no | yes | none / none | none | no | local_zero_cost | not_recorded |
| Xai | OpenAI-compatible chat completions | xai:grok-build-0.1 | native | yes | yes | native / native_json | adaptive | yes | high | not_recorded |
| Zai | OpenAI-compatible chat completions | zai:glm-5 | native | yes | yes | native / native_json | enabled | yes | high | not_recorded |
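Rows of the table above can be filtered programmatically when choosing a route. The following is a minimal sketch and not part of Harn: `parse_table` and `SAMPLE` are hypothetical names, and `SAMPLE` reproduces only three rows of the table for illustration.

```python
# Hypothetical helper (not a Harn API): parse a pipe-delimited table into dicts.
SAMPLE = """
| Provider | Endpoint style | Recommended selector | Tool mode | Native tools | Text tools | Structured output | Reasoning knobs | Cache | Usage confidence | Empirical |
|---|---|---|---|---|---|---|---|---|---|---|
| Anthropic | Anthropic Messages API | haiku | native | yes | yes | tool_use / xml_tagged | enabled | yes | high | not_recorded |
| Groq | OpenAI-compatible chat completions | groq:llama-3.1-8b-instant | native | yes | yes | native / native_json | none | no | high | not_recorded |
| Ollama | Ollama native chat API | devstral-small-2 | text | no | yes | format_kw / delimited | none | no | high | not_recorded |
"""


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the header cells."""
    header = None
    rows = []
    for line in markdown.strip().splitlines():
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if header is None:
            header = cells
        elif all(set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as |---|---|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


if __name__ == "__main__":
    rows = parse_table(SAMPLE)
    # Example query: providers with native tool mode and cache support.
    cached_native = [
        row["Provider"]
        for row in rows
        if row["Tool mode"] == "native" and row["Cache"] == "yes"
    ]
    print(cached_native)
```

The same filter can be pointed at any other column, for example `Usage confidence`, to narrow candidates before running a benchmark.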
Recommended options
Anthropic
- catalog provider: `anthropic`
- recommended route: `haiku` (`claude-haiku-4-5-20251001`)
- endpoint style: Anthropic Messages API
- recommended Harn options:
provider = "anthropic"
model = "haiku"
tool_format = "native"
structured_output_mode = "xml_tagged"
Notes:
- Native tools, prompt caching, file upload, and XML-oriented scaffolding are first-class in Harn capability data.
- Claude 4.7 rows use adaptive thinking; older Claude 4 rows use explicit thinking controls where supported.
Caveats:
- Strict JSON output is modeled as tool-use or XML-tagged output rather than OpenAI-style response_json_schema.
MCP notes:
- No provider-specific MCP connector is required; Harn exposes MCP tools through the runtime tool registry.
Cerebras
- catalog provider: `cerebras`
- recommended route: `cerebras/gpt-oss-120b` (`gpt-oss-120b`)
- endpoint style: OpenAI-compatible chat completions
- recommended Harn options:
provider = "cerebras"
model = "gpt-oss-120b"
tool_format = "native"
structured_output_mode = "native_json"
Notes:
- Harn catalogs Cerebras public serverless rows separately from dedicated-endpoint weights so clients do not present unprovisioned enterprise endpoints as one-click routes.
- Use the slash-prefixed selector form (`cerebras/<model>`) when a single string must carry both provider and model identity; Harn strips the prefix before sending the provider-native model id.
Caveats:
- Preview rows such as `zai-glm-4.7` may be discontinued by Cerebras on short notice; pin public production workloads to non-preview rows unless the caller opts into preview behavior.
MCP notes:
- MCP tools are normalized through Harn tool definitions before they become OpenAI-compatible Cerebras tool schemas.
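The selector forms that appear on this page (`provider:model`, `provider/model`, and bare aliases such as `haiku`) can be illustrated with a short sketch. This is an illustration under stated assumptions, not Harn's resolver: `split_selector` and `KNOWN_PROVIDERS` are hypothetical names, the provider set is a small subset of the catalog, and the rule of checking the colon form before the slash form is an assumption made here so that model ids containing a slash stay intact.

```python
from typing import Optional, Tuple

# Hypothetical subset of catalog provider names, for illustration only.
KNOWN_PROVIDERS = {"anthropic", "cerebras", "gemini", "openrouter", "groq"}


def split_selector(selector: str) -> Tuple[Optional[str], str]:
    """Split a selector into (provider, model id).

    Returns (None, selector) when no known provider prefix is present,
    which is how bare aliases such as "haiku" fall through.
    """
    # Assumption: try "provider:model" first, then "provider/model".
    for separator in (":", "/"):
        prefix, found, rest = selector.partition(separator)
        if found and prefix in KNOWN_PROVIDERS:
            return prefix, rest
    return None, selector


if __name__ == "__main__":
    for example in (
        "cerebras/gpt-oss-120b",
        "openrouter:mistralai/mistral-small-2603",
        "haiku",
    ):
        print(example, "->", split_selector(example))
```

Checking the prefix against a known provider set keeps an id such as `mistralai/mistral-small-2603` from being misread as a provider named `mistralai`.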
Gemini API
- catalog provider: `gemini`
- recommended route: `gemini:gemini-2.5-flash` (`gemini-2.5-flash`)
- endpoint style: Gemini generateContent
- recommended Harn options:
provider = "gemini"
model = "gemini-2.5-flash"
tool_format = "native"
structured_output_mode = "native_json"
Notes:
- Harn lowers native tools to Gemini function declarations and maps function responses back into the transcript.
- Gemini response usage maps cached-content token counts when the provider reports them.
Caveats:
- Harn does not create Gemini context-cache resources yet; cache accounting is therefore observational.
MCP notes:
- MCP tools are regular Harn runtime tools before they become Gemini function declarations.
llama.cpp server
- catalog provider: `llamacpp`
- recommended route: `llamacpp-qwen3.6-q4` (`qwen3.6-35b-a3b-ud-q4-k-xl`)
- endpoint style: OpenAI-compatible llama-server
- recommended Harn options:
provider = "llamacpp"
model = "llamacpp-qwen3.6-q4"
tool_format = "native"
thinking = "off"
Notes:
- llama.cpp gets its own provider so Harn can model Qwen chat-template and thinking behavior separately from generic local OpenAI-compatible servers.
Caveats:
- Run both provider readiness and tool probes after changing GGUF, context, KV-cache, or chat-template settings.
- Harn-managed llama.cpp launch plus one-tool probe passed native and streaming native on 2026-06-05.
Local setup:
- Run `harn models install local-qwen3.6-gguf` for the recommended download and launch commands.
OpenAI-compatible local server
- catalog provider: `local`
- recommended route: `local-gemma4` (`gemma-4-26b-a4b-it`)
- endpoint style: OpenAI-compatible chat completions
- recommended Harn options:
provider = "local"
model = "local-gemma4"
tool_format = "text"
Notes:
- Use this generic provider when a local server speaks OpenAI chat completions but does not need a provider-specific quirk profile.
Caveats:
- Prefer `llamacpp` or `mlx` when those runtimes are known, because their capability rows can encode template-specific behavior.
MCP notes:
- MCP tools are exposed as OpenAI-compatible tool definitions unless the route is configured to prefer Harn text tools.
Local setup:
- Set `LOCAL_LLM_BASE_URL` and either `LOCAL_LLM_MODEL` or an explicit Harn model selector, then run `harn provider-ready local`.
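The local setup rule above (a base URL plus either a model variable or an explicit selector) can be checked before running the readiness command. The sketch below is a hypothetical pre-flight helper, not something Harn ships: `missing_local_settings` is an invented name, and only the two environment variable names come from this page.

```python
import os
from typing import List, Mapping, Optional


def missing_local_settings(
    env: Mapping[str, str], selector: Optional[str] = None
) -> List[str]:
    """List the settings still missing for the generic local provider."""
    missing = []
    if not env.get("LOCAL_LLM_BASE_URL"):
        missing.append("LOCAL_LLM_BASE_URL")
    # The model may come from the environment or from an explicit selector.
    if not env.get("LOCAL_LLM_MODEL") and not selector:
        missing.append("LOCAL_LLM_MODEL or an explicit model selector")
    return missing


if __name__ == "__main__":
    problems = missing_local_settings(os.environ)
    if problems:
        print("Not ready, missing:", ", ".join(problems))
    else:
        print("Environment looks complete; run the readiness check next.")
```

Passing the environment as a mapping keeps the helper easy to exercise with a plain dict instead of the real process environment.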
Mistral via OpenRouter
- catalog provider: `openrouter`
- recommended route: `openrouter:mistralai/mistral-small-2603` (`mistralai/mistral-small-2603`)
- endpoint style: OpenAI-compatible chat completions through OpenRouter
- recommended Harn options:
provider = "openrouter"
model = "mistralai/mistral-small-2603"
tool_format = "native"
Notes:
- Harn catalogs hosted Mistral routes through OpenRouter today, so endpoint and auth behavior are OpenAI-compatible.
- Use this row for Mistral-family recommendation surfaces until a direct Mistral provider is cataloged.
Caveats:
- Provider-native behavior depends on the OpenRouter model route; run the coding-agent benchmark before promoting it to a default for critical harnesses.
MCP notes:
- MCP tools are rendered as OpenAI-compatible tool definitions on this route.
MLX OpenAI-compatible server
- catalog provider: `mlx`
- recommended route: `mlx-qwen3.6-27b` (`unsloth/Qwen3.6-27B-UD-MLX-4bit`)
- endpoint style: OpenAI-compatible MLX server
- recommended Harn options:
provider = "mlx"
model = "mlx-qwen3.6-27b"
tool_format = "native"
Notes:
- MLX routes use native tools only after the served identity and tool probe match the cataloged model.
Caveats:
- `mlx-vlm` server flags vary by release; launch first, then verify with `harn provider-ready mlx`.
Local setup:
- Run `harn models install local-qwen3.6-27b` for the venv, download, launch, and verification commands.
Ollama
- catalog provider: `ollama`
- recommended route: `devstral-small-2` (`devstral-small-2:24b`)
- endpoint style: Ollama native chat API
- recommended Harn options:
provider = "ollama"
model = "devstral-small-2"
tool_format = "text"
thinking = "off"
Notes:
- Local Ollama model quality varies by template and quantization; Harn defaults known fragile routes to the text-tool contract.
- Use `harn provider-tool-probe` receipts to promote aliases from unknown to native/text/disabled on a machine.
Caveats:
- Some Ollama native tool parsers reject otherwise valid text-mode model output; the capability table records those routes as text-only.
MCP notes:
- MCP tools are local Harn tools; prefer the text tool contract unless a probe proves native calls work for the installed model.
Local setup:
- Run `harn models install devstral-small-2`, then verify with `harn provider-ready ollama --model <model>`. For local qwen3.x, use the `llamacpp` provider (e.g. `local-qwen3.6`), because Ollama's qwen3.5-family tool-call parser returns HTTP 500 errors on text-tool output.
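The promotion rule described in the Ollama notes (an alias moves from unknown to native, text, or disabled once probes have run) can be sketched as a small decision function. The probe receipt format is not shown on this page, so the two boolean inputs and the name `choose_tool_format` are assumptions made for illustration; this is not how Harn reads receipts.

```python
from typing import Optional


def choose_tool_format(
    native_ok: Optional[bool], text_ok: Optional[bool]
) -> str:
    """Pick a tool mode from hypothetical probe outcomes.

    None means the corresponding probe has not been run on this machine.
    """
    if native_ok:
        return "native"  # a probe proved native calls work
    if text_ok:
        return "text"  # fall back to the text tool contract
    if native_ok is False and text_ok is False:
        return "disabled"  # both probes ran and both failed
    return "unknown"  # not enough evidence to promote the alias


if __name__ == "__main__":
    print(choose_tool_format(None, None))
    print(choose_tool_format(False, True))
    print(choose_tool_format(True, True))
    print(choose_tool_format(False, False))
```

Keeping "unknown" distinct from "disabled" mirrors the note above that promotion happens per machine, only after a probe has produced evidence.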
OpenAI
- catalog provider: `openai`
- recommended route: `mid` (`gpt-4o-mini`)
- endpoint style: OpenAI chat completions / Responses-compatible routes
- recommended Harn options:
provider = "openai"
model = "mid"
tool_format = "native"
structured_output_mode = "native_json"
Notes:
- OpenAI-family routes default to native tool calls and native JSON structured output when the model row supports tools.
- Reasoning models use developer-role instructions and reasoning-summary transcript projection where the capability row declares it.
Caveats:
- Use explicit reasoning effort only on reasoning rows; non-reasoning chat models should keep thinking disabled.
MCP notes:
- Hosted MCP behavior is normalized through Harn tool definitions; provider-side hosted tools remain a separate provider feature.