Provider support recommendations

This page aggregates Harn's provider/model catalog, runtime capability rules, short curated notes, and optional harn eval coding-agent benchmark summaries. Regenerate it with make gen-provider-support and verify it with make check-provider-support.

No benchmark summary is baked into this checked-in page. To layer in local empirical results, run harn provider catalog support --empirical .harn-runs/coding-agent-bench/latest/summary.json.

Provider	Endpoint style	Recommended selector	Tool mode	Native tools	Text tools	Structured output	Reasoning knobs	Cache	Batch	Serving tiers	Usage confidence	Empirical
`Anthropic`	Anthropic Messages API	`haiku`	`native`	yes	yes	`tool_use` / `xml_tagged`	`enabled`	yes	Yes (50%)	`fast:premium`	`high`	`not_recorded`
`Atlas`	OpenAI-compatible chat completions	`atlas`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Azure OpenAI`	OpenAI-compatible chat completions	`azure_openai:gpt-*`	`native`	yes	yes	`none` / `native_json`	none	no	Yes (50%)	none	`provider_default`	`not_recorded`
`Baseten`	OpenAI-compatible chat completions	`baseten:baseten/openai/gpt-oss-120b`	`native`	yes	yes	`native` / `native_json`	`effort,reasoning_effort`	yes	No	none	`high`	`not_recorded`
`Bedrock`	AWS Bedrock Converse	`bedrock:anthropic.claude-*`	`native`	yes	yes	`none` / `xml_tagged`	none	no	Yes	none	`provider_default`	`not_recorded`
`Cerebras`	OpenAI-compatible chat completions	`cerebras/gpt-oss-120b`	`native`	yes	yes	`native` / `native_json`	`effort,reasoning_effort`	no	No	none	`high`	`not_recorded`
`Cloudflare AI Gateway`	OpenAI-compatible chat completions	`cloudflare_ai_gateway`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Cohere`	OpenAI-compatible chat completions	`cohere:command-a-plus-05-2026`	`native`	yes	yes	`native` / `native_json`	`adaptive`	no	No	none	`high`	`not_recorded`
`Dashscope`	OpenAI-compatible chat completions	`dashscope:dashscope/qwen3-coder-next`	`native`	yes	yes	`native` / `delimited`	`disable_directive:/no_think,enabled`	yes	No	none	`high`	`not_recorded`
`Deepinfra`	OpenAI-compatible chat completions	`deepinfra:deepinfra/Qwen/Qwen3-235B-A22B-Instruct-2507`	`native`	yes	yes	`native` / `native_json`	none	no	No	none	`high`	`not_recorded`
`Deepseek`	OpenAI-compatible chat completions	`deepseek:deepseek-v4-flash`	`native`	yes	yes	`native` / `native_json`	`enabled`	yes	No	none	`high`	`not_recorded`
`Fireworks`	OpenAI-compatible chat completions	`fireworks:accounts/fireworks/models/gpt-oss-120b`	`text`	no	yes	`none` / `native_json`	`effort,reasoning_effort`	no	Yes (50%)	none	`high`	`not_recorded`
`Flexai`	OpenAI-compatible chat completions	`flexai`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Friendli`	OpenAI-compatible chat completions	`friendli`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Gemini API`	Gemini generateContent	`gemini:gemini-2.5-flash`	`native`	yes	yes	`native` / `native_json`	`adaptive,effort,enabled`	yes	Yes (50%)	`flex:discounted`, `priority:premium`	`medium`	`not_recorded`
`GitHub Models`	OpenAI-compatible chat completions	`github_models`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Groq`	OpenAI-compatible chat completions	`groq:llama-3.1-8b-instant`	`native`	yes	yes	`native` / `native_json`	none	no	Yes (50%)	none	`high`	`not_recorded`
`Hugging Face Inference Providers`	OpenAI-compatible chat completions through the HF router	`huggingface-qwen3-coder`	`native`	yes	yes	`native` / `delimited`	none	no	No	none	`medium`	`not_recorded`
`Hunyuan`	OpenAI-compatible chat completions	`hunyuan`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Hyperbolic`	OpenAI-compatible chat completions	`hyperbolic`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Inception`	OpenAI-compatible chat completions	`inception:mercury-2`	`native`	yes	yes	`native` / `native_json`	`effort,reasoning_effort`	no	No	none	`high`	`not_recorded`
`llama.cpp server`	OpenAI-compatible llama-server	`llamacpp-qwen3.6-q4`	`native`	no	yes	`native` / `delimited`	`disable_directive:/no_think,enabled`	no	No	none	`medium`	`not_recorded`
`OpenAI-compatible local server`	OpenAI-compatible chat completions	`local-gemma4`	`text`	yes	yes	`native` / `delimited`	`enabled`	no	No	none	`low`	`not_recorded`
`Minimax`	OpenAI-compatible chat completions	`minimax:MiniMax-M2.5-highspeed`	`native`	yes	yes	`delimited` / `delimited`	`enabled`	yes	No	none	`high`	`not_recorded`
`Mistral via OpenRouter`	OpenAI-compatible chat completions through OpenRouter	`openrouter:mistralai/mistral-small-2603`	`native`	yes	yes	`native` / `native_json`	none	yes	No	none	`medium`	`not_recorded`
`MLX OpenAI-compatible server`	OpenAI-compatible MLX server	`mlx-qwen3.6`	`native`	yes	yes	`native` / `delimited`	`disable_directive:/no_think,enabled`	no	No	none	`medium`	`not_recorded`
`Moonshot`	OpenAI-compatible chat completions	`moonshot:moonshot/kimi-k2.6`	`native`	yes	yes	`native` / `native_json`	`enabled`	yes	No	none	`high`	`not_recorded`
`Nebius`	OpenAI-compatible chat completions	`nebius`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Nvidia`	OpenAI-compatible chat completions	`nvidia:nvidia/minimax-m2.7`	`native`	yes	yes	`delimited` / `delimited`	`enabled`	yes	No	none	`high`	`not_recorded`
`Ollama`	Ollama native chat API	`devstral-small-2`	`text`	no	yes	`format_kw` / `delimited`	none	no	No	none	`high`	`not_recorded`
`OpenAI`	OpenAI chat completions / Responses-compatible routes	`openai:gpt-5.4-mini`	`native`	yes	yes	`native` / `native_json`	`effort,reasoning_effort,reasoning_none`	yes	Yes (50%)	`fast:premium`, `priority:premium`, `flex:discounted`	`high`	`not_recorded`
`Openrouter`	OpenAI-compatible chat completions	`openrouter:google/gemini-2.5-flash`	`native`	yes	yes	`native` / `native_json`	`effort,enabled`	yes	No	none	`high`	`not_recorded`
`Parasail`	OpenAI-compatible chat completions	`parasail`	`text`	no	yes	`none` / `none`	none	no	Yes	none	`provider_default`	`not_recorded`
`Qianfan`	OpenAI-compatible chat completions	`qianfan`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Sambanova`	OpenAI-compatible chat completions	`sambanova:sambanova/gpt-oss-120b`	`text`	no	yes	`native` / `native_json`	`effort,reasoning_effort`	no	No	none	`high`	`not_recorded`
`Siliconflow`	OpenAI-compatible chat completions	`siliconflow`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Tgi`	OpenAI-compatible chat completions	`tgi`	`text`	no	yes	`none` / `none`	none	no	No	none	`local_zero_cost`	`not_recorded`
`Together`	OpenAI-compatible chat completions	`together:openai/gpt-oss-20b`	`native`	yes	yes	`native` / `native_json`	`effort,reasoning_effort`	no	Yes (50%)	none	`high`	`not_recorded`
`Vercel AI Gateway`	OpenAI-compatible chat completions	`vercel_ai_gateway:vercel/openai/gpt-5.4-nano`	`native`	yes	yes	`native` / `native_json`	`effort,reasoning_effort,reasoning_none`	yes	No	none	`high`	`not_recorded`
`Vertex`	Gemini generateContent	`vertex:gemini-*`	`native`	yes	yes	`none` / `native_json`	none	no	No	none	`provider_default`	`not_recorded`
`Vllm`	OpenAI-compatible chat completions	`vllm`	`text`	no	yes	`none` / `none`	none	no	No	none	`local_zero_cost`	`not_recorded`
`Volcengine Ark`	OpenAI-compatible chat completions	`volcengine_ark`	`text`	no	yes	`none` / `none`	none	no	No	none	`provider_default`	`not_recorded`
`Xai`	OpenAI-compatible chat completions	`xai:grok-build-0.1`	`native`	yes	yes	`native` / `native_json`	`adaptive`	yes	Yes	none	`high`	`not_recorded`
`Zai`	OpenAI-compatible chat completions	`zai:glm-4.6`	`text`	yes	yes	`native` / `native_json`	`enabled`	yes	No	none	`high`	`not_recorded`

Recommended options

Anthropic

catalog provider: anthropic
recommended route: haiku (claude-haiku-4-5-20251001)
endpoint style: Anthropic Messages API
recommended Harn options:

provider = "anthropic"
model = "haiku"
tool_format = "native"
structured_output_mode = "xml_tagged"

Notes:

Native tools, prompt caching, file upload, and XML-oriented scaffolding are first-class in Harn capability data.
Claude 4.7 rows use adaptive thinking; older Claude 4 rows use explicit thinking controls where supported.

Caveats:

Strict JSON output is modeled as tool-use or XML-tagged output rather than OpenAI-style response_json_schema.
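As an illustration of the XML-tagged path, here is a minimal Python sketch that recovers strict JSON from tagged model output. It assumes the prompt asked the model to wrap its JSON in a hypothetical `<result>` tag. The tag name and the `extract_tagged_json` helper are illustrative and do not describe Harn's actual scaffolding.

```python
import json
import re
from typing import Any

# Hypothetical tag name; Harn's real XML scaffolding may use different tags.
TAGGED = re.compile(r"<result>\s*(.*?)\s*</result>", re.DOTALL)


def extract_tagged_json(text: str) -> Any:
    """Return the JSON value wrapped in <result>...</result>, or raise ValueError."""
    match = TAGGED.search(text)
    if match is None:
        raise ValueError("no <result> block in model output")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad JSON.
    return json.loads(match.group(1))
```

For example, `extract_tagged_json('Sure.\n<result>{"ok": true}</result>')` returns `{"ok": True}`.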

MCP notes:

No provider-specific MCP connector is required; Harn exposes MCP tools through the runtime tool registry.

Cerebras

catalog provider: cerebras
recommended route: cerebras/gpt-oss-120b (gpt-oss-120b)
endpoint style: OpenAI-compatible chat completions
recommended Harn options:

provider = "cerebras"
model = "gpt-oss-120b"
tool_format = "native"
structured_output_mode = "native_json"

Notes:

Harn catalogs Cerebras public serverless rows separately from dedicated-endpoint weights so clients do not present unprovisioned enterprise endpoints as one-click routes.
Use the slash-prefixed selector form (cerebras/<model>) when a single string must carry both provider and model identity; Harn strips the prefix before sending the provider-native model id.
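The two selector forms can be illustrated with a small Python sketch of prefix stripping. It is not Harn's parser. `KNOWN_PROVIDERS` is a hypothetical subset of catalog ids, and the sketch strips a prefix only when it names a known provider, so vendor-prefixed ids such as `mistralai/...` and Ollama-style tags such as `devstral-small-2:24b` pass through unchanged.

```python
from typing import Optional, Tuple

# Hypothetical subset of catalog provider ids; Harn's real catalog is larger.
KNOWN_PROVIDERS = {"cerebras", "gemini", "groq", "openai", "openrouter"}


def split_selector(selector: str, default_provider: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Split a selector into (provider, provider-native model id)."""
    provider, colon, model = selector.partition(":")
    if colon and provider in KNOWN_PROVIDERS:
        # "provider:model": the rest may itself contain slashes (openrouter:mistralai/...).
        return provider, model
    head, slash, rest = selector.partition("/")
    if slash and head in KNOWN_PROVIDERS:
        # Slash-prefixed form: strip the provider before sending the model id.
        return head, rest
    # Bare alias, vendor-prefixed id, or tagged local model: leave it to the caller.
    return default_provider, selector
```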

Caveats:

Preview rows such as zai-glm-4.7 may be discontinued by Cerebras on short notice; pin public production workloads to non-preview rows unless the caller opts into preview behavior.

MCP notes:

MCP tools are normalized through Harn tool definitions before they become OpenAI-compatible Cerebras tool schemas.
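This page does not describe Harn's intermediate tool definition, so the following Python sketch skips that step. It maps an MCP `tools/list` entry (`name`, `description`, `inputSchema`) directly onto the OpenAI-compatible `tools` shape that Cerebras accepts. `mcp_to_openai_tool` is a hypothetical helper and is not Harn's code.

```python
from typing import Any, Dict


def mcp_to_openai_tool(mcp_tool: Dict[str, Any]) -> Dict[str, Any]:
    """Map one MCP tool listing entry onto an OpenAI-compatible function tool."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            # MCP's inputSchema is already a JSON Schema object describing the arguments.
            "parameters": mcp_tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }
```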

Gemini API

catalog provider: gemini
recommended route: gemini:gemini-2.5-flash (gemini-2.5-flash)
endpoint style: Gemini generateContent
recommended Harn options:

provider = "gemini"
model = "gemini-2.5-flash"
tool_format = "native"
structured_output_mode = "native_json"

Notes:

Harn lowers native tools to Gemini function declarations and maps function responses back into the transcript.
Gemini response usage maps cached-content token counts when the provider reports them.

Caveats:

Harn does not create Gemini context-cache resources yet; cache accounting is therefore observational.
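Because caching on this route is only observational, accounting amounts to reading counters from the response. The following minimal Python sketch assumes the Gemini REST `usageMetadata` field names (`promptTokenCount`, `cachedContentTokenCount`, `candidatesTokenCount`) and a hypothetical output shape that is not necessarily Harn's. Gemini's prompt count already includes cached-content tokens, so the uncached input is the difference between the two.

```python
from typing import Any, Dict


def map_gemini_usage(usage_metadata: Dict[str, Any]) -> Dict[str, int]:
    """Map Gemini usageMetadata into hypothetical input/cached/output counters."""
    prompt = usage_metadata.get("promptTokenCount", 0)
    # Absent when no cached content was used; treat as zero.
    cached = usage_metadata.get("cachedContentTokenCount", 0)
    return {
        "input_tokens": prompt - cached,  # promptTokenCount includes cached tokens
        "cached_input_tokens": cached,
        "output_tokens": usage_metadata.get("candidatesTokenCount", 0),
    }
```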

MCP notes:

MCP tools are regular Harn runtime tools before they become Gemini function declarations.
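A Python sketch of the final lowering step follows. It wraps tool specs in the `tools` array of a `generateContent` request body, assuming a hypothetical intermediate shape with `name`, `description`, and a JSON Schema `parameters` object. Gemini accepts only a subset of OpenAPI schema, so a real lowering may also need to drop unsupported keywords. This sketch does not do that.

```python
from typing import Any, Dict, List


def to_gemini_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build the generateContent `tools` array from simple tool specs."""
    declarations = [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["parameters"],
        }
        for tool in tools
    ]
    # All functions go into a single Tool object's functionDeclarations list.
    return [{"functionDeclarations": declarations}]
```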

Hugging Face Inference Providers

catalog provider: huggingface
recommended route: huggingface-qwen3-coder (Qwen/Qwen3-Coder-480B-A35B-Instruct)
endpoint style: OpenAI-compatible chat completions through the HF router
recommended Harn options:

provider = "huggingface"
model = "huggingface-qwen3-coder"
tool_format = "native"

Notes:

The Hugging Face router uses the OpenAI-compatible chat completions API and supports tools and streaming for chat-completion models.
Qwen3-Coder 480B A35B is the recommended HF router coding row because its model card documents a native 262K context window, an agentic-coding focus, and a function-call-oriented format.

Caveats:

Router availability, latency, and pricing depend on the selected upstream provider; run readiness and tool probes before promoting a route into production defaults.
Do not infer reasoning controls from other Qwen rows: this Qwen3-Coder model card says the model is non-thinking.

MCP notes:

MCP tools are rendered as OpenAI-compatible tool definitions on this route.

Inception

catalog provider: inception
recommended route: inception:mercury-2 (mercury-2)
endpoint style: OpenAI-compatible chat completions
recommended Harn options:

provider = "inception"
model = "mercury-2"
tool_format = "native"
structured_output_mode = "native_json"

Caveats:

Inception documents OpenAI-compatible tool use for Mercury 2; no Harn parity probe has run yet.

llama.cpp server

catalog provider: llamacpp
recommended route: llamacpp-qwen3.6-q4 (qwen3.6-35b-a3b-ud-q4-k-xl)
endpoint style: OpenAI-compatible llama-server
recommended Harn options:

provider = "llamacpp"
model = "llamacpp-qwen3.6-q4"
tool_format = "native"
thinking = "off"

Notes:

llama.cpp gets its own provider so Harn can model Qwen chat-template and thinking behavior separately from generic local OpenAI-compatible servers.
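The `disable_directive:/no_think` knob in the table refers to the Qwen3 soft switch: appending `/no_think` to a user turn asks a thinking-capable Qwen model to skip its reasoning block. The following hypothetical Python sketch applies that directive when `thinking = "off"`. Harn's real chat-template handling may differ, and a server can also disable thinking through the `enable_thinking` chat-template argument.

```python
from typing import Dict, List

NO_THINK = "/no_think"


def apply_disable_directive(messages: List[Dict[str, str]], thinking: str) -> List[Dict[str, str]]:
    """Append the /no_think soft switch to the last user message when thinking is off."""
    if thinking != "off":
        return messages
    out = [dict(m) for m in messages]  # copy so the caller's messages are not mutated
    for msg in reversed(out):
        if msg["role"] == "user":
            text = msg["content"].rstrip()
            if not text.endswith(NO_THINK):  # keep the directive idempotent
                msg["content"] = text + " " + NO_THINK
            break
    return out
```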

Caveats:

Run both provider readiness and tool probes after changing GGUF, context, KV-cache, or chat-template settings.
2026-07-19 #5162 CUDA receipt: freshly restarted ghcr.io/ggml-org/llama.cpp:server-cuda servers ran with --jinja, --reasoning-format deepseek, --chat-template-kwargs {"enable_thinking":false}, -np 1, -fa on, and a q8_0 K/V cache. Six coding-agent fixtures were forced through the native and JSON text tool formats with two replicates per format, giving 12 runs per format for each quantization:
Q4_K_XL (-ngl 999, -c 65536): native 2/12, text 8/12.
Q5_K_XL (-ngl 999, -c 65536): native 0/12, text 8/12.
Q8_0 base (-ngl 35, -c 16384, because the 36.9 GB model does not fit this 32 GiB GPU): native 2/12, text 8/12.
All rows reached high classifier confidence; see .harn-runs/coding-agent-bench/latest/tool_mode_parity_overlay.toml and the per-quant receipts. Native tool calling remained unreliable across the family, so keep JSON text tooling; server_parser = none.
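The receipt's arithmetic is 6 fixtures × 2 replicates = 12 runs per format and quantization. The Python sketch below encodes those counts. Its decision rule in `preferred_format` is a hypothetical illustration of why text tooling wins here and does not represent Harn's actual promotion logic.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Pass counts from the 2026-07-19 receipt: (passes, runs) per tool format.
RECEIPT: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Q4_K_XL": {"native": (2, 12), "text": (8, 12)},
    "Q5_K_XL": {"native": (0, 12), "text": (8, 12)},
    "Q8_0": {"native": (2, 12), "text": (8, 12)},
}


def preferred_format(results: Dict[str, Tuple[int, int]]) -> str:
    """Hypothetical rule: keep native only if its pass rate is at least text's."""
    native = Fraction(*results["native"])
    text = Fraction(*results["text"])
    return "native" if native >= text else "text"
```

For instance, `preferred_format(RECEIPT["Q5_K_XL"])` returns `"text"`, because 0/12 is below 8/12.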

Local setup:

Run harn models install local-qwen3.6-gguf for the recommended download and launch commands.

OpenAI-compatible local server

catalog provider: local
recommended route: local-gemma4 (gemma-4-26b-a4b-it)
endpoint style: OpenAI-compatible chat completions
recommended Harn options:

provider = "local"
model = "local-gemma4"
tool_format = "text"

Notes:

Use this generic provider when a local server speaks OpenAI chat completions but does not need a provider-specific quirk profile.

Caveats:

Prefer llamacpp or mlx when those runtimes are known, because their capability rows can encode template-specific behavior.

MCP notes:

MCP tools are exposed as OpenAI-compatible tool definitions unless the route is configured to prefer Harn text tools.

Local setup:

Set LOCAL_LLM_BASE_URL and either LOCAL_LLM_MODEL or an explicit Harn model selector, then run harn provider ready local.
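The following minimal Python sketch shows that setup contract. It assumes, hypothetically, that an explicit model selector takes precedence over LOCAL_LLM_MODEL and that a missing base URL is a hard error. `resolve_local_route` is illustrative and is not a Harn function.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_local_route(explicit_model: Optional[str] = None,
                        env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (base_url, model) for the generic local provider (hypothetical precedence)."""
    base_url = env.get("LOCAL_LLM_BASE_URL")
    if not base_url:
        raise RuntimeError("LOCAL_LLM_BASE_URL is not set")
    # Assumption: an explicit selector wins over the environment default.
    model = explicit_model or env.get("LOCAL_LLM_MODEL")
    if not model:
        raise RuntimeError("set LOCAL_LLM_MODEL or pass an explicit model selector")
    return base_url.rstrip("/"), model
```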

Mistral via OpenRouter

catalog provider: openrouter
recommended route: openrouter:mistralai/mistral-small-2603 (mistralai/mistral-small-2603)
endpoint style: OpenAI-compatible chat completions through OpenRouter
recommended Harn options:

provider = "openrouter"
model = "mistralai/mistral-small-2603"
tool_format = "native"

Notes:

Harn catalogs hosted Mistral routes through OpenRouter today, so endpoint and auth behavior are OpenAI-compatible.
Use this row for Mistral-family recommendation surfaces until a direct Mistral provider is cataloged.

Caveats:

Provider-native behavior depends on the OpenRouter model route; run the coding-agent benchmark before promoting it to a default for critical harnesses.

MCP notes:

MCP tools are rendered as OpenAI-compatible tool definitions on this route.

MLX OpenAI-compatible server

catalog provider: mlx
recommended route: mlx-qwen3.6 (unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit)
endpoint style: OpenAI-compatible MLX server
recommended Harn options:

provider = "mlx"
model = "mlx-qwen3.6"
tool_format = "native"

Notes:

MLX routes use native tools only after both the served model identity and the tool-probe result match the cataloged model.

Caveats:

mlx_lm.server flags vary by release; launch first, then verify with harn provider ready mlx.

Local setup:

Run harn models install mlx-qwen3.6 for the venv, download, launch, and verification commands.

Ollama

catalog provider: ollama
recommended route: devstral-small-2 (devstral-small-2:24b)
endpoint style: Ollama native chat API
recommended Harn options:

provider = "ollama"
model = "devstral-small-2"
tool_format = "text"
thinking = "off"

Notes:

Local Ollama model quality varies by template and quantization; Harn defaults known fragile routes to the text-tool contract.
Use harn provider tool-probe receipts to promote aliases from unknown to native/text/disabled on a machine.
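The promotion step can be pictured as a small decision function over probe outcomes. In this hypothetical Python sketch, `None` means a probe has not run. Consistent with the MCP note below, it promotes an alias to native only when a native probe has passed. The real receipt format and rules are Harn's and may differ.

```python
from typing import Optional


def promote_alias(native_ok: Optional[bool], text_ok: Optional[bool]) -> str:
    """Map hypothetical probe outcomes to unknown/native/text/disabled."""
    if native_ok:
        return "native"  # only a passing native probe unlocks native calls
    if text_ok:
        return "text"
    if native_ok is False and text_ok is False:
        return "disabled"  # both contracts were probed and both failed
    return "unknown"  # not enough evidence yet
```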

Caveats:

Some Ollama native tool parsers reject otherwise valid text-mode model output; the capability table records those routes as text-only.

MCP notes:

MCP tools are local Harn tools; prefer the text tool contract unless a probe proves native calls work for the installed model.

Local setup:

Run harn models install devstral-small-2, then verify with harn provider ready ollama --model <model>. For local qwen3.x, use the llamacpp provider (e.g. local-qwen3.6) instead, because Ollama's qwen3.5-family tool-call parser returns HTTP 500 errors on text-tool output.

OpenAI

catalog provider: openai
recommended route: openai:gpt-5.4-mini (gpt-5.4-mini)
endpoint style: OpenAI chat completions / Responses-compatible routes
recommended Harn options:

provider = "openai"
model = "gpt-5.4-mini"
tool_format = "native"
structured_output_mode = "native_json"

Notes:

OpenAI-family routes default to native tool calls and native JSON structured output when the model row supports tools.
Reasoning models use developer-role instructions and reasoning-summary transcript projection where the capability row declares it.

Caveats:

Use explicit reasoning effort only on reasoning rows; non-reasoning chat models should keep thinking disabled.
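The Python sketch below implements that rule. It assumes the capability row exposes a boolean, here the hypothetical `supports_reasoning`. `reasoning_effort` is the OpenAI Chat Completions parameter, and its accepted values depend on the model.

```python
from typing import Any, Dict, List, Optional


def build_chat_request(model: str, messages: List[Dict[str, str]],
                       supports_reasoning: bool,
                       reasoning_effort: Optional[str] = None) -> Dict[str, Any]:
    """Build a Chat Completions body, sending reasoning_effort only on reasoning rows."""
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if reasoning_effort is not None:
        if not supports_reasoning:
            raise ValueError(f"{model} is not a reasoning row; drop reasoning_effort")
        body["reasoning_effort"] = reasoning_effort
    return body
```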

MCP notes:

Hosted MCP behavior is normalized through Harn tool definitions; provider-side hosted tools remain a separate provider feature.

Sambanova

catalog provider: sambanova
recommended route: sambanova:sambanova/gpt-oss-120b (sambanova/gpt-oss-120b)
endpoint style: OpenAI-compatible chat completions
recommended Harn options:

provider = "sambanova"
model = "sambanova/gpt-oss-120b"
tool_format = "text"
structured_output_mode = "native_json"

Caveats:

2026-06-24 Harn agent-loop run (gpt-oss-120b, zig-feat, tool grounding present): SambaNova native tool calling ended in a provider/tool-protocol failure of the Harmony "empty tool_calls / reasoning-channel-only" class. Text/heredoc tooling is the clean pay-per-token channel. See vLLM #22578/#44216, SGLang #8976/#10738, openai/harmony #68.
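When triaging similar runs, you can detect this failure class from the assistant message itself. The following hypothetical Python sketch assumes an OpenAI-compatible `choices[0].message` dict in which reasoning text arrives in `reasoning_content` or `reasoning`. These field names vary by server.

```python
from typing import Any, Dict


def is_reasoning_only_failure(message: Dict[str, Any]) -> bool:
    """Flag a message with reasoning text but no tool call and no visible content."""
    tool_calls = message.get("tool_calls") or []
    content = (message.get("content") or "").strip()
    # Servers differ: some use reasoning_content, others reasoning.
    reasoning = (message.get("reasoning_content") or message.get("reasoning") or "").strip()
    return not tool_calls and not content and bool(reasoning)
```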

Zai

catalog provider: zai
recommended route: zai:glm-4.6 (glm-4.6)
endpoint style: OpenAI-compatible chat completions
recommended Harn options:

provider = "zai"
model = "glm-4.6"
tool_format = "text"
structured_output_mode = "native_json"

Caveats:

Family-consistency pin: GLM native channels emit <tool_call> markup as message content instead of structured tool calls (2026-06-23 Baseten GLM-5.2 probe). No zai-direct GLM-4.7 probe has run yet, so this route inherits the family verdict rather than an optimistic native pin.
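You can spot a route with this symptom by scanning assistant content for the markup. This page does not specify the inner format of GLM tool-call blocks, so the minimal Python sketch below only returns each block's raw text. `leaked_tool_calls` is a hypothetical helper.

```python
import re
from typing import List, Optional

TOOL_CALL_MARKUP = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def leaked_tool_calls(content: Optional[str]) -> List[str]:
    """Return the bodies of <tool_call> blocks that arrived as plain content."""
    return [body.strip() for body in TOOL_CALL_MARKUP.findall(content or "")]
```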