llm.yaml

LLM provider registry. Each agent's model.provider must resolve to a key in this file.

Source: crates/config/src/types/llm.rs.

Shape

providers:
  minimax:
    api_key: ${MINIMAX_API_KEY:-}
    group_id: ${MINIMAX_GROUP_ID:-}
    base_url: https://api.minimax.io
    rate_limit:
      requests_per_second: 2.0
      quota_alert_threshold: 100000
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}
    base_url: https://api.anthropic.com
    rate_limit:
      requests_per_second: 2.0
    auth:
      mode: oauth_bundle
      bundle: ./secrets/anthropic_oauth.json
retry:
  max_attempts: 5
  initial_backoff_ms: 1000
  max_backoff_ms: 60000
  backoff_multiplier: 2.0

Per-provider fields

| Field | Type | Required | Default | Purpose |
| --- | --- | --- | --- | --- |
| api_key | string | | | API key. Supports ${ENV_VAR} and ${file:…}. |
| base_url | url | | | API endpoint. Override to use a proxy or a local server. |
| group_id | string | | | MiniMax-only. Group identifier. |
| rate_limit.requests_per_second | f64 | | 2.0 | Outbound throttle. |
| rate_limit.quota_alert_threshold | u64 | | | Optional soft-alarm tokens-per-day threshold. |
| api_flavor | enum | | openai_compat | openai_compat or anthropic_messages. Lets MiniMax expose the Anthropic wire. |
| embedding_model | string | | | Override model used for embeddings (e.g. Gemini's text-embedding-004). |
| safety_settings | JSON | | | Gemini-only; attached verbatim to requests. |
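
Putting several of these fields together: a sketch of a Gemini entry that overrides the embedding model and attaches safety settings, and a MiniMax entry switched to the Anthropic wire. The safety_settings payload shown is illustrative; it is attached verbatim, so its exact shape is whatever Gemini expects.

```yaml
providers:
  gemini:
    api_key: ${GEMINI_API_KEY:-}
    embedding_model: text-embedding-004
    safety_settings:                        # Gemini-only; attached verbatim to requests
      - category: HARM_CATEGORY_HARASSMENT  # illustrative values
        threshold: BLOCK_ONLY_HIGH
  minimax:
    api_key: ${MINIMAX_API_KEY:-}
    api_flavor: anthropic_messages          # speak the Anthropic wire instead of openai_compat
```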

Top-level retry block

Applies to every provider that doesn't define its own:

| Field | Default | Purpose |
| --- | --- | --- |
| max_attempts | 5 | Total attempts, including the first try. |
| initial_backoff_ms | 1000 | First backoff delay. |
| max_backoff_ms | 60000 | Cap on any single backoff. |
| backoff_multiplier | 2.0 | Exponential factor. |

Retries are jittered to avoid thundering-herd reconnects. See Fault tolerance — Retry policies.
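
Since the global block only applies to providers that don't define their own, a provider can shadow it with a local retry. A hedged sketch, assuming the per-provider block uses the same field names and sits alongside the other provider fields:

```yaml
retry:                       # global default for all providers
  max_attempts: 5
providers:
  openai:
    api_key: ${OPENAI_API_KEY:-}
    retry:                   # assumed placement: shadows the top-level block for this provider
      max_attempts: 3
      initial_backoff_ms: 500
```

With the top-level defaults (initial 1000 ms, multiplier 2.0), the base delays before jitter are 1000, 2000, 4000, and 8000 ms for attempts 2 through 5, all well under the 60000 ms cap.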

Auth modes

auth:
  mode: auto | static | token_plan | oauth_bundle
  bundle: ./secrets/anthropic_oauth.json
  setup_token_file: ./secrets/anthropic_setup.json
  refresh_endpoint: https://auth.example.com/refresh
  client_id: your-oauth-client
| mode | When |
| --- | --- |
| auto | Let the provider client decide from available credentials. |
| static | Use api_key verbatim. |
| token_plan | MiniMax "Token Plan" OAuth bundle. |
| oauth_bundle | Anthropic PKCE OAuth bundle written by agent setup. |
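
For instance, forcing key-based auth on Anthropic while MiniMax uses its Token Plan bundle. This is a sketch: the bundle path is hypothetical, and reusing the bundle field for token_plan is an assumption based on the block above.

```yaml
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}
    auth:
      mode: static                                # ignore the OAuth path; use api_key verbatim
  minimax:
    auth:
      mode: token_plan
      bundle: ./secrets/minimax_token_plan.json   # hypothetical path
```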

Supported providers

| Key | Notes |
| --- | --- |
| minimax | Primary provider. MiniMax M2.5. OpenAI-compat or Anthropic-flavour wire. |
| anthropic | Claude models. API key or OAuth subscription. |
| openai | OpenAI API and anything speaking its wire (Ollama, Groq, local proxies). |
| gemini | Google Gemini, including embedding support. |
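
Because anything speaking the OpenAI wire can sit behind the openai key, pointing base_url at a local Ollama server is enough to use it. A sketch, assuming Ollama's standard OpenAI-compatible endpoint:

```yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY:-ollama}    # Ollama ignores the key's value, but the field still needs one
    base_url: http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
```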

Common mistakes

  • api_key: sk-… committed to git. Use ${ENV_VAR} or ${file:./secrets/…}; the secrets/ directory is gitignored.
  • Mismatched embedding_model dimensions. The vector store asserts embedding.dimensions matches the model output. A mismatch aborts startup with an explicit message.
  • Setting both api_key and auth.mode: oauth_bundle. The auth mode wins. The api_key is kept as a fallback for tools that bypass the OAuth path.
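
The first bullet allows either indirection form; for example, one provider reading its key from the environment and another from a gitignored file:

```yaml
providers:
  anthropic:
    api_key: ${ANTHROPIC_API_KEY:-}          # from the environment; empty string if unset
  minimax:
    api_key: ${file:./secrets/minimax.key}   # from a file under the gitignored secrets/ directory
```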

Input-token reduction (context_optimization)

Four independent kill switches cover prompt caching, online history compaction, pre-flight token counting, and the workspace bundle cache. Full schema, defaults, and rollout guidance live in Operations → Context optimization.
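
A hypothetical shape for the block, purely to illustrate the four-switch idea; the key names below are invented here, and the real schema and defaults are defined in Operations → Context optimization:

```yaml
context_optimization:             # all four key names below are illustrative, not the real schema
  prompt_caching: true
  history_compaction: true
  preflight_token_counting: false
  workspace_bundle_cache: true
```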