OpenFang supports 123+ models across 27 providers with intelligent routing, automatic fallback, and per-agent model overrides.

## Documentation Index
Fetch the complete documentation index at: https://mintlify.com/RightNow-AI/openfang/llms.txt
Use this file to discover all available pages before exploring further.
## Default Model

Every OpenFang instance requires a default model. The key settings are:

- **Model**: the model identifier from the provider's catalog. Run `openfang models list` to see all available models.
- **API key environment variable**: the name of the environment variable containing the API key (NOT the key itself).
- **Base URL** (optional): overrides the default API endpoint. Useful for proxies or self-hosted instances.
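As an illustration only, the settings above might be expressed in a config file along these lines. The file name, table names, and keys here are assumptions for the sketch, not OpenFang's documented schema:

```toml
# Hypothetical openfang.toml - key names are illustrative
[model]
default = "anthropic/claude-sonnet-4"  # model identifier from the provider's catalog
api_key_env = "ANTHROPIC_API_KEY"      # env var NAME holding the key, never the key itself
# base_url = "https://my-proxy.internal/v1"  # optional: route through a proxy or self-hosted endpoint
```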
## Fallback Provider Chain

Configure automatic failover to backup providers when the primary fails. Fallback chains are tried sequentially, and the first provider to respond successfully is used. This provides resilience against rate limits, outages, and API errors.
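A sequential fallback chain might be declared like this (a hedged sketch; the `fallbacks` key and its placement are assumptions, not the documented schema):

```toml
# Hypothetical fallback configuration - key names are illustrative
[model]
default = "anthropic/claude-sonnet-4"
fallbacks = [
  "openai/gpt-4o",       # tried first if the primary fails
  "groq/llama-3.3-70b",  # tried next
]
```

The order of the list is the order of attempts, so cheaper or rate-limit-tolerant providers usually go later in the chain.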
## Model Tiers

OpenFang categorizes models into tiers based on capability and cost:

| Tier | Description | Use Cases | Examples |
|---|---|---|---|
| Frontier | Most capable, highest cost | Complex reasoning, research, code generation | Claude Opus 4, GPT-4o, Gemini 2.0 Flash Thinking |
| Smart | Balanced capability/cost | General agent tasks, analysis | Claude Sonnet 4, GPT-4o-mini, Gemini 2.0 Flash |
| Balanced | Good performance, moderate cost | Standard workflows, data processing | Llama 3.3 70B, Qwen Plus |
| Fast | High speed, low cost | Simple tasks, high volume | Claude Haiku 4.5, Groq Llama 3.3, GLM-4 Flash |
| Local | Self-hosted, zero cost | Privacy-critical, offline | Ollama models, LM Studio |
| Custom | User-defined models | Custom endpoints, experiments | - |
## Per-Agent Model Override
Agents can use different models than the system default.

## Custom Provider URLs
Override base URLs for proxies, custom endpoints, or self-hosted models.

## Model Aliases
Use short aliases instead of full model identifiers.

## Model Capabilities
Different models support different features.

### Tool Calling (Function Calling)
Most modern models support tool calling:

- ✅ Claude 3+, GPT-4+, Gemini 1.5+, Llama 3.1+
- ❌ Older models, some vision-only models
### Vision (Image Understanding)
Models that can process images:

- Claude Opus/Sonnet 4, GPT-4o/4-turbo, Gemini 2.0 Flash, Qwen VL
### Streaming
All major providers support streaming responses, except:

- Some Replicate models
- Certain Bedrock configurations
## Cost Tracking

OpenFang automatically tracks token usage and estimated costs.

## Session Compaction
Automatically compress conversation history when it grows too large. Compaction uses an LLM to summarize older messages, preserving context while reducing token usage. The most recent messages are always kept intact.
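A compaction policy might be configured along these lines. All key names and thresholds below are illustrative assumptions, not OpenFang's documented schema:

```toml
# Hypothetical compaction settings - names and values are illustrative
[session.compaction]
enabled = true
max_tokens = 100000        # compact once history exceeds this token budget
keep_recent_messages = 20  # the most recent messages are never summarized
summary_model = "anthropic/claude-haiku-4.5"  # a Fast-tier model keeps summaries cheap
```

Pointing the summarizer at a Fast-tier model is a common pattern, since summary quality matters less than cost at high message volume.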
## Embedding Models

Configure models for vector embeddings (memory search):

- OpenAI: `text-embedding-3-small`, `text-embedding-3-large`
- Cohere: `embed-english-v3.0`, `embed-multilingual-v3.0`
- Voyage: `voyage-2`, `voyage-code-2`
- Local: `ollama/nomic-embed-text`
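Wiring one of the models above into the embedding slot might look like this (a sketch; the `[embeddings]` table and key names are assumptions):

```toml
# Hypothetical embedding configuration - key names are illustrative
[embeddings]
model = "openai/text-embedding-3-small"
api_key_env = "OPENAI_API_KEY"

# For fully local, zero-cost embeddings:
# model = "ollama/nomic-embed-text"
```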
## Model Discovery

OpenFang can auto-discover models from local providers. For Ollama, installed models are listed via `http://localhost:11434/api/tags`.
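Discovery would typically be toggled per provider; a hypothetical sketch, assuming a per-provider table and an `auto_discover` flag (neither confirmed by the source):

```toml
# Hypothetical discovery settings - key names are illustrative
[providers.ollama]
base_url = "http://localhost:11434"
auto_discover = true  # poll the /api/tags endpoint for installed models
```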
## Model Routing Examples
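As an illustration only, a routing setup combining a default model, a fallback, an alias, and per-agent overrides might look like the following. Every table and key name here is an assumption for the sketch, not OpenFang's documented schema:

```toml
# Hypothetical routing config - key names are illustrative
[model]
default = "anthropic/claude-sonnet-4"  # system-wide default
fallbacks = ["openai/gpt-4o"]          # used if the primary provider fails

[model.aliases]
fast = "groq/llama-3.3-70b"            # short alias for a Fast-tier model

[agents.researcher]
model = "anthropic/claude-opus-4"      # Frontier tier for complex reasoning

[agents.triage]
model = "fast"                         # resolves via the alias above
```

The pattern to note: expensive Frontier models are pinned to the agents that need deep reasoning, while high-volume agents route through a Fast-tier alias.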
## Troubleshooting
### Model Not Found

Verify the identifier with `openfang models list`; model names must match the provider's catalog exactly.
### API Key Issues

Check that the configured environment variable is set and exported. The configuration references the variable's name, not the key itself.
### Rate Limits

Configure fallback providers to handle rate limits automatically.

## Next Steps
- **Provider Setup**: configure all 27 LLM providers
- **Channel Configuration**: connect messaging platforms