OpenFang supports 123+ models across 27 providers with intelligent routing, automatic fallback, and per-agent model overrides.

## Documentation Index
Fetch the complete documentation index at: https://mintlify.com/RightNow-AI/openfang/llms.txt
Use this file to discover all available pages before exploring further.
## Default Model

Every OpenFang instance requires a default model. The key settings are:

- **Model**: the model identifier from the provider's catalog. Run `openfang models list` to see all available models.
- **API key environment variable**: the name of the environment variable containing the API key (NOT the key itself).
- **Base URL** (optional): overrides the default API endpoint. Useful for proxies or self-hosted instances.
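As an illustration only, the settings above might be expressed in a config file along these lines. The file name, table names, and keys here are assumptions for the sketch, not OpenFang's documented schema:

```toml
# Hypothetical openfang.toml - key names are illustrative
[model]
default = "anthropic/claude-sonnet-4"  # model identifier from the provider's catalog
api_key_env = "ANTHROPIC_API_KEY"      # env var NAME holding the key, never the key itself
# base_url = "https://my-proxy.internal/v1"  # optional: route through a proxy or self-hosted endpoint
```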
## Fallback Provider Chain

Configure automatic failover to backup providers when the primary fails. Fallback chains are tried sequentially, and the first provider to respond successfully is used. This provides resilience against rate limits, outages, and API errors.
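A sequential fallback chain might be declared like this (a hedged sketch; the `fallbacks` key and its placement are assumptions, not the documented schema):

```toml
# Hypothetical fallback configuration - key names are illustrative
[model]
default = "anthropic/claude-sonnet-4"
fallbacks = [
  "openai/gpt-4o",       # tried first if the primary fails
  "groq/llama-3.3-70b",  # tried next
]
```

The order of the list is the order of attempts, so cheaper or rate-limit-tolerant providers usually go later in the chain.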
## Model Tiers

OpenFang categorizes models into tiers based on capability and cost:

| Tier | Description | Use Cases | Examples |
|---|---|---|---|
| Frontier | Most capable, highest cost | Complex reasoning, research, code generation | Claude Opus 4, GPT-4o, Gemini 2.0 Flash Thinking |
| Smart | Balanced capability/cost | General agent tasks, analysis | Claude Sonnet 4, GPT-4o-mini, Gemini 2.0 Flash |
| Balanced | Good performance, moderate cost | Standard workflows, data processing | Llama 3.3 70B, Qwen Plus |
| Fast | High speed, low cost | Simple tasks, high volume | Claude Haiku 4.5, Groq Llama 3.3, GLM-4 Flash |
| Local | Self-hosted, zero cost | Privacy-critical, offline | Ollama models, LM Studio |
| Custom | User-defined models | Custom endpoints, experiments | - |
## Per-Agent Model Override
Agents can use different models than the system default.

## Custom Provider URLs
Override base URLs for proxies, custom endpoints, or self-hosted models.

## Model Aliases
Use short aliases instead of full model identifiers.

## Model Capabilities
Different models support different features.

### Tool Calling (Function Calling)
Most modern models support tool calling:

- ✅ Claude 3+, GPT-4+, Gemini 1.5+, Llama 3.1+
- ❌ Older models, some vision-only models
### Vision (Image Understanding)
Models that can process images:

- Claude Opus/Sonnet 4, GPT-4o/4-turbo, Gemini 2.0 Flash, Qwen VL
### Streaming
All major providers support streaming responses, except:

- Some Replicate models
- Certain Bedrock configurations
## Cost Tracking

OpenFang automatically tracks token usage and estimated costs.

## Session Compaction
Automatically compress conversation history when it grows too large. Compaction uses an LLM to summarize older messages, preserving context while reducing token usage. The most recent messages are always kept intact.
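A compaction policy might be configured along these lines. All key names and thresholds below are illustrative assumptions, not OpenFang's documented schema:

```toml
# Hypothetical compaction settings - names and values are illustrative
[session.compaction]
enabled = true
max_tokens = 100000        # compact once history exceeds this token budget
keep_recent_messages = 20  # the most recent messages are never summarized
summary_model = "anthropic/claude-haiku-4.5"  # a Fast-tier model keeps summaries cheap
```

Pointing the summarizer at a Fast-tier model is a common pattern, since summary quality matters less than cost at high message volume.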
## Embedding Models

Configure models for vector embeddings (memory search):

- OpenAI: `text-embedding-3-small`, `text-embedding-3-large`
- Cohere: `embed-english-v3.0`, `embed-multilingual-v3.0`
- Voyage: `voyage-2`, `voyage-code-2`
- Local: `ollama/nomic-embed-text`
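Wiring one of the models above into the embedding slot might look like this (a sketch; the `[embeddings]` table and key names are assumptions):

```toml
# Hypothetical embedding configuration - key names are illustrative
[embeddings]
model = "openai/text-embedding-3-small"
api_key_env = "OPENAI_API_KEY"

# For fully local, zero-cost embeddings:
# model = "ollama/nomic-embed-text"
```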
## Model Discovery

OpenFang can auto-discover models from local providers. For Ollama, installed models are listed via `http://localhost:11434/api/tags`.
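Discovery would typically be toggled per provider; a hypothetical sketch, assuming a per-provider table and an `auto_discover` flag (neither confirmed by the source):

```toml
# Hypothetical discovery settings - key names are illustrative
[providers.ollama]
base_url = "http://localhost:11434"
auto_discover = true  # poll the /api/tags endpoint for installed models
```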
## Model Routing Examples
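As an illustration only, a routing setup combining a default model, a fallback, an alias, and per-agent overrides might look like the following. Every table and key name here is an assumption for the sketch, not OpenFang's documented schema:

```toml
# Hypothetical routing config - key names are illustrative
[model]
default = "anthropic/claude-sonnet-4"  # system-wide default
fallbacks = ["openai/gpt-4o"]          # used if the primary provider fails

[model.aliases]
fast = "groq/llama-3.3-70b"            # short alias for a Fast-tier model

[agents.researcher]
model = "anthropic/claude-opus-4"      # Frontier tier for complex reasoning

[agents.triage]
model = "fast"                         # resolves via the alias above
```

The pattern to note: expensive Frontier models are pinned to the agents that need deep reasoning, while high-volume agents route through a Fast-tier alias.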
## Troubleshooting
### Model Not Found

Verify the identifier with `openfang models list`; model names must match the provider's catalog exactly.
### API Key Issues

Check that the configured environment variable is set and exported. The configuration references the variable's name, not the key itself.
### Rate Limits

Configure fallback providers to handle rate limits automatically.

## Next Steps
- **Provider Setup**: configure all 27 LLM providers
- **Channel Configuration**: connect messaging platforms