LLM Providers
Overview of LLM providers for AI-powered support — cloud APIs, local inference, and open-weight models.
AI-powered support capabilities work best with a unified LLM abstraction layer like LiteLLM. This lets you swap providers by changing a single configuration value.
```bash
# Use OpenAI directly
LITELLM_MODEL=openai/gpt-5-mini

# Use Anthropic directly
LITELLM_MODEL=anthropic/claude-sonnet-4.6

# Use any model via OpenRouter
LITELLM_MODEL=openrouter/meta-llama/llama-4-scout

# Use a local model via Ollama
LITELLM_MODEL=ollama/gemma3:27b
```
Provider comparison
| Provider | Top models | Strengths | Pricing | Best for |
|---|---|---|---|---|
| OpenAI | GPT-5, GPT-5-mini, GPT-4.1 | Broad ecosystem, fast iteration, function calling | $0.05--$10/M tokens | General-purpose, high-volume |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 | 1M context, safety, extended thinking | $1--$25/M tokens | Complex reasoning, QA scoring |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Flash Preview | 1M context, multimodal, grounding | $0.30--$10/M tokens | Long documents, KB analysis |
| xAI | Grok 4, Grok 4.1 Fast | 2M context on Fast tier, strong reasoning | $0.20--$15/M tokens | Long-context tasks, cost-effective inference |
| DeepSeek | DeepSeek V3.2, DeepSeek R1 | Extremely low cost, strong open models | $0.28--$2.50/M tokens | Budget-first production workloads |
| Mistral | Large 3, Medium 3, Small 3.1, Codestral | EU data residency, multilingual | $0.20--$6/M tokens | EU compliance, cost-sensitive |
| Cohere | Command A | RAG-optimized, citation grounding | Contact for pricing | Knowledge base search |
| OpenRouter | All of the above + 300+ models | Single API key, model fallback, no vendor lock-in | Provider pricing + small markup | Multi-provider access, experimentation |
| Local (Ollama) | Llama 4 Scout, Gemma 3 27B, Qwen 3, Phi-4 | Privacy, zero cost, air-gapped | Free (hardware only) | Development, privacy-first |
Cost tracking
A cost tracking module can monitor LLM spend across capabilities:
```python
from cost_tracker import CostTracker, TokenUsage

tracker = CostTracker()

# After a LiteLLM call
cost = tracker.record_from_response("openai/gpt-5-mini", response)
print(f"Call cost: ${cost.total_cost}")
print(f"Session total: ${tracker.total_cost}")
```
See the Cost Optimization guide for strategies to reduce spend.
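The `cost_tracker` module itself is not shown above; a minimal sketch of what such a tracker might look like, with an illustrative price table (the per-million-token prices below are placeholders, not quoted rates; always check the provider's current price sheet):

```python
from dataclasses import dataclass

# Illustrative ($input, $output) prices per 1M tokens -- placeholder values.
PRICES = {
    "openai/gpt-5-mini": (0.25, 2.00),
}

@dataclass
class CallCost:
    input_tokens: int
    output_tokens: int
    total_cost: float

class CostTracker:
    """Accumulate per-call and session-wide LLM spend."""

    def __init__(self) -> None:
        self.total_cost = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> CallCost:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.total_cost += cost
        return CallCost(input_tokens, output_tokens, cost)
```

A production tracker would also pull token counts out of the LiteLLM response object (as `record_from_response` does above) rather than taking them as arguments.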
Choosing a provider
The right choice depends on your requirements:
- Quality-first: Anthropic Claude Opus 4.6 or OpenAI GPT-5
- Budget-first: DeepSeek V3.2, OpenAI GPT-5-nano, or Google Gemini 2.5 Flash
- Flexibility-first: OpenRouter for single-key access to 300+ models
- Privacy-first: Local models via Ollama (Llama 4 Scout, Gemma 3 27B, Qwen 3)
- EU compliance: Mistral with EU-hosted endpoints
- Long context: xAI Grok 4.1 Fast (2M tokens), OpenAI GPT-4.1 (1M tokens), or Anthropic Claude Sonnet 4.6 (1M tokens)
- Reasoning tasks: Anthropic Claude Opus 4.6, DeepSeek R1, or OpenAI GPT-5
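The priorities above can be captured in configuration rather than code review. A hedged sketch of a priority-to-model map; the model identifiers are illustrative examples drawn from the lists above, and the exact ID strings each provider accepts should be checked against its model list:

```python
# Illustrative mapping from deployment priority to a LITELLM_MODEL value.
# Model ID strings are examples only -- verify against each provider's docs.
MODEL_BY_PRIORITY = {
    "quality": "anthropic/claude-opus-4.6",
    "budget": "deepseek/deepseek-chat",
    "flexibility": "openrouter/meta-llama/llama-4-scout",
    "privacy": "ollama/gemma3:27b",
    "eu-compliance": "mistral/mistral-large-latest",
}

def pick_model(priority: str) -> str:
    """Return the configured model for a priority, failing loudly on typos."""
    try:
        return MODEL_BY_PRIORITY[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None
```

Keeping this table in one place means a provider swap is a one-line change, consistent with the single-variable configuration shown at the top of this page.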
See the Model Selection Guide for per-service recommendations.