LLM Providers
Overview of LLM providers for AI-powered support — cloud APIs, local inference, and open-weight models.
AI-powered support capabilities work best with a unified LLM abstraction layer like LiteLLM. This lets you swap providers by changing a single configuration value.
```bash
# Use OpenAI directly
LITELLM_MODEL=openai/gpt-5-mini

# Use Anthropic directly
LITELLM_MODEL=anthropic/claude-sonnet-4.6

# Use any model via OpenRouter
LITELLM_MODEL=openrouter/meta-llama/llama-4-scout

# Use a local model via Ollama
LITELLM_MODEL=ollama/gemma3:27b
```
Provider comparison
| Provider | Top models | Strengths | Pricing | Best for |
|---|---|---|---|---|
| OpenAI | GPT-5, GPT-5-mini, GPT-4.1 | Broad ecosystem, fast iteration, function calling | $0.05--$10/M tokens | General-purpose, high-volume |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 | 1M context, safety, extended thinking | $1--$25/M tokens | Complex reasoning, QA scoring |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Flash Preview | 1M context, multimodal, grounding | $0.30--$10/M tokens | Long documents, KB analysis |
| xAI | Grok 4, Grok 4.1 Fast | 2M context on Fast tier, strong reasoning | $0.20--$15/M tokens | Long-context tasks, cost-effective inference |
| DeepSeek | DeepSeek V3.2, DeepSeek R1 | Extremely low cost, strong open models | $0.28--$2.50/M tokens | Budget-first production workloads |
| Mistral | Large 3, Medium 3, Small 3.1, Codestral | EU data residency, multilingual | $0.20--$6/M tokens | EU compliance, cost-sensitive |
| Cohere | Command A | RAG-optimized, citation grounding | Contact for pricing | Knowledge base search |
| OpenRouter | All of the above + 300+ models | Single API key, model fallback, no vendor lock-in | Provider pricing + small markup | Multi-provider access, experimentation |
| Local (Ollama) | Llama 4 Scout, Gemma 3 27B, Qwen 3, Phi-4 | Privacy, zero cost, air-gapped | Free (hardware only) | Development, privacy-first |
Cost tracking
A cost tracking module can monitor LLM spend across capabilities:
```python
from cost_tracker import CostTracker, TokenUsage

tracker = CostTracker()

# After a LiteLLM call
cost = tracker.record_from_response("openai/gpt-5-mini", response)
print(f"Call cost: ${cost.total_cost}")
print(f"Session total: ${tracker.total_cost}")
```
See the Cost Optimization guide for strategies to reduce spend.
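The `cost_tracker` module itself is not shown above; a minimal sketch of what such a tracker might look like, with an illustrative price table (the per-million-token prices below are placeholders, not quoted rates; always check the provider's current price sheet):

```python
from dataclasses import dataclass

# Illustrative ($input, $output) prices per 1M tokens -- placeholder values.
PRICES = {
    "openai/gpt-5-mini": (0.25, 2.00),
}

@dataclass
class CallCost:
    input_tokens: int
    output_tokens: int
    total_cost: float

class CostTracker:
    """Accumulate per-call and session-wide LLM spend."""

    def __init__(self) -> None:
        self.total_cost = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> CallCost:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.total_cost += cost
        return CallCost(input_tokens, output_tokens, cost)
```

A production tracker would also pull token counts out of the LiteLLM response object (as `record_from_response` does above) rather than taking them as arguments.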
Choosing a provider
The right choice depends on your requirements:
- Quality-first: Anthropic Claude Opus 4.6 or OpenAI GPT-5
- Budget-first: DeepSeek V3.2, OpenAI GPT-5-nano, or Google Gemini 2.5 Flash
- Flexibility-first: OpenRouter for single-key access to 300+ models
- Privacy-first: Local models via Ollama (Llama 4 Scout, Gemma 3 27B, Qwen 3)
- EU compliance: Mistral with EU-hosted endpoints
- Long context: xAI Grok 4.1 Fast (2M tokens), OpenAI GPT-4.1 (1M tokens), or Anthropic Claude Sonnet 4.6 (1M tokens)
- Reasoning tasks: Anthropic Claude Opus 4.6, DeepSeek R1, or OpenAI GPT-5
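The priorities above can be captured in configuration rather than code review. A hedged sketch of a priority-to-model map; the model identifiers are illustrative examples drawn from the lists above, and the exact ID strings each provider accepts should be checked against its model list:

```python
# Illustrative mapping from deployment priority to a LITELLM_MODEL value.
# Model ID strings are examples only -- verify against each provider's docs.
MODEL_BY_PRIORITY = {
    "quality": "anthropic/claude-opus-4.6",
    "budget": "deepseek/deepseek-chat",
    "flexibility": "openrouter/meta-llama/llama-4-scout",
    "privacy": "ollama/gemma3:27b",
    "eu-compliance": "mistral/mistral-large-latest",
}

def pick_model(priority: str) -> str:
    """Return the configured model for a priority, failing loudly on typos."""
    try:
        return MODEL_BY_PRIORITY[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None
```

Keeping this table in one place means a provider swap is a one-line change, consistent with the single-variable configuration shown at the top of this page.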
See the Model Selection Guide for per-service recommendations.