AI Support Guide

LLM Providers

Overview of LLM providers for AI-powered support — cloud APIs, local inference, and open-weight models.

AI-powered support capabilities work best with a unified LLM abstraction layer like LiteLLM. This lets you swap providers by changing a single configuration value.

```bash
# Use OpenAI directly
LITELLM_MODEL=openai/gpt-5-mini

# Use Anthropic directly
LITELLM_MODEL=anthropic/claude-sonnet-4.6

# Use any model via OpenRouter
LITELLM_MODEL=openrouter/meta-llama/llama-4-scout

# Use a local model via Ollama
LITELLM_MODEL=ollama/gemma3:27b
```
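The `provider/model` prefix is what routes each request. A minimal sketch of reading and splitting that value at startup (the `split_model_string` helper is illustrative, not part of LiteLLM — in real code you would pass the full string straight to LiteLLM's `completion` call):

```python
import os


def split_model_string(model_string: str) -> tuple[str, str]:
    """Split a LiteLLM-style 'provider/model' string into its parts.

    The provider is everything before the first slash; the remainder
    (which may itself contain slashes, as with OpenRouter paths) is
    the model name.
    """
    provider, _, model = model_string.partition("/")
    return provider, model


# Read the configured model, falling back to a default for local development
model_string = os.environ.get("LITELLM_MODEL", "openai/gpt-5-mini")
provider, model = split_model_string(model_string)
print(f"provider={provider}, model={model}")
```

Because the provider lives in the configuration value rather than in code, switching from a cloud API to a local Ollama model is a one-line change.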

Provider comparison

| Provider | Top models | Strengths | Pricing | Best for |
| --- | --- | --- | --- | --- |
| OpenAI | GPT-5, GPT-5-mini, GPT-4.1 | Broad ecosystem, fast iteration, function calling | $0.05–$10/M tokens | General-purpose, high-volume |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 | 1M context, safety, extended thinking | $1–$25/M tokens | Complex reasoning, QA scoring |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 3 Flash Preview | 1M context, multimodal, grounding | $0.30–$10/M tokens | Long documents, KB analysis |
| xAI | Grok 4, Grok 4.1 Fast | 2M context on Fast tier, strong reasoning | $0.20–$15/M tokens | Long-context tasks, cost-effective inference |
| DeepSeek | DeepSeek V3.2, DeepSeek R1 | Extremely low cost, strong open models | $0.28–$2.50/M tokens | Budget-first production workloads |
| Mistral | Large 3, Medium 3, Small 3.1, Codestral | EU data residency, multilingual | $0.20–$6/M tokens | EU compliance, cost-sensitive |
| Cohere | Command A | RAG-optimized, citation grounding | Contact for pricing | Knowledge base search |
| OpenRouter | All of the above + 300+ models | Single API key, model fallback, no vendor lock-in | Provider pricing + small markup | Multi-provider access, experimentation |
| Local (Ollama) | Llama 4 Scout, Gemma 3 27B, Qwen 3, Phi-4 | Privacy, zero cost, air-gapped | Free (hardware only) | Development, privacy-first |

Cost tracking

A cost tracking module can monitor LLM spend across capabilities:

```python
from cost_tracker import CostTracker, TokenUsage

tracker = CostTracker()

# After a LiteLLM call
cost = tracker.record_from_response("openai/gpt-5-mini", response)
print(f"Call cost: ${cost.total_cost}")
print(f"Session total: ${tracker.total_cost}")
```
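The `cost_tracker` module itself is not shown above. A minimal sketch of what such a tracker might look like, assuming a hand-maintained per-million-token price table (the prices and the `record` method below are placeholders for illustration — a real `record_from_response` would pull token counts out of the provider's response object):

```python
from dataclasses import dataclass, field

# Illustrative (input, output) prices per million tokens — placeholders,
# not current list prices.
PRICES: dict[str, tuple[float, float]] = {
    "openai/gpt-5-mini": (0.25, 2.00),
}


@dataclass
class TokenUsage:
    """Token counts for one LLM call, with a derived dollar cost."""
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def total_cost(self) -> float:
        in_price, out_price = PRICES[self.model]
        return (self.input_tokens * in_price
                + self.output_tokens * out_price) / 1_000_000


@dataclass
class CostTracker:
    """Accumulates per-call usage so session spend can be reported."""
    calls: list[TokenUsage] = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> TokenUsage:
        usage = TokenUsage(model, input_tokens, output_tokens)
        self.calls.append(usage)
        return usage

    @property
    def total_cost(self) -> float:
        return sum(usage.total_cost for usage in self.calls)
```

Keeping the price table in one place makes it easy to audit and to update when a provider changes its rates.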

See the Cost Optimization guide for strategies to reduce spend.

Choosing a provider

The right choice depends on your requirements:

  • Quality-first: Anthropic Claude Opus 4.6 or OpenAI GPT-5
  • Budget-first: DeepSeek V3.2, OpenAI GPT-5-nano, or Google Gemini 2.5 Flash
  • Flexibility-first: OpenRouter for single-key access to 300+ models
  • Privacy-first: Local models via Ollama (Llama 4 Scout, Gemma 3 27B, Qwen 3)
  • EU compliance: Mistral with EU-hosted endpoints
  • Long context: xAI Grok 4.1 Fast (2M tokens), OpenAI GPT-4.1 (1M tokens), or Anthropic Claude Sonnet 4.6 (1M tokens)
  • Reasoning tasks: Anthropic Claude Opus 4.6, DeepSeek R1, or OpenAI GPT-5
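The bullet points above can be captured in a simple lookup that configuration code can reuse. A sketch, assuming LiteLLM-style model strings (the exact model identifiers below are assumptions — check your provider's current names before using them):

```python
# Map a selection priority to a default LiteLLM model string.
# The choices mirror the recommendations above; the identifiers are
# illustrative and should be verified against provider documentation.
DEFAULT_MODEL_BY_PRIORITY: dict[str, str] = {
    "quality": "anthropic/claude-opus-4.6",
    "budget": "deepseek/deepseek-chat",
    "flexibility": "openrouter/meta-llama/llama-4-scout",
    "privacy": "ollama/llama4:scout",
    "eu-compliance": "mistral/mistral-large-latest",
    "long-context": "xai/grok-4.1-fast",
    "reasoning": "deepseek/deepseek-reasoner",
}


def pick_model(priority: str) -> str:
    """Return the default model string for a selection priority."""
    try:
        return DEFAULT_MODEL_BY_PRIORITY[priority]
    except KeyError:
        known = ", ".join(sorted(DEFAULT_MODEL_BY_PRIORITY))
        raise ValueError(f"Unknown priority {priority!r}; expected one of: {known}") from None
```

Centralizing the mapping means a change in requirements (say, moving from budget-first to privacy-first) is a one-word change at the call site rather than a hunt through the codebase.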

See the Model Selection Guide for per-service recommendations.
