Model Selection Guide
Which LLM to use for each AI support capability — recommendations by use case, budget, and requirements.
Choosing the right model for each service is the single most impactful decision for both quality and cost. This guide provides concrete recommendations based on each service's workload characteristics.
Decision factors
Before selecting a model, consider:
- Privacy requirements -- Can customer data leave your infrastructure? If not, use local models.
- Budget -- What is your monthly LLM spend target?
- Latency -- Is the service real-time (chat reply) or async (batch QA scoring)?
- Quality threshold -- Is "good enough" acceptable (triage) or does quality directly impact customers (reply drafting)?
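The four factors above can be sketched as a simple decision helper. This is an illustrative sketch, not part of any Simpli service: the factor names, the $100 budget threshold, and the tier labels are assumptions chosen to mirror the tiers in the tables below.

```python
# Sketch: map the four decision factors to a tier recommendation.
# Thresholds and tier names are illustrative assumptions, not values
# prescribed by this guide.

def choose_tier(private_data: bool, monthly_budget_usd: float,
                realtime: bool, quality_critical: bool) -> str:
    """Suggest a deployment tier from the decision factors."""
    if private_data:
        return "local"      # customer data must not leave your infrastructure
    if quality_critical:
        return "quality"    # output is shown to or sent to customers
    if monthly_budget_usd < 100 or realtime:
        return "budget"     # cheap, fast models for simple real-time tasks
    return "balanced"

print(choose_tier(private_data=False, monthly_budget_usd=50,
                  realtime=True, quality_critical=False))  # budget
```

In practice the privacy check dominates: if data cannot leave your infrastructure, the cloud columns in the tables below are off the table entirely.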
Per-service recommendations
Simpli Triage
Task: Classify tickets into categories and assign urgency levels.
This is a structured classification task with short input and output. Speed and cost matter more than nuanced reasoning.
| Tier | Cloud model | Local model | Why |
|---|---|---|---|
| Budget | GPT-5-nano ($0.05/$0.40) or DeepSeek V3.2 ($0.28/$0.42) | Gemma 3 27B | Classification doesn't need large models |
| Balanced | GPT-5-mini ($0.125/$1) | Phi-4 | Reliable, low cost |
| Quality | GPT-4.1-mini ($0.40/$1.60) | Gemma 3 27B | Better edge-case handling |
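A triage request through LiteLLM might look like the sketch below. The category list, urgency scale, and JSON response shape are assumptions for illustration; Simpli Triage's actual prompt and schema may differ.

```python
# Sketch of a triage classification call via LiteLLM.
# CATEGORIES and the JSON schema are hypothetical.
import json
import os

CATEGORIES = ["billing", "technical", "account", "other"]  # hypothetical

def build_triage_messages(ticket_text: str) -> list[dict]:
    """Build a chat payload asking for category + urgency as JSON."""
    return [
        {"role": "system",
         "content": ("Classify the support ticket. Reply with JSON only: "
                     f'{{"category": one of {CATEGORIES}, '
                     '"urgency": "low" | "medium" | "high"}}')},
        {"role": "user", "content": ticket_text},
    ]

def classify_ticket(ticket_text: str) -> dict:
    """Send the payload to whichever model LITELLM_MODEL names."""
    from litellm import completion  # pip install litellm
    resp = completion(
        model=os.environ.get("LITELLM_MODEL", "openai/gpt-5-nano"),
        messages=build_triage_messages(ticket_text),
    )
    return json.loads(resp.choices[0].message.content)
```

Because input and output are both short, the per-request cost on a nano-class model is fractions of a cent, which is why the budget tier is usually sufficient here.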
# Recommended
LITELLM_MODEL=openai/gpt-5-nano
# Local alternative
LITELLM_MODEL=ollama/gemma3:27b
Simpli Reply

Task: Generate draft responses for customer support tickets.
Quality is critical -- these drafts are shown to agents and may be sent to customers. Tone, empathy, and accuracy all matter. Because drafts are long, output token cost dominates.
| Tier | Cloud model | Local model | Why |
|---|---|---|---|
| Budget | DeepSeek V3.2 ($0.28/$0.42) | Llama 4 Scout | Decent quality at minimal cost |
| Balanced | Claude Sonnet 4.6 ($3/$15) | Llama 4 Scout | Best reasoning-to-cost ratio |
| Quality | Claude Opus 4.6 ($5/$25) | Llama 4 Scout | Maximum empathy and nuance |
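A draft-generation call might look like the sketch below. The system prompt and the `max_tokens` cap are illustrative assumptions; since output tokens dominate drafting cost, capping them is one lever for the budget tier.

```python
# Sketch of a reply-draft request via LiteLLM.
# The prompt wording and token cap are hypothetical.
import os

def build_reply_messages(ticket_text: str, agent_notes: str = "") -> list[dict]:
    """Chat payload asking for an empathetic, accurate draft reply."""
    system = ("You draft support replies for a human agent to review. "
              "Be empathetic, accurate, and concise.")
    user = f"Ticket:\n{ticket_text}\n\nAgent notes:\n{agent_notes}"
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

def draft_reply(ticket_text: str, agent_notes: str = "") -> str:
    from litellm import completion  # pip install litellm
    resp = completion(
        model=os.environ.get("LITELLM_MODEL",
                             "anthropic/claude-sonnet-4-6-20260401"),
        messages=build_reply_messages(ticket_text, agent_notes),
        max_tokens=600,  # cap output: output tokens dominate drafting cost
    )
    return resp.choices[0].message.content
```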
# Recommended
LITELLM_MODEL=anthropic/claude-sonnet-4-6-20260401
# Local alternative
LITELLM_MODEL=ollama/llama4
Simpli QA
Task: Score support conversations against quality rubrics.
Requires strong reasoning to evaluate multi-turn conversations. Must understand nuance -- did the agent show empathy? Was the resolution complete? Inputs are long (full conversations), but the output is structured.
| Tier | Cloud model | Local model | Why |
|---|---|---|---|
| Budget | GPT-5-mini ($0.125/$1) | Qwen 3 32B | Acceptable for basic rubrics |
| Balanced | Claude Sonnet 4.6 ($3/$15) | Qwen 3 32B | Strong reasoning, fair scoring |
| Quality | Claude Opus 4.6 ($5/$25) | Qwen 3 32B | Deep reasoning with chain-of-thought |
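Because QA output is structured, it is worth validating the model's scores before storing them. The rubric criteria and the 1-5 scale below are assumptions for illustration, not Simpli QA's actual schema.

```python
# Sketch: validate a model's JSON rubric scores before persisting them.
# RUBRIC and the 1-5 scale are hypothetical.
import json

RUBRIC = ["empathy", "accuracy", "resolution_complete"]  # hypothetical

def parse_scores(raw_json: str) -> dict[str, int]:
    """Parse and range-check rubric scores returned by the model."""
    scores = json.loads(raw_json)
    out = {}
    for criterion in RUBRIC:
        value = int(scores[criterion])  # KeyError if a criterion is missing
        if not 1 <= value <= 5:
            raise ValueError(f"{criterion} score {value} out of range 1-5")
        out[criterion] = value
    return out
```

Rejecting malformed or out-of-range scores up front keeps one bad completion from skewing aggregate quality metrics.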
# Recommended
LITELLM_MODEL=anthropic/claude-sonnet-4-6-20260401
# Local alternative
LITELLM_MODEL=ollama/qwen3:32b
Simpli Sentiment
Task: Analyze sentiment and detect escalation risk.
Simple classification task -- even simpler than triage. Optimize for speed and cost.
| Tier | Cloud model | Local model | Why |
|---|---|---|---|
| Budget | GPT-5-nano ($0.05/$0.40) or DeepSeek V3.2 ($0.28/$0.42) | Gemma 3 27B | Cheapest options available |
| Balanced | Gemini 2.5 Flash ($0.30/$2.50) | Gemma 3 27B | Fast, reliable |
| Quality | GPT-5-mini ($0.125/$1) | Phi-4 | Better nuance detection |
# Recommended
LITELLM_MODEL=openai/gpt-5-nano
# Local alternative
LITELLM_MODEL=ollama/gemma3:27b
Simpli KB
Task: Knowledge base article analysis, gap detection, semantic search.
Long context is the key requirement -- articles can be lengthy and gap analysis needs to consider the entire knowledge base. Quality of analysis matters for actionable recommendations.
| Tier | Cloud model | Local model | Why |
|---|---|---|---|
| Budget | Gemini 2.5 Flash ($0.30/$2.50) | Llama 4 Scout | 1M+ context at minimal cost |
| Balanced | Gemini 2.5 Pro ($1.25/$10) | Llama 4 Scout | Best long-context quality |
| Quality | Claude Sonnet 4.6 ($3/$15) | Llama 4 Scout (10M ctx) | Deep analysis, long context |
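When long context is the constraint, a cheap pre-check helps decide whether the whole knowledge base fits in one request or needs chunking. The 4-characters-per-token heuristic below is a coarse assumption; use a real tokenizer for billing-accurate counts.

```python
# Sketch: rough context-fit check before a whole-KB gap-analysis request.
# The ~4 chars/token estimate is a heuristic assumption.

def fits_in_context(texts: list[str], context_tokens: int = 1_000_000) -> bool:
    """Estimate whether the combined texts fit a model's context window."""
    est_tokens = sum(len(t) for t in texts) // 4  # ~4 chars per token
    return est_tokens <= context_tokens

print(fits_in_context(["a" * 400] * 100))  # 100 short articles -> True
```

If the check fails, either chunk the knowledge base into per-category batches or move up to a longer-context model such as the 10M-context local option in the table.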
# Recommended
LITELLM_MODEL=google/gemini-2.5-pro
# Local alternative
LITELLM_MODEL=ollama/llama4
Quick reference
| Service | Best cloud | Best via OpenRouter | Best local | Key requirement |
|---|---|---|---|---|
| Triage | GPT-5-nano | openrouter/openai/gpt-5-nano | Gemma 3 27B | Speed, low cost |
| Reply | Claude Sonnet 4.6 | openrouter/anthropic/claude-sonnet-4.6 | Llama 4 Scout | Quality, empathy |
| QA | Claude Sonnet 4.6 | openrouter/anthropic/claude-sonnet-4.6 | Qwen 3 32B | Reasoning |
| Sentiment | GPT-5-nano | openrouter/google/gemma-3-27b | Gemma 3 27B | Speed, low cost |
| KB | Gemini 2.5 Pro | openrouter/google/gemini-2.5-pro | Llama 4 Scout | Long context |
Multi-model setup
You can use different models for different services. Each service reads its own LITELLM_MODEL environment variable:
# docker-compose.yml environment per service — direct APIs
simpli-triage:
  LITELLM_MODEL: openai/gpt-5-nano
simpli-reply:
  LITELLM_MODEL: anthropic/claude-sonnet-4-6-20260401
simpli-qa:
  LITELLM_MODEL: anthropic/claude-sonnet-4-6-20260401
simpli-sentiment:
  LITELLM_MODEL: openai/gpt-5-nano
simpli-kb:
  LITELLM_MODEL: google/gemini-2.5-pro
OpenRouter single-key alternative
If you prefer managing a single API key, route everything through OpenRouter:
# docker-compose.yml — all services via OpenRouter
simpli-triage:
  LITELLM_MODEL: openrouter/openai/gpt-5-nano
  OPENROUTER_API_KEY: sk-or-...
simpli-reply:
  LITELLM_MODEL: openrouter/anthropic/claude-sonnet-4.6
  OPENROUTER_API_KEY: sk-or-...
simpli-qa:
  LITELLM_MODEL: openrouter/anthropic/claude-sonnet-4.6
  OPENROUTER_API_KEY: sk-or-...
simpli-sentiment:
  LITELLM_MODEL: openrouter/google/gemma-3-27b
  OPENROUTER_API_KEY: sk-or-...
simpli-kb:
  LITELLM_MODEL: openrouter/google/gemini-2.5-pro
  OPENROUTER_API_KEY: sk-or-...
Note that the sentiment service uses gemma-3-27b via OpenRouter -- this is a hosted open model at $0.10/$0.20 per M tokens, significantly cheaper than GPT-5-nano for simple classification. See the Cloud Providers page for more on OpenRouter's model catalog and pricing.
DeepSeek V3.2 ($0.28/$0.42) is also worth considering as a cost-effective middle ground between local models (free but require GPU hardware) and premium cloud APIs. It handles classification and simple generation well enough for budget-tier workloads while costing a fraction of models like Claude Sonnet 4.6 or GPT-4.1.
This tiered approach uses budget models for simple classification (triage, sentiment) and premium models for tasks where quality directly impacts output (reply, QA). See the Cost Optimization page for estimated savings.
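The savings from this split can be estimated directly from the per-million-token prices quoted in the tables above. The ticket volume and per-request token counts below are illustrative assumptions.

```python
# Sketch: monthly cost from per-million-token prices ($in/$out per M).
# Volumes and token counts are illustrative assumptions.

def monthly_cost(requests: int, in_tokens: int, out_tokens: int,
                 price_in: float, price_out: float) -> float:
    """USD cost for `requests` calls at the given per-M-token prices."""
    return requests * (in_tokens * price_in + out_tokens * price_out) / 1e6

# 50k triage calls/month, ~500 input / ~50 output tokens each:
# GPT-5-nano ($0.05/$0.40) vs Claude Sonnet 4.6 ($3/$15)
print(monthly_cost(50_000, 500, 50, 0.05, 0.40))   # 2.25
print(monthly_cost(50_000, 500, 50, 3.00, 15.00))  # 112.5
```

At these assumed volumes, routing triage to a nano-class model is roughly 50x cheaper than sending the same traffic to a premium model, which is the core argument for the per-service split.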