Model Selection Guide

Which LLM to use for each AI support capability — recommendations by use case, budget, and requirements.

Choosing the right model for each service is the single most impactful decision for both quality and cost. This guide provides concrete recommendations based on each service's workload characteristics.

Decision factors

Before selecting a model, consider:

  1. Privacy requirements -- Can customer data leave your infrastructure? If not, use local models.
  2. Budget -- What is your monthly LLM spend target?
  3. Latency -- Is the service real-time (chat reply) or async (batch QA scoring)?
  4. Quality threshold -- Is "good enough" acceptable (triage) or does quality directly impact customers (reply drafting)?
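These factors can be encoded as a small routing helper. A minimal sketch in Python — the function and its tier names are a hypothetical illustration of the decision flow, not part of any Simpli service; the model strings mirror the recommendation tables below:

```python
def pick_model(private: bool, tier: str, latency_sensitive: bool) -> str:
    """Illustrative model router based on the four decision factors.

    private: customer data must stay on your infrastructure
    tier: "budget", "balanced", or "quality" (your spend target)
    latency_sensitive: real-time chat reply vs. async batch work
    """
    if private:
        # Local models keep data on-prem regardless of budget tier;
        # prefer the smaller model when latency matters.
        return "ollama/gemma3:27b" if latency_sensitive else "ollama/llama4"
    cloud = {
        "budget": "openai/gpt-5-nano",
        "balanced": "openai/gpt-5-mini",
        "quality": "anthropic/claude-sonnet-4-6-20260401",
    }
    return cloud[tier]
```

Privacy is checked first because it is a hard constraint: if data cannot leave your infrastructure, budget and quality tiers only choose among local models.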

Per-service recommendations

Simpli Triage

Task: Classify tickets into categories and assign urgency levels.

This is a structured classification task with short input and output. Speed and cost matter more than nuanced reasoning.

| Tier | Cloud model ($ in/out per M tokens) | Local model | Why |
|---|---|---|---|
| Budget | GPT-5-nano ($0.05/$0.40) or DeepSeek V3.2 ($0.28/$0.42) | Gemma 3 27B | Classification doesn't need large models |
| Balanced | GPT-5-mini ($0.125/$1) | Phi-4 | Reliable, low cost |
| Quality | GPT-4.1-mini ($0.40/$1.60) | Gemma 3 27B | Better edge-case handling |

# Recommended
LITELLM_MODEL=openai/gpt-5-nano

# Local alternative
LITELLM_MODEL=ollama/gemma3:27b
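With the model configured, the triage call itself is a short completion request. A hedged sketch using the litellm Python client — the prompt wording and category list are illustrative assumptions, not the service's actual prompt:

```python
import os

CATEGORIES = ["billing", "technical", "account", "other"]  # illustrative labels

def build_triage_prompt(ticket_text: str) -> str:
    """Compact classification prompt; short in/out keeps token cost low."""
    return (
        f"Classify this support ticket into one of {CATEGORIES} "
        "and assign an urgency level (low/medium/high). "
        "Reply as JSON with keys 'category' and 'urgency'.\n\n"
        f"Ticket: {ticket_text}"
    )

def classify(ticket_text: str) -> str:
    # Requires `pip install litellm` and the relevant provider API key.
    import litellm
    resp = litellm.completion(
        model=os.environ.get("LITELLM_MODEL", "openai/gpt-5-nano"),
        messages=[{"role": "user", "content": build_triage_prompt(ticket_text)}],
        temperature=0,  # deterministic labels suit classification
    )
    return resp.choices[0].message.content
```

Setting temperature to 0 is a common choice for classification, where you want stable labels rather than varied phrasing.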

Simpli Reply

Task: Generate draft responses for customer support tickets.

Quality is critical -- these drafts are shown to agents and may be sent to customers, so tone, empathy, and accuracy all matter. Drafts are long relative to other services' outputs, so output token cost dominates.

| Tier | Cloud model ($ in/out per M tokens) | Local model | Why |
|---|---|---|---|
| Budget | DeepSeek V3.2 ($0.28/$0.42) | Llama 4 Scout | Decent quality at minimal cost |
| Balanced | Claude Sonnet 4.6 ($3/$15) | Llama 4 Scout | Best reasoning-to-cost ratio |
| Quality | Claude Opus 4.6 ($5/$25) | Llama 4 Scout | Maximum empathy and nuance |

# Recommended
LITELLM_MODEL=anthropic/claude-sonnet-4-6-20260401

# Local alternative
LITELLM_MODEL=ollama/llama4

Simpli QA

Task: Score support conversations against quality rubrics.

Requires strong reasoning to evaluate multi-turn conversations. Must understand nuance -- did the agent show empathy? Was the resolution complete? Longer inputs (full conversations) but structured output.

| Tier | Cloud model ($ in/out per M tokens) | Local model | Why |
|---|---|---|---|
| Budget | GPT-5-mini ($0.125/$1) | Qwen 3 32B | Acceptable for basic rubrics |
| Balanced | Claude Sonnet 4.6 ($3/$15) | Qwen 3 32B | Strong reasoning, fair scoring |
| Quality | Claude Opus 4.6 ($5/$25) | Qwen 3 32B | Deep reasoning with chain-of-thought |

# Recommended
LITELLM_MODEL=anthropic/claude-sonnet-4-6-20260401

# Local alternative
LITELLM_MODEL=ollama/qwen3:32b

Simpli Sentiment

Task: Analyze sentiment and detect escalation risk.

Simple classification task -- even simpler than triage. Optimize for speed and cost.

| Tier | Cloud model ($ in/out per M tokens) | Local model | Why |
|---|---|---|---|
| Budget | GPT-5-nano ($0.05/$0.40) or DeepSeek V3.2 ($0.28/$0.42) | Gemma 3 27B | Cheapest options available |
| Balanced | Gemini 2.5 Flash ($0.30/$2.50) | Gemma 3 27B | Fast, reliable |
| Quality | GPT-5-mini ($0.125/$1) | Phi-4 | Better nuance detection |

# Recommended
LITELLM_MODEL=openai/gpt-5-nano

# Local alternative
LITELLM_MODEL=ollama/gemma3:27b

Simpli KB

Task: Knowledge base article analysis, gap detection, semantic search.

Long context is the key requirement -- articles can be lengthy and gap analysis needs to consider the entire knowledge base. Quality of analysis matters for actionable recommendations.

| Tier | Cloud model ($ in/out per M tokens) | Local model | Why |
|---|---|---|---|
| Budget | Gemini 2.5 Flash ($0.30/$2.50) | Llama 4 Scout | 1M+ context at minimal cost |
| Balanced | Gemini 2.5 Pro ($1.25/$10) | Llama 4 Scout | Best long-context quality |
| Quality | Claude Sonnet 4.6 ($3/$15) | Llama 4 Scout (10M ctx) | Deep analysis, long context |

# Recommended
LITELLM_MODEL=google/gemini-2.5-pro

# Local alternative
LITELLM_MODEL=ollama/llama4
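Because context length drives the choice here, a rough token estimate can gate which model a request goes to. A minimal sketch — the 4-characters-per-token heuristic, the 100k threshold, and the routing function are assumptions for illustration:

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def kb_model_for(article_text: str, threshold: int = 100_000) -> str:
    # Route very long inputs to a long-context model; keep short ones cheap.
    if estimate_tokens(article_text) > threshold:
        return "google/gemini-2.5-pro"
    return "google/gemini-2.5-flash"
```

For precise budgeting you would use the provider's tokenizer instead of a character heuristic, but this kind of coarse gate is usually enough to keep short lookups off the expensive model.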

Quick reference

| Service | Best cloud | Best via OpenRouter | Best local | Key requirement |
|---|---|---|---|---|
| Triage | GPT-5-nano | openrouter/openai/gpt-5-nano | Gemma 3 27B | Speed, low cost |
| Reply | Claude Sonnet 4.6 | openrouter/anthropic/claude-sonnet-4.6 | Llama 4 Scout | Quality, empathy |
| QA | Claude Sonnet 4.6 | openrouter/anthropic/claude-sonnet-4.6 | Qwen 3 32B | Reasoning |
| Sentiment | GPT-5-nano | openrouter/google/gemma-3-27b | Gemma 3 27B | Speed, low cost |
| KB | Gemini 2.5 Pro | openrouter/google/gemini-2.5-pro | Llama 4 Scout | Long context |

Multi-model setup

You can use different models for different services. Each service reads its own LITELLM_MODEL environment variable:

# docker-compose.yml environment per service — direct APIs
simpli-triage:
  LITELLM_MODEL: openai/gpt-5-nano

simpli-reply:
  LITELLM_MODEL: anthropic/claude-sonnet-4-6-20260401

simpli-qa:
  LITELLM_MODEL: anthropic/claude-sonnet-4-6-20260401

simpli-sentiment:
  LITELLM_MODEL: openai/gpt-5-nano

simpli-kb:
  LITELLM_MODEL: google/gemini-2.5-pro
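On the application side, each container only needs to resolve its own variable. A minimal sketch of how a service might read it — the helper name and the default value are assumptions for illustration:

```python
import os

def resolve_model(default: str = "openai/gpt-5-nano") -> str:
    """Return this service's configured model, falling back to a default."""
    return os.environ.get("LITELLM_MODEL", default)
```

Because each service resolves the variable independently, swapping one service's model is a one-line change in docker-compose.yml with no code changes.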

OpenRouter single-key alternative

If you prefer managing a single API key, route everything through OpenRouter:

# docker-compose.yml — all services via OpenRouter
simpli-triage:
  LITELLM_MODEL: openrouter/openai/gpt-5-nano
  OPENROUTER_API_KEY: sk-or-...

simpli-reply:
  LITELLM_MODEL: openrouter/anthropic/claude-sonnet-4.6
  OPENROUTER_API_KEY: sk-or-...

simpli-qa:
  LITELLM_MODEL: openrouter/anthropic/claude-sonnet-4.6
  OPENROUTER_API_KEY: sk-or-...

simpli-sentiment:
  LITELLM_MODEL: openrouter/google/gemma-3-27b
  OPENROUTER_API_KEY: sk-or-...

simpli-kb:
  LITELLM_MODEL: openrouter/google/gemini-2.5-pro
  OPENROUTER_API_KEY: sk-or-...

Note that the sentiment service uses gemma-3-27b via OpenRouter -- this is a hosted open model at $0.10/$0.20 per M tokens, significantly cheaper than GPT-5-nano for simple classification. See the Cloud Providers page for more on OpenRouter's model catalog and pricing.

DeepSeek V3.2 ($0.28/$0.42) is also worth considering as a cost-effective middle ground between local models (free but require GPU hardware) and premium cloud APIs. It handles classification and simple generation well enough for budget-tier workloads while costing a fraction of models like Claude Sonnet 4.6 or GPT-4.1.
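To make that gap concrete, a back-of-envelope monthly cost comparison. The traffic volume (10M input / 2M output tokens per month) is an assumed workload; the per-M-token prices come from the tables above:

```python
def monthly_cost(in_millions: float, out_millions: float,
                 in_price: float, out_price: float) -> float:
    """USD per month given token volume (in millions) and $/M-token prices."""
    return in_millions * in_price + out_millions * out_price

# Assumed workload: 10M input tokens, 2M output tokens per month.
deepseek = monthly_cost(10, 2, 0.28, 0.42)   # DeepSeek V3.2
sonnet = monthly_cost(10, 2, 3.00, 15.00)    # Claude Sonnet 4.6
```

Under that assumed workload, DeepSeek V3.2 comes to a few dollars a month while Claude Sonnet 4.6 is roughly an order of magnitude more, which is why the budget tiers lean on it for classification-style work.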

This tiered approach uses budget models for simple classification (triage, sentiment) and premium models for tasks where quality directly impacts output (reply, QA). See the Cost Optimization page for estimated savings.
