Quality Improvement Loop
Build a continuous improvement cycle with QA, Sentiment, and Pulse.
Support quality is not a one-time audit. It is a cycle: score conversations, identify patterns, coach agents, and measure whether coaching worked. Three Simpli services power this loop — QA scores every conversation, Sentiment flags issues in real time, and Pulse tracks whether your team is actually improving.
This page walks through the full cycle and shows how to wire the services together into a continuous improvement process. It is written for team leads, QA analysts, and support operations teams.
The continuous improvement cycle
QA scores conversations
|
v
Scorecards surface coaching opportunities
|
v
Team leads coach agents
|
v
Agents improve
|
v
Pulse tracks improvement over time
|
v
Sentiment validates customer experience
|
v
Repeat — adjust rubrics, refine coaching, raise the bar

Each step feeds the next. QA tells you where agents struggle. Coaching addresses those gaps. Pulse proves whether the coaching worked. Sentiment confirms that customers feel the difference. Without closing the loop, you are guessing.
Step 1: Design your rubrics
Before QA can score anything, you need rubrics that define what "good" looks like for your team. A rubric is a set of weighted criteria that the QA service evaluates against each conversation.
Standard Support Rubric
Use this as a starting point for general support teams.
```json
{
  "name": "Standard Support",
  "criteria": [
    {
      "name": "empathy",
      "description": "Acknowledges customer emotions and shows understanding",
      "weight": 0.25
    },
    {
      "name": "resolution",
      "description": "Resolves the issue completely or provides clear next steps",
      "weight": 0.35
    },
    {
      "name": "communication",
      "description": "Clear, professional, and easy to understand",
      "weight": 0.25
    },
    {
      "name": "efficiency",
      "description": "Resolves without unnecessary back-and-forth",
      "weight": 0.15
    }
  ]
}
```

Resolution carries the highest weight because it is what customers care about most. Empathy and communication matter, but a perfectly empathetic response that does not solve the problem is still a bad outcome.
Technical Support Rubric
For technical teams, swap in an accuracy criterion and adjust the weights.
```json
{
  "name": "Technical Support",
  "criteria": [
    {
      "name": "accuracy",
      "description": "Technical information provided is correct",
      "weight": 0.30
    },
    {
      "name": "empathy",
      "description": "Acknowledges customer frustration with technical issues",
      "weight": 0.15
    },
    {
      "name": "resolution",
      "description": "Resolves the technical issue or escalates appropriately",
      "weight": 0.30
    },
    {
      "name": "communication",
      "description": "Explains technical concepts clearly to non-technical customers",
      "weight": 0.25
    }
  ]
}
```

Accuracy gets equal weight to resolution here because an incorrect technical answer can be worse than no answer at all — it erodes trust and creates follow-up tickets.
Tips for rubric design
- Involve your agents. They handle conversations daily and know what good looks like. Rubrics designed without agent input tend to measure the wrong things.
- Start with 3-5 criteria. More than that and scores become noisy. You can always add criteria later once the team is comfortable with the process.
- Weight resolution highest. Customers contact support to get something fixed. Everything else is important but secondary.
- Review and adjust after 2-4 weeks of data. Your first rubric will not be perfect. Look at conversations that scored surprisingly high or low and ask whether the rubric or the score is wrong.
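Before registering a rubric, it is worth sanity-checking it programmatically. The sketch below is one way to do that, assuming the rubric JSON shape shown above; the 3-5 criteria bound and the weight tolerance are illustrative choices, not requirements of the QA service.

```python
# Sketch: sanity-check a rubric before registering it with the QA service.
def validate_rubric(rubric: dict) -> list:
    """Return a list of problems; an empty list means the rubric looks sane."""
    problems = []
    criteria = rubric.get("criteria", [])
    if not 3 <= len(criteria) <= 5:
        problems.append(f"expected 3-5 criteria, got {len(criteria)}")
    total = sum(c["weight"] for c in criteria)
    if abs(total - 1.0) > 1e-6:
        problems.append(f"weights sum to {total}, expected 1.0")
    for c in criteria:
        if not c.get("description"):
            problems.append(f"criterion {c['name']!r} has no description")
    return problems

standard = {
    "name": "Standard Support",
    "criteria": [
        {"name": "empathy", "description": "Acknowledges emotions", "weight": 0.25},
        {"name": "resolution", "description": "Resolves the issue", "weight": 0.35},
        {"name": "communication", "description": "Clear and professional", "weight": 0.25},
        {"name": "efficiency", "description": "No unnecessary back-and-forth", "weight": 0.15},
    ],
}
print(validate_rubric(standard))  # → []
```

Catching a weight that does not sum to 1.0 at registration time is much cheaper than discovering skewed scores after a month of data.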
Step 2: Automate QA scoring
Manual QA reviews are valuable but slow. Most teams can only review 5-10% of conversations. Automated QA lets you score every single conversation, which eliminates sampling bias and gives you statistically meaningful data.
Option A: Webhook on ticket resolution
The most common pattern. When a ticket is resolved in your helpdesk, fire a webhook that sends the conversation to QA.
```python
# In your middleware — triggered when a ticket is solved
import httpx

async def on_ticket_resolved(ticket_id, agent_id, messages):
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{QA_URL}/evaluate", json={
            "conversation_id": ticket_id,
            "agent_id": agent_id,
            "messages": messages,
            "rubric_id": "R-standard",
        })
    score = response.json()
    # Store score and check if coaching is needed
    if score["overall_score"] < 0.7:
        alert_team_lead(agent_id, score)
```

This gives you scores within seconds of resolution. The threshold alert (0.7 in this example) ensures team leads hear about problems quickly.
Option B: Scheduled batch processing
If webhook integration is not feasible, run a nightly batch job that evaluates the previous day's conversations.
```python
# Cron job that evaluates yesterday's conversations
for ticket in get_resolved_tickets(yesterday):
    score = evaluate(ticket)
```

Batch processing is simpler to set up but delays feedback by up to 24 hours. Use it as a starting point, then move to webhooks when you are ready.
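Fleshed out slightly, the batch loop might look like the sketch below. The `evaluate_fn` parameter is injected so the loop can run against a stub here; in production it would POST each ticket to the QA service's `/evaluate` endpoint. The ticket and score shapes are assumptions based on the webhook example above.

```python
# Sketch of the nightly batch job with threshold flagging.
from datetime import date, timedelta

def run_nightly_qa(tickets, evaluate_fn, threshold=0.7):
    """Score each resolved ticket; return those needing team-lead attention."""
    flagged = []
    for ticket in tickets:
        score = evaluate_fn(ticket)
        if score["overall_score"] < threshold:
            flagged.append((ticket["ticket_id"], score["overall_score"]))
    return flagged

# Stub in place of the real QA call, to show the flow end to end.
yesterday = date.today() - timedelta(days=1)
tickets = [
    {"ticket_id": "T-1", "resolved_on": yesterday},
    {"ticket_id": "T-2", "resolved_on": yesterday},
]
fake_scores = {"T-1": 0.82, "T-2": 0.61}
flagged = run_nightly_qa(tickets, lambda t: {"overall_score": fake_scores[t["ticket_id"]]})
print(flagged)  # → [('T-2', 0.61)]
```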
Evaluate everything
The key advantage of automated QA is that you can evaluate all conversations, not a sample. This matters because:
- Small samples miss patterns. An agent who struggles with billing tickets but excels at technical ones will look average in a random sample.
- 100% coverage makes trends statistically reliable. When a score moves, you know it is real.
- Agents take it more seriously when they know every conversation counts, not just the ones that happen to get reviewed.
Step 3: Read scorecards and identify coaching opportunities
Once QA is scoring conversations, the scorecard endpoint aggregates results per agent. This is the primary tool for team leads.
GET /scorecards/{agent_id}

```json
{
  "agent_id": "A-005",
  "average_score": 0.78,
  "total_reviews": 47,
  "trend": "declining",
  "top_strengths": ["empathy", "communication"],
  "improvement_areas": ["resolution", "efficiency"]
}
```

How to interpret scorecards
- Declining trend + specific weakness. This agent needs targeted coaching. In the example above, the agent communicates well but struggles to actually resolve issues. That might indicate a knowledge gap — they know how to talk to customers but do not have the technical depth to fix the problem.
- High score + stable trend. Recognize this agent publicly and ask them to share best practices. Peer coaching is more effective than top-down feedback.
- Low score across all dimensions. A broader issue. The agent may need a training refresher, or they may be overwhelmed by ticket volume and cutting corners.
- Inconsistent scores. High variance usually means the agent handles some ticket types well and others poorly. Break down scores by ticket category to find the gap.
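The interpretation rules above can be encoded as a simple triage function. This is a sketch using the scorecard shape shown earlier; the score thresholds and the action labels are illustrative, not part of the QA API.

```python
# Sketch: map a scorecard to a suggested coaching action.
def triage_scorecard(card: dict) -> str:
    score, trend = card["average_score"], card["trend"]
    if score >= 0.85 and trend == "stable":
        return "recognize: ask agent to share best practices"
    if trend == "declining" and card["improvement_areas"]:
        focus = card["improvement_areas"][0]
        return f"coach: target '{focus}'"
    if score < 0.6:
        return "investigate: broad gap, check training and workload"
    return "monitor"

card = {
    "agent_id": "A-005",
    "average_score": 0.78,
    "total_reviews": 47,
    "trend": "declining",
    "top_strengths": ["empathy", "communication"],
    "improvement_areas": ["resolution", "efficiency"],
}
print(triage_scorecard(card))  # → coach: target 'resolution'
```

A function like this will not replace a team lead's judgment, but it keeps weekly reviews consistent across a large team.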
Building a coaching cadence
Scorecards are most useful when reviewed on a regular schedule:
- Daily: Glance at the alert feed for any conversations that scored below your threshold.
- Weekly: Review each agent's scorecard trend. Identify one coaching focus per agent.
- Monthly: Look at team-wide trends. Are specific criteria improving or declining across the board? Adjust rubrics if needed.
Step 4: Sentiment as an early warning
QA scores are backward-looking — they evaluate a conversation after it ends. Sentiment analysis runs in real time, giving you an early warning system for conversations that are going sideways.
Real-time alerts
GET /alerts?severity=high surfaces at-risk conversations immediately. A customer whose sentiment is dropping mid-conversation needs attention now, not in a post-mortem.
Team leads can intervene during the conversation:
- Offer the agent guidance via internal chat
- Reassign the ticket to a more experienced agent
- Proactively escalate before the customer has to ask
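The three interventions above can be wired into simple routing logic. This sketch assumes each alert from GET /alerts?severity=high carries `conversation_id`, `agent_id`, and `sentiment_trend` fields — those field names and the `"falling_fast"` value are assumptions, not documented response fields.

```python
# Sketch: pick an intervention for an at-risk conversation.
def route_alert(alert: dict, senior_agents: set) -> str:
    if alert["agent_id"] in senior_agents:
        return "nudge"      # experienced agent: offer guidance via internal chat
    if alert.get("sentiment_trend") == "falling_fast":
        return "reassign"   # hand off to a more experienced agent
    return "escalate"       # escalate proactively before the customer asks

alert = {"conversation_id": "C-9", "agent_id": "A-014", "sentiment_trend": "falling_fast"}
print(route_alert(alert, senior_agents={"A-001", "A-002"}))  # → reassign
```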
Customer relationship tracking
GET /customers/{id}/sentiment shows sentiment trends for a specific customer over time. A customer whose sentiment has been declining across multiple interactions is a churn risk, regardless of how any single conversation scored.
Combining QA and Sentiment
QA and Sentiment answer different questions:
- QA measures conversation quality from an operational perspective. Did the agent follow best practices? Was the issue resolved?
- Sentiment measures the customer's perception. Did the customer feel heard? Are they satisfied?
These do not always align. An agent can follow every rubric criterion perfectly and still leave the customer frustrated — for example, if the resolution requires the customer to do extra work. Conversely, an agent who bends the rules to make a customer happy might score lower on process compliance but higher on sentiment.
Track both. When QA scores are high but sentiment is low, your rubric may be missing something. When sentiment is high but QA scores are low, your rubric may be too strict.
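That cross-check can be automated. The sketch below compares a conversation's QA score with its customer sentiment; the 0.8 and 0.6 thresholds are illustrative, and the input values are assumed to be normalized scores rather than a documented API response.

```python
# Sketch: flag rubric blind spots by comparing QA and Sentiment per conversation.
def rubric_health(qa_score: float, sentiment: float) -> str:
    high_qa, high_sentiment = qa_score >= 0.8, sentiment >= 0.6
    if high_qa and not high_sentiment:
        return "rubric may be missing something customers care about"
    if not high_qa and high_sentiment:
        return "rubric may be too strict"
    return "aligned"

print(rubric_health(0.91, 0.30))  # high QA, low sentiment → blind spot
print(rubric_health(0.55, 0.85))  # low QA, high sentiment → rubric too strict
```

Run this over a week of conversations and count each outcome; a persistent cluster in either mismatch bucket is your signal to revisit the rubric.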
Step 5: Track improvement with Pulse
Pulse provides the operational metrics that prove the loop is working. Without Pulse, you are coaching and hoping. With Pulse, you can measure.
- GET /metrics returns real-time quality and operational metrics: average QA score, response times, resolution rates, and ticket volumes.
- GET /forecast predicts future ticket volume so you can staff appropriately. Understaffed teams produce lower quality work regardless of coaching.
Connecting Pulse to the loop
Quality trends over time are the clearest signal of whether your coaching is effective. If you coached an agent on resolution skills two weeks ago, their resolution criterion score in Pulse should be trending upward. If it is flat, the coaching approach needs adjustment.
Pulse also reveals systemic issues. If average QA scores drop team-wide during peak hours, the problem is not individual performance — it is staffing. If scores drop on a specific ticket category, the problem is knowledge, not skill.
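One concrete way to check whether coaching moved the needle is to split an agent's daily criterion scores around the coaching date and compare the means. The data shape below (a list of per-day points) is an assumption about what you might assemble from Pulse's /metrics history, not a documented response format.

```python
# Sketch: mean criterion score before vs after a coaching session.
from datetime import date
from statistics import mean

def coaching_effect(points, coached_on, criterion):
    """Return (mean before, mean after) for the given criterion."""
    before = [p[criterion] for p in points if p["day"] < coached_on]
    after = [p[criterion] for p in points if p["day"] >= coached_on]
    return round(mean(before), 2), round(mean(after), 2)

points = [
    {"day": date(2024, 5, 1), "resolution": 0.62},
    {"day": date(2024, 5, 8), "resolution": 0.64},
    {"day": date(2024, 5, 15), "resolution": 0.74},
    {"day": date(2024, 5, 22), "resolution": 0.78},
]
print(coaching_effect(points, date(2024, 5, 10), "resolution"))  # → (0.63, 0.76)
```

A flat before/after pair tells you to adjust the coaching approach, exactly as described above.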
Measuring the loop's effectiveness
Track these metrics to know whether your improvement cycle is working.
| Metric | Source | What it tells you |
|---|---|---|
| Average QA score | QA /scorecards | Overall quality level |
| Score trend | QA /scorecards | Whether coaching is working |
| Escalation rate | Sentiment /alerts | Whether frustration is decreasing |
| First response time | Pulse /metrics | Operational efficiency |
| CSAT | Your helpdesk | Customer satisfaction (the ultimate metric) |
CSAT is the ground truth. If QA scores are rising but CSAT is flat, your rubric is measuring the wrong things. If CSAT rises alongside QA scores, you have validated the loop.
Common pitfalls
Using QA for punishment. The moment agents feel that QA scores are used against them, they will game the system — writing longer responses to appear thorough, avoiding difficult tickets to protect their average, or disputing every low score. Use QA for coaching and development. Celebrate improvement, do not penalize imperfection.
Too many rubric criteria. 3-5 criteria is the sweet spot. More than 7 and individual criterion scores become unreliable. Agents also cannot focus on improving 8 things at once. Pick what matters most.
Not calibrating. Automated QA is not infallible. Compare automated scores with manual human reviews at least monthly. Pull 20-30 conversations, score them manually, and compare. If the automated scores diverge from human judgment, adjust your rubrics and prompts.
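The monthly calibration check described above reduces to a small computation: score the same 20-30 conversations both ways and measure the gap. The 0.1 recalibration threshold in this sketch is an illustrative choice.

```python
# Sketch: measure drift between automated QA scores and manual human reviews.
from statistics import mean

def calibration_gap(auto_scores, human_scores):
    """Mean absolute difference between automated and human scores."""
    return round(mean(abs(a - h) for a, h in zip(auto_scores, human_scores)), 3)

auto = [0.82, 0.71, 0.64, 0.90]   # automated QA scores for sampled conversations
human = [0.80, 0.75, 0.60, 0.88]  # manual scores for the same conversations
gap = calibration_gap(auto, human)
print(gap, "recalibrate" if gap > 0.1 else "ok")  # → 0.03 ok
```

Mean absolute difference is deliberately simple; if you want to catch systematic bias (automated scores consistently higher or lower), also look at the signed mean of the differences.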
Ignoring agent feedback. If agents consistently disagree with their scores, investigate before dismissing their concerns. They might be right — the rubric may not account for edge cases, or the model may be misjudging certain conversation styles. Agent buy-in is essential for the loop to work.
Set and forget. What "good" looks like evolves as your product changes, your customer base shifts, and your team grows. Review rubrics quarterly. Retire criteria that no longer differentiate good from great. Add new ones as customer expectations change.
Next steps
- QA Service — API reference for all QA endpoints
- Sentiment Service — API reference for Sentiment endpoints
- Measuring Success — ROI tracking and reporting
- Executive Reporting — Present quality data to leadership