Multi-agent systems
Gartner reported a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025, and projects that 40% of enterprise applications will feature task-specific agents by the end of 2026, up from less than 5% in 2025. The agentic AI market was valued at $7.63 billion in 2025 and is projected to reach $182 billion by 2030.
The architectural shift mirrors the monolith-to-microservices evolution in software engineering. Instead of one all-purpose agent trying to handle every support scenario, teams are building orchestrated teams of specialised agents: one for classification, one for knowledge retrieval, one for response drafting, one for quality checking.
Deloitte's analysis identifies four implementation patterns: task orchestration (assigning work between humans and AI), agent governance (compliance and safety enforcement), cross-system coordination (aligning agents across CRM, ERP, and helpdesk), and performance optimisation (monitoring and measuring outcomes). The most mature organisations combine all four.
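The task-orchestration pattern above can be sketched in a few lines. The agents below are hypothetical stand-ins (plain functions rather than LLM calls), but the shape is the one described: an orchestrator passing a ticket through classification, retrieval, drafting, and quality checking in sequence.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Ticket:
    text: str
    category: str = ""
    context: list = field(default_factory=list)
    draft: str = ""
    approved: bool = False

def classify(t: Ticket) -> Ticket:
    # Stand-in for a classification agent (in practice, an LLM call).
    t.category = "billing" if "refund" in t.text.lower() else "general"
    return t

def retrieve(t: Ticket) -> Ticket:
    # Stand-in for a knowledge-retrieval agent.
    t.context = [f"kb-article-for-{t.category}"]
    return t

def draft(t: Ticket) -> Ticket:
    # Stand-in for a response-drafting agent.
    t.draft = f"[{t.category}] Based on {t.context[0]}: ..."
    return t

def quality_check(t: Ticket) -> Ticket:
    # Stand-in for a quality-checking agent.
    t.approved = bool(t.draft) and bool(t.category)
    return t

# The orchestrator: each specialised agent does one narrow job.
PIPELINE: list[Callable[[Ticket], Ticket]] = [classify, retrieve, draft, quality_check]

def run(t: Ticket) -> Ticket:
    for stage in PIPELINE:
        t = stage(t)
    return t

ticket = run(Ticket("I want a refund for my last invoice"))
print(ticket.category, ticket.approved)  # billing True
```

In a real system each stage would be a separate model or service, which is exactly why the microservices analogy holds: stages can be swapped, scaled, and monitored independently.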
Voice agents come of age
Voice agents have matured from novelty to production-ready technology in 2026. Deployments now report 40-70% resolution rates without human escalation, with 20-30% operational cost reductions for teams that implement them effectively. The U.S. alone is projected to have 157.1 million voice assistant users in 2026.
The technical breakthroughs driving this are sub-400ms response latency (enabling natural conversational flow) and robust barge-in handling (letting customers interrupt the agent mid-sentence without breaking the interaction). These were unsolved problems as recently as 2024.
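Barge-in handling comes down to one design rule: speech playback must be interruptible between audio chunks, not only at sentence boundaries. A toy asyncio sketch of that rule follows; the chunking, timings, and names are illustrative, not any platform's API.

```python
import asyncio

async def speak(text: str, interrupted: asyncio.Event) -> int:
    """Play `text` chunk by chunk; return how many chunks were spoken."""
    spoken = 0
    for chunk in text.split():
        if interrupted.is_set():
            break  # user barged in: stop mid-sentence immediately
        await asyncio.sleep(0.01)  # stands in for streaming one audio chunk
        spoken += 1
    return spoken

async def main() -> int:
    interrupted = asyncio.Event()

    async def user_barges_in():
        # Stands in for a voice-activity detector firing mid-reply.
        await asyncio.sleep(0.035)
        interrupted.set()

    barge = asyncio.create_task(user_barges_in())
    spoken = await speak(
        "please hold while I look up your order status", interrupted
    )
    await barge
    return spoken

chunks = asyncio.run(main())
print(f"spoke {chunks} of 9 chunks before barge-in")
```

The key property is that the interruption check sits inside the playback loop, so the agent yields within one chunk's worth of audio rather than finishing its sentence first.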
Leading platforms include Assembled (omnichannel voice, chat, email with integrated planning), Retell AI (known for clean escalation handling and stability even when calls fail), and Ringg AI (specialising in ultra-low latency responses). The shift is from voice-first to omnichannel: the same agent handles voice, chat, and email, maintaining context across channels.
Human-in-the-loop is not optional
As AI agents take on more autonomous actions — processing refunds, updating accounts, cancelling orders — human-in-the-loop (HITL) has moved from a nice-to-have to a requirement. The pattern is straightforward: the agent drafts a plan, the system pauses before execution, a human reviews and approves (or rejects), and only then does the agent act.
This matters most for high-stakes actions where errors are costly or irreversible. A support agent autonomously issuing a $5,000 refund based on a misunderstood conversation creates a very different risk profile than one drafting a response that a human reviews before sending.
The most effective HITL implementations use confidence thresholds: actions above 95% confidence proceed automatically, actions between 80% and 95% get a quick human review, and actions below 80% are fully escalated. This keeps the speed benefits of automation while maintaining guardrails where they matter most.
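A minimal sketch of this threshold routing, combined with the draft-pause-approve gate described above. The thresholds match the ones in the text; the `approve` callback is a hypothetical stand-in for a human reviewer.

```python
from enum import Enum
from typing import Callable

class Route(Enum):
    AUTO_EXECUTE = "auto_execute"  # >= 0.95: proceed without review
    QUICK_REVIEW = "quick_review"  # 0.80-0.95: human approves the drafted action
    ESCALATE = "escalate"          # < 0.80: hand the whole case to a human

def route_action(confidence: float) -> Route:
    if confidence >= 0.95:
        return Route.AUTO_EXECUTE
    if confidence >= 0.80:
        return Route.QUICK_REVIEW
    return Route.ESCALATE

def handle(action: str, confidence: float, approve: Callable[[str], bool]) -> str:
    """Pause-before-execution: the agent drafts, a human gate decides."""
    route = route_action(confidence)
    if route is Route.AUTO_EXECUTE:
        return f"executed: {action}"
    if route is Route.QUICK_REVIEW:
        return f"executed: {action}" if approve(action) else f"rejected: {action}"
    return f"escalated: {action}"

print(handle("refund $25", 0.97, approve=lambda a: True))      # executed: refund $25
print(handle("refund $5,000", 0.88, approve=lambda a: False))  # rejected: refund $5,000
print(handle("close account", 0.60, approve=lambda a: True))   # escalated: close account
```

Note that the high-stakes $5,000 refund from the earlier example lands in the review band, where a single rejection is all it takes to stop an irreversible mistake.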
An estimated 35% of organisations had deployed AI agents by the end of 2025, with 86% expected to by 2027. As adoption accelerates, HITL governance becomes the mechanism that makes autonomous agents trustworthy enough for production use in regulated industries like finance, healthcare, and insurance.
Agent observability emerges as critical infrastructure
With 79% of organisations now running AI agents, a new problem has emerged: most cannot trace failures through multi-step agent workflows. When a support agent gives a wrong answer, was it the retrieval step that returned bad context? The reasoning step that misinterpreted it? The tool call that failed silently? Without observability, debugging is guesswork.
Three platforms are leading the observability space in 2026. Braintrust is the only platform that integrates evaluation directly into agent observability, measuring performance with customisable metrics. Langfuse is the leading open-source option (MIT license) with LLM-as-a-judge evaluations, annotation queues, and prompt experiments. Maxim AI offers end-to-end simulation, evaluation, and observability, claiming 5x faster shipping for teams that adopt it.
The core capabilities these platforms provide are: trace collection across multi-step workflows, response quality evaluation (is the answer correct and grounded?), tool correctness verification (did the right tool get called with the right parameters?), cost tracking per interaction, and alerting on quality regressions.
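Two of these capabilities, trace collection and per-interaction cost tracking, fit in a short decorator. This is an illustrative sketch under assumed names, not any platform's SDK.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    step: str         # which workflow stage this was (retrieval, generation, ...)
    latency_ms: float
    cost_usd: float
    ok: bool          # False if the step raised, so silent failures surface

@dataclass
class Trace:
    spans: list = field(default_factory=list)

    def record(self, step: str, cost_usd: float = 0.0):
        """Decorator that records a span for every call to the wrapped step."""
        def wrap(fn):
            def inner(*args, **kwargs):
                start = time.perf_counter()
                ok = True
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    ok = False
                    raise
                finally:
                    latency = (time.perf_counter() - start) * 1000
                    self.spans.append(Span(step, latency, cost_usd, ok))
            return inner
        return wrap

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.spans)

trace = Trace()

@trace.record("retrieval", cost_usd=0.002)
def retrieve(query: str) -> list:
    return [f"doc-for-{query}"]

@trace.record("generation", cost_usd=0.010)
def generate(docs: list) -> str:
    return f"answer grounded in {docs[0]}"

answer = generate(retrieve("reset password"))
for s in trace.spans:
    print(s.step, s.ok, f"${s.cost_usd}")
print(f"total ${trace.total_cost():.3f}")
```

Even this toy version answers the debugging questions posed above: each span says which step ran, whether it succeeded, how long it took, and what it cost, so a wrong answer can be traced to the stage that produced it.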
For support teams, agent observability is what application performance monitoring (APM) was for web services a decade ago — you can run without it, but you can't run well.