Latency
Definition
At its core, latency is the delay between an input being sent and a response being received. In customer operations, it appears in multiple places: the time it takes an AI to generate a reply, the time a voice system takes to respond after a caller speaks, the time an API call takes to return data, and the time a message takes to deliver across a channel. Each of these delays contributes to how the interaction feels from the customer's perspective.
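Measuring any of these delays reduces to the same pattern: record a timestamp before the input is sent and another when the response arrives. A minimal sketch in Python, where `generate_reply` is a hypothetical stand-in for any component (a model call, an API request, a message send):

```python
import time

def measure_latency_ms(fn, *args):
    """Time a single call and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Hypothetical stand-in for any component: model call, API request, etc.
def generate_reply(prompt):
    time.sleep(0.05)  # simulate 50 ms of processing
    return f"reply to: {prompt}"

reply, ms = measure_latency_ms(generate_reply, "order status?")
print(f"{ms:.0f} ms")
```

`time.perf_counter` is used rather than `time.time` because it is a monotonic clock, so the measurement is not distorted if the system clock adjusts mid-call.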
Example
A contact center deploys an AI voice agent. In testing, response latency averages just over two seconds from end of speech to start of reply. During live calls, callers interpret the pause as a system failure or dead air, and many repeat their question or ask to speak with a person. After the team reduces latency to under 800 milliseconds through infrastructure changes and prompt optimization, caller behavior normalizes and transfer rates drop. The same capability, with faster response, produces a meaningfully better experience.
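Verifying a change like the one above means checking measured latency against the target. A small dependency-free sketch, with hypothetical sample values standing in for real end-of-speech-to-start-of-reply measurements:

```python
# Hypothetical latency samples in milliseconds (end of speech to start of reply).
samples_ms = [620, 710, 680, 750, 790, 640, 705, 770, 655, 730]

def percentile(values, pct):
    """Nearest-rank percentile: small and dependency-free."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

avg = sum(samples_ms) / len(samples_ms)
p95 = percentile(samples_ms, 95)
target_ms = 800  # the sub-800 ms threshold from the example above
print(f"avg={avg:.0f} ms, p95={p95} ms, meets target: {p95 < target_ms}")
# → avg=705 ms, p95=790 ms, meets target: True
```

Checking a high percentile rather than only the average matters here: an average under the target can still hide a tail of slow responses that callers experience as dead air.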
Why It Matters
Latency matters whenever speed is part of the perceived quality of service. In voice AI, latency breaks conversational rhythm. In chat, it signals unresponsiveness. In agent-facing tools, high latency creates frustration and slows handle time. For teams building AI-assisted workflows, latency is a design constraint that belongs alongside accuracy and cost in the evaluation of any system component.