graph TD
Input --> Speech-to-speech --> Output
Input --> Speech-to-text --> LLM-or-Agentic-Workflow --> Text-to-speech --> Output
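Latency of Human Conversations: Humans expect the response latency in a conversation to be around 236 ms. With best-practice architecture, a voice pipeline can achieve a latency of about 540 ms, which matches the sum of the lower bounds in the table below (20 + 100 + 100 + 220 + 100).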
| Task | Latency (ms) |
|---|---|
| VAD (voice activity detection) | 20 |
| EOU (end-of-utterance detection) | 100 |
| ASR/STT | 100-500 |
| LLM | 220-500 |
| TTS | 100-450 |
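
The figures in the table can be added up to see where the 540 ms estimate comes from. The sketch below is a minimal illustration, not a measurement: it assumes the stages run strictly one after another with no streaming overlap, and the names `STAGE_LATENCY_MS`, `HUMAN_EXPECTATION_MS`, and `total_latency` are hypothetical helpers introduced only for this example.

```python
# Per-stage latency ranges in milliseconds, copied from the table above.
# A stage with a single value is stored as (low, high) with low == high.
STAGE_LATENCY_MS = {
    "VAD": (20, 20),
    "EOU": (100, 100),
    "ASR/STT": (100, 500),
    "LLM": (220, 500),
    "TTS": (100, 450),
}

# Conversational latency figure quoted in the text.
HUMAN_EXPECTATION_MS = 236


def total_latency(stages):
    """Return (best_case, worst_case) in ms.

    Assumption: stages execute sequentially, so their latencies simply add.
    """
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = total_latency(STAGE_LATENCY_MS)
print(f"best case:  {best} ms")
print(f"worst case: {worst} ms")
print(f"best case exceeds human expectation by {best - HUMAN_EXPECTATION_MS} ms")
```

Under this sequential assumption, the best case is the sum of the lower bounds and the worst case is the sum of the upper bounds, so the range-valued stages (ASR/STT, LLM, and TTS) account for all of the spread between the two.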

Voice Agent Architecture
WebRTC stands for Web Real-Time Communication. It is a free, open-source project that provides web browsers and mobile applications with real-time communication through APIs.