Overview
Imagine picking up the phone and chatting with an AI assistant as naturally as you would with a person: it understands what you say and answers in natural speech, without those irritating “press 1 for sales” prompts. This blog builds exactly such an experience by combining Exotel’s cloud telephony services with Amazon Nova Sonic.
Introduction
Voice has always been the most natural form of communication; however, most automated telephone systems aren’t exactly natural. The conventional technique combines three distinct services: Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS). With each hop, latency increases, leading to awkward pauses between what the user says and the automated response.
Exotel is a cloud-based telephony platform that provides programmable voice APIs and call routing features. Most importantly, its Voice Bot Applet streams call audio over WebSockets, making it a great candidate for building voice bots on top of Amazon Nova Sonic.
Combining these two platforms with a minimal Python server yields an AI voice agent capable of holding meaningful phone conversations.
Architecture Flow
The architecture is simple. A FastAPI server sits between Exotel and Amazon Bedrock, serving as both an audio bridge and an orchestrator.

Exotel forwards the incoming call's audio to the FastAPI server over WebSockets. The server then sends the raw PCM audio to Amazon Nova Sonic’s bidirectional streaming API via Amazon Bedrock. The model recognizes the caller’s speech and, where needed, triggers tools such as a knowledge base lookup or a web search. The same API then streams the spoken response back to the FastAPI server, which relays it to the caller through Exotel.
How a Call Flows Through the System
- Call received: Exotel handles the incoming call and sends an HTTP request to the server's /incoming-call endpoint. The server then establishes an Amazon Nova Sonic session through Amazon Bedrock and returns a WebSocket URL for Exotel to connect to.
- Start audio streaming: Exotel establishes the WebSocket connection and starts streaming raw PCM audio frames (16 bits per sample, mono, 8 kHz). The server passes them directly to the bidirectional stream in Amazon Nova Sonic.
- Speech processing with Amazon Nova Sonic: The model analyzes the speech, recognizes when the user is done talking (turn detection), understands the intent behind the speech, and speaks its answer back in the same session. No separate calls are made to STT or TTS services.
- Calls to tools: If the caller's request needs external information, Amazon Nova Sonic raises a tool use event. The server invokes the relevant tool (e.g., querying an Amazon Bedrock Knowledge Base or calling a search API) and pushes the result back through the stream.
- Call ended: When the caller hangs up or the WebSocket disconnects, the server saves the call transcript to Amazon DynamoDB and closes the session.
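The "Calls to tools" step above can be sketched as a small dispatcher. This is only an illustration under stated assumptions, not the implementation behind this post: the event fields (`toolName`, `content`), the handler names, and the stubbed results are all hypothetical.

```python
import json
from typing import Any, Callable, Dict


def lookup_knowledge_base(query: str) -> Dict[str, Any]:
    # Hypothetical stub: a real handler would query an Amazon Bedrock Knowledge Base.
    return {"source": "knowledge_base", "query": query, "answer": "stubbed answer"}


def web_search(query: str) -> Dict[str, Any]:
    # Hypothetical stub: a real handler would call a search API.
    return {"source": "web_search", "query": query, "results": []}


# Registry mapping tool names (assumed) to their handlers.
TOOLS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "lookup_knowledge_base": lookup_knowledge_base,
    "web_search": web_search,
}


def handle_tool_use(event: Dict[str, Any]) -> str:
    """Run the tool named in an (assumed) tool use event and return a JSON string result."""
    name = event.get("toolName")
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        # The tool arguments are assumed to arrive as a JSON string.
        args = json.loads(event.get("content") or "{}")
        result = handler(args.get("query", ""))
    except Exception as exc:
        # Report failures to the model instead of dropping the call.
        result = {"error": str(exc)}
    return json.dumps(result)
```

Returning an error payload rather than raising keeps the conversation alive: the model can still tell the caller that the lookup failed.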
Key Implementation Details
- Pre-warming Sessions to Minimize Initial Response Delay
It takes 1 to 2 seconds to establish the bidirectional stream with Amazon Bedrock. Instead of keeping the caller waiting, the server initializes the Amazon Nova Sonic session as soon as it receives the first HTTP request, before the WebSocket connection is established. By the time the audio starts streaming, the model is already warmed up and listening.
- Audio Format Compatibility Without Conversion Costs
Conveniently, Exotel’s Voice Bot Applet and Amazon Nova Sonic use exactly the same audio format: 16-bit signed little-endian PCM, mono, at an 8 kHz sampling rate. The audio conversion module therefore becomes a pure pass-through with no resampling or transcoding.
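To make the format concrete, the following sketch works out the byte rate and shows what a pass-through looks like. The 20 ms frame length and the function names are assumptions chosen for illustration; only the sample rate, sample width, and channel count come from the text above.

```python
import struct

SAMPLE_RATE_HZ = 8000   # 8 kHz
BYTES_PER_SAMPLE = 2    # 16-bit samples
CHANNELS = 1            # mono


def frame_bytes(duration_ms: int) -> int:
    """Number of bytes in a frame of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * duration_ms // 1000


def to_model_audio(chunk: bytes) -> bytes:
    """Pass-through 'conversion': only checks that the chunk holds whole samples."""
    if len(chunk) % (BYTES_PER_SAMPLE * CHANNELS) != 0:
        raise ValueError("chunk does not contain a whole number of samples")
    return chunk


def decode_samples(chunk: bytes) -> tuple:
    """Decode signed 16-bit little-endian samples, e.g. for debugging."""
    return struct.unpack(f"<{len(chunk) // BYTES_PER_SAMPLE}h", chunk)


print(frame_bytes(20))                      # 320 bytes per 20 ms frame
print(frame_bytes(1000))                    # 16000 bytes per second
print(decode_samples(b"\x01\x00\xff\xff"))  # (1, -1)
```

At 16,000 bytes per second, the stream is light enough that the server can forward each frame as it arrives.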
- Silence Detection and Follow-up Prompt
An idle detection routine tracks how long the caller has been silent. Once a predefined silence threshold is exceeded, the server sends Amazon Nova Sonic a text message and waits for the model's response.
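One way to sketch such an idle tracker is shown below. The class name, the threshold value, and the idea of prompting only once per silence are assumptions; how the server decides that the caller has spoken is not covered here.

```python
import time
from typing import Callable


class IdleTracker:
    """Tracks caller silence and signals when a follow-up prompt is due."""

    def __init__(self, threshold_s: float, clock: Callable[[], float] = time.monotonic):
        self._threshold_s = threshold_s
        self._clock = clock
        self._last_speech = clock()
        self._prompted = False

    def on_caller_speech(self) -> None:
        """Call whenever caller speech is detected; resets the silence timer."""
        self._last_speech = self._clock()
        self._prompted = False

    def should_prompt(self) -> bool:
        """Return True once per silence period after the threshold has passed."""
        if self._prompted:
            return False
        if self._clock() - self._last_speech >= self._threshold_s:
            self._prompted = True
            return True
        return False
```

A server loop could call `should_prompt()` periodically and, when it returns True, send the text prompt to the model. Injecting the clock keeps the logic testable without real waiting.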
- Preventing Unnecessary Pauses While Interacting with Tools
Calls to knowledge bases, web search engines, and similar tools may take several seconds. To avoid an awkward pause, the server injects a filler phrase into the Amazon Nova Sonic session while the tool runs.
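The sketch below shows one possible way to do this with `asyncio`. It makes a design choice the text does not specify: the filler is sent only if the tool has not finished within a short delay. The function names, the delay, and the filler wording are all assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def run_tool_with_filler(
    tool: Awaitable[Any],
    send_filler: Callable[[str], Awaitable[None]],
    filler_after_s: float = 0.5,
    filler_text: str = "Let me check that for you.",
) -> Any:
    """Run a tool call; if it is still running after filler_after_s, send a filler phrase."""
    task = asyncio.ensure_future(tool)
    done, _ = await asyncio.wait({task}, timeout=filler_after_s)
    if not done:
        # The tool is slow: give the caller something to hear meanwhile.
        await send_filler(filler_text)
    return await task


async def demo(tool_delay_s: float) -> tuple:
    """Simulate a tool with the given delay and record any filler that was sent."""
    sent: List[str] = []

    async def send_filler(text: str) -> None:
        sent.append(text)  # a real server would push this into the model session

    async def tool() -> str:
        await asyncio.sleep(tool_delay_s)
        return "tool result"

    result = await run_tool_with_filler(tool(), send_filler, filler_after_s=0.05)
    return result, sent
```

With this approach, fast tool calls return without any filler, so the caller only hears the phrase when there would otherwise be a noticeable gap.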
Conclusion
Combining Exotel with Amazon Nova Sonic makes it possible to build AI voice agents that hold realistic telephone conversations. Exotel manages the telephony layer, including calls and audio streams, while Amazon Nova Sonic handles speech recognition, reasoning, and the spoken response. In between sits a lightweight FastAPI server that bridges the audio streams and invokes tools. Any business considering automating telephone interactions can benefit from this setup.
Drop a query if you have any questions regarding Amazon Nova Sonic and we will get back to you quickly.
FAQs
1. What is Amazon Nova Sonic, and how does it differ from conventional voice AI?
ANS: – Amazon Nova Sonic is a speech-to-speech foundation model available through Amazon Bedrock. Conventional voice AI pipelines chain together Speech-to-Text, LLM, and Text-to-Speech APIs. Amazon Nova Sonic replaces those three APIs with a single bidirectional stream for speech in and speech out, lowering latency and simplifying the architecture.
2. What makes Exotel suitable for this integration?
ANS: – Exotel offers a Voice Bot Applet that streams raw PCM audio over WebSockets in real time. This is precisely what is required to integrate Amazon Nova Sonic into a conversational interface. Exotel manages call routing, number management, and telecommunications compliance so that you can focus on the AI portion of the application.
WRITTEN BY Nekkanti Bindu
Nekkanti Bindu works as a Research Associate at CloudThat, where she channels her passion for cloud computing into meaningful work every day. Fascinated by the endless possibilities of the cloud, Bindu has established herself as an AWS consultant, helping organizations harness the full potential of AWS technologies. A firm believer in continuous learning, she stays at the forefront of industry trends and evolving cloud innovations. With a strong commitment to making a lasting impact, Bindu is driven to empower businesses to thrive in a cloud-first world.
May 19, 2026