Convert Text to Speech Easily Using Amazon Polly

Overview

Amazon Polly is a cloud service that turns text into lifelike speech, allowing us to create applications that talk and to build entirely new categories of speech-enabled products. Its Text-to-Speech (TTS) engine uses advanced deep learning technologies to synthesize natural-sounding human speech. With many lifelike voices across a broad set of languages, we can build speech-enabled applications that work in many countries.

Along with the Standard TTS voices, Amazon Polly provides Neural Text-to-Speech (NTTS) voices, which use a newer machine-learning approach to deliver significant improvements in speech quality. Polly's NTTS technology also supports a Newscaster speaking style tailored to news narration use cases.
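To check which voices support the neural engine for a given language, we can call the `DescribeVoices` API. The sketch below assumes the AWS SDK for Python (boto3) with credentials configured and a client created via `boto3.client("polly")`; the helper name `voices_for_engine` is our own. It follows `NextToken` pagination and keeps only voices whose `SupportedEngines` list includes the requested engine.

```python
from typing import Any, Dict, List


def voices_for_engine(client: Any, language_code: str, engine: str = "neural") -> List[str]:
    """Return the IDs of Polly voices for `language_code` that support `engine`.

    `client` is assumed to be a boto3 Polly client, e.g. boto3.client("polly").
    """
    voice_ids: List[str] = []
    kwargs: Dict[str, str] = {"LanguageCode": language_code, "Engine": engine}
    while True:
        response = client.describe_voices(**kwargs)
        for voice in response.get("Voices", []):
            # Double-check the engine client-side rather than trusting the filter alone.
            if engine in voice.get("SupportedEngines", []):
                voice_ids.append(voice["Id"])
        token = response.get("NextToken")
        if not token:
            return sorted(voice_ids)
        kwargs["NextToken"] = token  # request the next page of voices
```

For example, `voices_for_engine(boto3.client("polly"), "en-GB")` would return the British English voice IDs available with the neural engine in the client's Region.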

Amazon Polly provides an API that lets us quickly integrate speech synthesis into our applications. We simply send the text we want to convert to the Amazon Polly API, and Polly immediately returns an audio stream, which the application can play directly or save in a standard audio file format such as MP3.
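A minimal sketch of this request/response flow uses boto3's `synthesize_speech` call. It assumes configured AWS credentials and the `Joanna` voice (any voice that supports the chosen engine would do); the function name `synthesize_to_mp3` is hypothetical, and the client is passed in so the code can be exercised without network access.

```python
from typing import Any


def synthesize_to_mp3(client: Any, text: str, path: str,
                      voice_id: str = "Joanna", engine: str = "neural") -> int:
    """Synthesize `text` with Amazon Polly and save the MP3 audio to `path`.

    Returns the number of characters Polly reports for the request.
    `client` is assumed to be a boto3 Polly client (boto3.client("polly")).
    """
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    # AudioStream is a streaming body: read it fully and write the bytes to disk.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return response.get("RequestCharacters", len(text))
```

With a real client this would be called as `synthesize_to_mp3(boto3.client("polly"), "Hello from Polly", "hello.mp3")`. For playback without saving, the same `AudioStream` bytes could be handed to an audio player instead of a file.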

Features

Wide Selection of Voices and Languages
Synchronize Speech for an Enhanced Visual Experience
Optimize Your Streaming Audio
Adjust Speech Rate, Pitch, Loudness, and Speaking Style
Newscaster Speaking Style
Adjust the Maximum Duration of Speech
Platform and Programming Language Support
Speech Synthesis through Console, Command Line, or API
Custom Lexicons
Brand Voice
Contact Center Integrations
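Custom lexicons are Pronunciation Lexicon Specification (PLS) documents that we upload once with `PutLexicon` and then reference by name through the `LexiconNames` parameter of `SynthesizeSpeech`. The sketch below, assuming boto3, builds a lexicon that uses only `<alias>` entries (for example, to expand an acronym); the helper names and the lexicon name `Acronyms` are illustrative, not part of any AWS API.

```python
from typing import Any, Dict
from xml.sax.saxutils import escape


def build_alias_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Build a minimal PLS lexicon that replaces each grapheme with a spoken alias."""
    lexemes = "".join(
        f"<lexeme><grapheme>{escape(word)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>"
        for word, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon '
        'http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd" '
        f'alphabet="ipa" xml:lang="{lang}">'
        f"{lexemes}</lexicon>"
    )


def upload_lexicon(client: Any, name: str, aliases: Dict[str, str]) -> None:
    """Store the lexicon in Polly; later requests can pass LexiconNames=[name]."""
    client.put_lexicon(Name=name, Content=build_alias_lexicon(aliases))
```

After uploading, a request such as `client.synthesize_speech(Text="AWS is here", VoiceId="Joanna", OutputFormat="mp3", LexiconNames=["Acronyms"])` should speak the alias instead of the individual letters.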

Use cases

Telephony

With the help of Amazon Polly, contact centers can engage customers with natural-sounding voices. We can cache and replay Polly's speech output to prompt callers through interactive voice response (IVR) systems such as Amazon Connect. We can also use Polly's API to deliver automated, real-time information such as service status, account and billing details, addresses, and contact information.

Example: Text-to-speech for telephony systems
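Because IVR prompts repeat constantly, caching the synthesized audio avoids paying for and waiting on the same request twice. The class below is a minimal sketch assuming a boto3 Polly client; `PromptCache` is a hypothetical name, the in-memory dictionary stands in for whatever store (S3, local disk) a real system would use, and the 8 kHz PCM defaults are an assumption chosen because that rate is common in telephony.

```python
import hashlib
from typing import Any, Dict


class PromptCache:
    """Cache Polly audio so repeated IVR prompts are synthesized only once.

    `client` is assumed to be a boto3 Polly client. Audio is kept in memory here.
    """

    def __init__(self, client: Any, voice_id: str = "Joanna",
                 output_format: str = "pcm", sample_rate: str = "8000") -> None:
        self.client = client
        self.voice_id = voice_id
        self.output_format = output_format
        self.sample_rate = sample_rate
        self._audio: Dict[str, bytes] = {}

    def _key(self, text: str) -> str:
        # Every setting that changes the audio must be part of the cache key.
        raw = "|".join([self.voice_id, self.output_format, self.sample_rate, text])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, text: str) -> bytes:
        """Return cached audio for `text`, calling Polly only on a cache miss."""
        key = self._key(text)
        if key not in self._audio:
            response = self.client.synthesize_speech(
                Text=text,
                VoiceId=self.voice_id,
                OutputFormat=self.output_format,
                SampleRate=self.sample_rate,
            )
            self._audio[key] = response["AudioStream"].read()
        return self._audio[key]
```

Hashing the settings together with the text means that changing the voice or sample rate naturally produces a fresh entry instead of replaying stale audio.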

E-learning

Polly enables developers to give their applications an enhanced visual experience, such as speech-synchronized facial animation or word highlighting. Amazon Polly makes it easy to request an additional stream of metadata that records when sentences, words, and sounds are pronounced. By using this metadata stream alongside the synthesized audio stream, applications can animate avatars and highlight text as it is spoken.

Example: Play the speech and highlight spoken text
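This metadata is what Polly calls speech marks: requesting `OutputFormat="json"` with `SpeechMarkTypes=["word"]` returns newline-delimited JSON objects carrying `time` (milliseconds into the audio), `type`, `start`/`end` (byte offsets into the input text), and `value`. The audio itself needs a separate request. The sketch below, assuming a boto3 Polly client, fetches and parses the marks and picks the word to highlight at a given playback position; the function names are hypothetical, and the sample marks in the usage check are shaped like the documented format with illustrative timings.

```python
import json
from bisect import bisect_right
from typing import Any, List, Optional, Tuple


def fetch_word_marks(client: Any, text: str, voice_id: str = "Joanna") -> str:
    """Request word-level speech marks (a second call is needed for the audio)."""
    response = client.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="json", SpeechMarkTypes=["word"]
    )
    return response["AudioStream"].read().decode("utf-8")


def parse_marks(raw: str) -> List[dict]:
    """Speech marks arrive as one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_at(marks: List[dict], text: str, position_ms: int) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, word) for the word being spoken at `position_ms`."""
    words = [m for m in marks if m["type"] == "word"]
    times = [m["time"] for m in words]
    index = bisect_right(times, position_ms) - 1
    if index < 0:
        return None  # playback has not reached the first word yet
    mark = words[index]
    # start/end are byte offsets into the UTF-8 input, so slice the encoded text.
    encoded = text.encode("utf-8")
    return mark["start"], mark["end"], encoded[mark["start"]:mark["end"]].decode("utf-8")
```

A player would call `word_at` on each timer tick with the current playback position and move the highlight to the returned character span.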

Content creation

Audio complements written and visual communication. By voicing content, we give the audience an alternative way to consume information and meet the needs of more readers. Polly can generate speech in dozens of languages, making it easy to add speech to content for a global audience, such as RSS feeds, websites, or videos.

Example: Convert an article into speech and download it as MP3
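A single `SynthesizeSpeech` request accepts only a limited amount of text (3,000 billed characters at the time of writing; check the current Polly quotas), so a long article has to be split. The sketch below, assuming a boto3 Polly client, groups sentences into chunks under that limit, synthesizes each chunk, and appends the MP3 bytes to one file. The regex sentence splitter is a deliberately naive assumption, and plain MP3 concatenation plays back in common players but is not a formal merge; for very long content, the asynchronous `StartSpeechSynthesisTask` API (which writes to S3) is an alternative.

```python
import re
from typing import Any, List

MAX_CHARS = 3000  # assumed SynthesizeSpeech limit; verify against current Polly quotas


def split_into_chunks(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Group sentences into chunks no longer than `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def article_to_mp3(client: Any, text: str, path: str, voice_id: str = "Joanna") -> int:
    """Synthesize each chunk, append the MP3 bytes to one file, and return the chunk count."""
    chunks = split_into_chunks(text)
    with open(path, "wb") as out:
        for chunk in chunks:
            response = client.synthesize_speech(Text=chunk, OutputFormat="mp3", VoiceId=voice_id)
            out.write(response["AudioStream"].read())
    return len(chunks)
```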

Step-by-Step Demo on Amazon Polly

Step 1: Sign in to the AWS Management Console and open the Amazon Polly service.

[Screenshot: Amazon Polly console, Text-to-Speech page]

As shown in the image above, first select the Engine as required. Then select the language and voice, and enter the input text. Click Listen; Polly processes the text and plays the synthesized speech.

[Screenshot: SSML input option]

You can also switch the input to SSML, as shown above. Speech Synthesis Markup Language (SSML) tags let you modify the speech output, for example by changing the phonetic pronunciation of a word, selecting the Newscaster speaking style, or adding a pause.
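Outside the console, SSML is sent by passing `TextType="ssml"` to `SynthesizeSpeech`. The sketch below composes an SSML document with a pause (`<break>`), a speaking rate (`<prosody rate>`), and optionally the Newscaster style (`<amazon:domain name="news">`), which AWS documents as available only with the neural engine and a subset of voices. The helper `build_ssml` and its parameters are hypothetical; user text is XML-escaped so characters like `&` or `<` cannot break the markup.

```python
from xml.sax.saxutils import escape


def build_ssml(headline: str, body: str, pause_ms: int = 500,
               rate: str = "medium", newscaster: bool = False) -> str:
    """Compose SSML: the headline, a pause, then the body at the given speaking rate."""
    content = (
        f"<p>{escape(headline)}</p>"
        f'<break time="{pause_ms}ms"/>'
        f'<prosody rate="{rate}">{escape(body)}</prosody>'
    )
    if newscaster:
        # Newscaster style: neural engine and supported voices only.
        content = f'<amazon:domain name="news">{content}</amazon:domain>'
    return f"<speak>{content}</speak>"
```

The result would then be sent with something like `client.synthesize_speech(Text=ssml, TextType="ssml", Engine="neural", VoiceId="Matthew", OutputFormat="mp3")`, assuming the chosen voice supports the Newscaster style in your Region.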

Step 2: Scroll down to Additional settings and choose the sample rate, file format, and pronunciation settings as required.
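In the API, these settings map to the `OutputFormat` and `SampleRate` parameters, and not every combination is valid. The sketch below encodes the allowed sample rates as listed in the Polly API reference at the time of writing (note that `SampleRate` is a string) and builds the matching keyword arguments; the table and the helper name `output_settings` are our own and should be checked against the current documentation.

```python
from typing import Dict

# Valid SampleRate values per OutputFormat (assumed from the Polly API reference).
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def output_settings(output_format: str, sample_rate: str) -> Dict[str, str]:
    """Return synthesize_speech keyword arguments for the chosen file format and rate."""
    allowed = VALID_SAMPLE_RATES.get(output_format)
    if allowed is None:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if sample_rate not in allowed:
        raise ValueError(
            f"sample rate {sample_rate} Hz is not valid for {output_format}; "
            f"choose one of {sorted(allowed, key=int)}"
        )
    return {"OutputFormat": output_format, "SampleRate": sample_rate}
```

Validating locally gives a clearer error message than waiting for the service to reject the request, e.g. `client.synthesize_speech(Text="Hi", VoiceId="Joanna", **output_settings("pcm", "16000"))`.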

[Screenshot: Additional settings]

Step 3: When the speech is ready, click Download if you want a local copy of the audio file.

[Screenshot: Download button]

Step 4: To store the output in Amazon S3, click Save to S3, as shown below.

[Screenshot: Save to S3 button]

Step 5: If you do not already have a bucket for the output, open the Amazon S3 console and create one.

[Screenshot: creating an S3 bucket]

Step 6: Enter the name of the bucket you created earlier, as shown above, and click Save to S3.

[Screenshots: entering the bucket name and saving to S3]

Polly creates a synthesis task that writes the audio file to the S3 bucket, as shown below.

[Screenshot: audio file in the S3 bucket]
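The same result can be achieved programmatically with Polly's asynchronous `StartSpeechSynthesisTask` API, which writes the audio to an S3 bucket and is then polled with `GetSpeechSynthesisTask`. The sketch below assumes a boto3 Polly client whose credentials can write to the bucket; `my-bucket`, the `polly/` prefix, and the helper name `synthesize_to_s3` are placeholders. The `sleep` parameter is injectable so the polling loop can be tested without waiting.

```python
import time
from typing import Any, Callable


def synthesize_to_s3(client: Any, text: str, bucket: str, prefix: str = "polly/",
                     voice_id: str = "Joanna", poll_seconds: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Start an asynchronous Polly task that writes MP3 to S3 and return the output URI."""
    task = client.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=prefix,
    )["SynthesisTask"]
    # Poll until the task leaves the "scheduled"/"inProgress" states.
    while task["TaskStatus"] in ("scheduled", "inProgress"):
        sleep(poll_seconds)
        task = client.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
    if task["TaskStatus"] != "completed":
        raise RuntimeError(f"synthesis task failed: {task.get('TaskStatusReason')}")
    return task["OutputUri"]
```

Polling keeps the example short; a production setup might instead pass an `SnsTopicArn` so Polly publishes a notification when the task finishes.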

Step 7: Click S3 synthesis tasks to view your tasks and their status, as shown below.

[Screenshot: S3 synthesis tasks list]
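The S3 synthesis tasks list has an API counterpart, `ListSpeechSynthesisTasks`, which returns pages of tasks and accepts an optional `Status` filter (`scheduled`, `inProgress`, `completed`, or `failed`). The sketch below, assuming a boto3 Polly client, collects every page; the helper name is hypothetical, and `MaxResults=100` assumes 100 is the largest page size the API allows.

```python
from typing import Any, Dict, List, Optional


def list_synthesis_tasks(client: Any, status: Optional[str] = None) -> List[Dict[str, Any]]:
    """Collect Polly S3 synthesis tasks, optionally filtered by status (e.g. "completed")."""
    kwargs: Dict[str, Any] = {"MaxResults": 100}
    if status:
        kwargs["Status"] = status
    tasks: List[Dict[str, Any]] = []
    while True:
        response = client.list_speech_synthesis_tasks(**kwargs)
        tasks.extend(response.get("SynthesisTasks", []))
        token = response.get("NextToken")
        if not token:
            return tasks
        kwargs["NextToken"] = token  # continue from where the previous page ended
```

For example, `[t["OutputUri"] for t in list_synthesis_tasks(client, "completed")]` would list the S3 locations of all finished audio files.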

Conclusion

Amazon Polly handles the hard parts of text-to-speech: breaking text into the speech elements appropriate for the target language and then converting those elements into audio. Polly can also generate synthesized speech for virtual reality projects created with Amazon Sumerian, for animation projects, for real-time synthesis in applications, and more.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft's Global Top 100 and 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. Why should I use Amazon Polly?

ANS: – You can use Amazon Polly to enhance your application with high-quality spoken output. Amazon Polly is cost-effective, has very low response times, and is available for virtually any use case, with no restrictions on storing and reusing generated speech.

2. Which audio formats are supported by Amazon Polly?

ANS: – Amazon Polly supports audio formats such as MP3, Ogg Vorbis, and raw PCM audio streams.

3. Does Amazon Polly participate in the AWS Free Tier?

ANS: – Yes, as part of the AWS Free Usage Tier, you can get started with Amazon Polly for free.

WRITTEN BY Suresh Kumar Reddy

Suresh is a highly skilled and results-driven Generative AI Engineer with over three years of experience and a proven track record in architecting, developing, and deploying end-to-end LLM-powered applications. His expertise covers the full project lifecycle, from foundational research and model fine-tuning to building scalable, production-grade RAG pipelines and enterprise-level GenAI platforms. Adept at leveraging state-of-the-art models, frameworks, and cloud technologies, Suresh specializes in creating innovative solutions to address complex business challenges.