Amazon Polly is a cloud service that turns text into lifelike speech, allowing us to create applications that talk and build entirely new categories of speech-enabled products. Its Text-to-Speech (TTS) engine uses advanced deep learning technologies to synthesize natural-sounding human speech. With many lifelike voices across a broad set of languages, we can build speech-enabled applications that work in many different countries.
Along with the Standard TTS voices, Amazon Polly provides Neural Text-to-Speech (NTTS) voices that deliver improved speech quality through a new machine-learning approach. Polly’s Neural Text-to-Speech technology also supports a Newscaster speaking style, which is tailored to news narration use cases.
Amazon Polly provides an API that enables us to quickly integrate speech synthesis into our application. We simply send the text we want to convert into speech to the Amazon Polly API, and Amazon Polly immediately returns an audio stream to our application, so the app can begin streaming it directly or save it in a standard audio file format, such as MP3.
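The send-text, receive-audio flow described above can be sketched with the AWS SDK for Python. This is a minimal sketch under stated assumptions, not the only way to call the service: the helper name `synthesize_to_file`, the voice `Joanna`, and the output path are illustrative, and a real run assumes `boto3` is installed and AWS credentials are configured.

```python
import shutil


def synthesize_to_file(client, text, path, voice_id="Joanna", engine="standard"):
    """Send text to Polly and save the returned audio stream as an MP3 file.

    `client` is expected to behave like boto3.client("polly").
    The voice and engine defaults are illustrative assumptions.
    """
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    # AudioStream is a file-like streaming body; copy it to disk in chunks.
    with open(path, "wb") as out:
        shutil.copyfileobj(response["AudioStream"], out)
    return path


# Example use (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   synthesize_to_file(boto3.client("polly"), "Hello from Amazon Polly.", "hello.mp3")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object before pointing it at the real service.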
- Wide Selection of Voices and Languages
- Synchronized Speech for an Enhanced Visual Experience
- Optimize Your Streaming Audio
- Adjust Speech Rate, Pitch, Loudness, and Speaking Style
- Newscaster Speaking Style
- Adjust the Maximum Duration of Speech
- Platform and Programming Language Support
- Speech Synthesis through Console, Command Line, or API
- Custom Lexicons
- Brand Voice
- Contact center integrations
With the help of Amazon Polly, our contact centers can engage customers with natural-sounding voices. We can cache and replay Amazon Polly’s speech output to prompt callers through interactive voice response systems, such as Amazon Connect. We can also use the Amazon Polly API to deliver automated real-time information such as service status, account and billing details, addresses, contact information, and much more.
Example: Text-to-speech for telephony systems
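The "cache and replay" idea for voice prompts might look roughly like the following. It is a sketch only: the helper name `cached_prompt`, the on-disk cache layout, and the voice `Joanna` are assumptions made for illustration, and a production contact center would likely keep the cache in shared storage instead of a local directory.

```python
import hashlib
import os


def cached_prompt(client, text, cache_dir, voice_id="Joanna"):
    """Return the path of an MP3 prompt, calling Polly only on a cache miss.

    `client` is expected to behave like boto3.client("polly").
    """
    # Key the cache on both voice and text so each combination is stored once.
    key = hashlib.sha256(f"{voice_id}:{text}".encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, f"{key}.mp3")
    if not os.path.exists(path):
        response = client.synthesize_speech(
            Text=text, OutputFormat="mp3", VoiceId=voice_id
        )
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as out:
            out.write(response["AudioStream"].read())
    return path
```

Repeated prompts such as greetings or menu options are then synthesized once and replayed from the cached file on later calls.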
Amazon Polly enables developers to give their applications an enhanced visual experience, such as speech-synchronized facial animation or word highlighting. Amazon Polly makes it easy to request an additional stream of metadata with information about when sentences, words, and sounds are being pronounced. Using this metadata stream alongside the synthesized speech audio stream, developers can animate avatars and highlight text as it is spoken in their application.
Example: Play the speech and highlight spoken text
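The metadata stream mentioned above is returned as line-delimited JSON "speech marks" when a request uses `OutputFormat="json"` together with `SpeechMarkTypes`. The sketch below shows one way to parse such a stream and look up the word to highlight at a given playback position. The helper names `parse_speech_marks` and `word_at` are hypothetical, and the sample values in the comments are made up for illustration.

```python
import json


def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_at(marks, elapsed_ms):
    """Return the most recent word mark at or before elapsed_ms, or None."""
    current = None
    for mark in marks:
        if mark["type"] == "word" and mark["time"] <= elapsed_ms:
            current = mark
    return current


# Requesting the marks (assumes boto3 is installed and credentials are configured):
#   import boto3
#   response = boto3.client("polly").synthesize_speech(
#       Text="Mary had a little lamb.", OutputFormat="json",
#       SpeechMarkTypes=["word"], VoiceId="Joanna")
#   marks = parse_speech_marks(response["AudioStream"].read().decode("utf-8"))
#
# Each word mark looks like (illustrative values):
#   {"time": 6, "type": "word", "start": 0, "end": 4, "value": "Mary"}
```

In each mark, `time` is the offset in milliseconds from the start of the audio, and `start` and `end` locate the word in the input text, so a player can call `word_at` with its current playback position and highlight the matching span.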
- Content creation
Audio can complement written and/or visual communication. By voicing the content, we can give the audience an alternative way to consume information and meet the needs of a wider range of readers. Polly can generate speech in dozens of languages, making it easy to add speech to content with a global audience, such as RSS feeds, websites, or videos.
Example: Convert an article into speech and download it as MP3
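Converting a whole article usually means working around a per-request size limit. The sketch below splits text at sentence boundaries and appends the audio for each chunk to one MP3 file. The names `split_for_polly` and `article_to_mp3` are hypothetical, the 3000-character default is an assumed limit that should be checked against the current service quotas, and joining MP3 segments byte-wise is assumed to play back acceptably in common players.

```python
import re


def split_for_polly(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking between sentences.

    max_chars=3000 is an assumed per-request limit; check the current quota.
    A single sentence longer than max_chars is hard-split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def article_to_mp3(client, text, path, voice_id="Joanna", max_chars=3000):
    """Synthesize each chunk in order and append the audio to one MP3 file."""
    with open(path, "wb") as out:
        for chunk in split_for_polly(text, max_chars):
            response = client.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice_id
            )
            out.write(response["AudioStream"].read())
    return path
```

For very long content, the asynchronous synthesis task API that writes directly to S3 may be a better fit than chunking on the client.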
Step-by-Step Demo on Amazon Polly
Step 1: Log in to your AWS account and select the service called Amazon Polly.
As shown in the image above, first select the Engine you need. Then select a language and a voice, and enter the input text. Now click Listen. Polly processes the text and plays the speech.
You can also opt for SSML, as shown above. Speech Synthesis Markup Language (SSML) tags allow you to modify the speech output, for example by changing the phonetic pronunciation of a word, selecting the Newscaster speaking style, or adding a pause.
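The kinds of SSML changes listed above can be sketched as small string builders. The helper names `ssml_with_pause` and `newscaster` are hypothetical and the default values are illustrative. As far as I know, the Newscaster style is only available with certain neural voices, so treat that part as an assumption to verify for the voice you choose.

```python
from xml.sax.saxutils import escape


def ssml_with_pause(before, after, pause_ms=500, rate="medium"):
    """Build an SSML document with a pause between two pieces of text."""
    return (
        "<speak>"
        f'<prosody rate="{rate}">{escape(before)}</prosody>'
        f'<break time="{pause_ms}ms"/>'
        f"{escape(after)}"
        "</speak>"
    )


def newscaster(text):
    """Wrap text in the SSML tag that requests the Newscaster speaking style."""
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'


# Sending SSML (assumes boto3 is installed and credentials are configured):
#   import boto3
#   boto3.client("polly").synthesize_speech(
#       Text=newscaster("Markets closed higher today."), TextType="ssml",
#       OutputFormat="mp3", VoiceId="Joanna", Engine="neural")
```

Escaping the input text matters because characters such as `&` and `<` would otherwise make the SSML invalid.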
Step 2: Scroll down to Additional settings. Select the sample rate, file format, and pronunciation settings based on your requirements.
Step 3: Once the speech is ready, click Download if you need a local copy.
Step 4: If you want to store the output in S3, click Save to S3, as shown below.
Step 5: If you do not already have a bucket, go to the S3 service and create an S3 bucket to hold the output.
Step 6: Enter the name of the bucket created earlier, as shown above, and click Save to S3.
Polly transfers the file to the S3 bucket, as shown below.
Step 7: Click S3 synthesis tasks to see the tasks, as shown below.
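The same additional settings exist as request parameters in the API. The sketch below assembles them with a basic check. The helper name `output_settings` is hypothetical, and the table of accepted sample rates reflects my reading of the API reference, so verify it against the current documentation before relying on it.

```python
# Sample rates (as strings) assumed to be accepted for each output format.
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def output_settings(output_format="mp3", sample_rate="22050", lexicon_names=()):
    """Return keyword arguments for synthesize_speech after basic validation."""
    if output_format not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported output format: {output_format}")
    if sample_rate not in VALID_SAMPLE_RATES[output_format]:
        raise ValueError(f"{sample_rate} Hz is not valid for {output_format}")
    settings = {"OutputFormat": output_format, "SampleRate": sample_rate}
    if lexicon_names:
        # Custom lexicons must already be uploaded to the account and region.
        settings["LexiconNames"] = list(lexicon_names)
    return settings


# Example use with a boto3 Polly client named `polly` (not created here):
#   polly.synthesize_speech(Text="Hello", VoiceId="Joanna",
#                           **output_settings("pcm", "16000"))
```

Note that the API takes the sample rate as a string, which is why the values above are quoted.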
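The Save to S3 steps have an API counterpart in asynchronous synthesis tasks. The following is a sketch under stated assumptions: the helper name `save_speech_to_s3`, the polling interval, and the voice are illustrative, the bucket must already exist, and the caller needs permission to write to it.

```python
import time


def save_speech_to_s3(client, text, bucket, voice_id="Joanna",
                      poll_seconds=2.0, max_polls=30):
    """Start a synthesis task that writes MP3 output to S3 and wait for it.

    `client` is expected to behave like boto3.client("polly").
    Returns (status, output_uri); status may still be pending after max_polls.
    """
    task = client.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        OutputS3BucketName=bucket,
    )["SynthesisTask"]
    for _ in range(max_polls):
        if task["TaskStatus"] in ("completed", "failed"):
            break
        time.sleep(poll_seconds)
        task = client.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
    return task["TaskStatus"], task.get("OutputUri")


# Example use (assumes boto3, credentials, and an existing bucket name):
#   import boto3
#   status, uri = save_speech_to_s3(boto3.client("polly"), "Hello", "my-bucket")
```

The task list shown in the console under S3 synthesis tasks can be retrieved in code with the client's `list_speech_synthesis_tasks` call.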
Amazon Polly is an impressive service when you consider the challenge of breaking text into the speech elements appropriate for the required language and then converting those speech elements into audio. Amazon Polly can also be used to create synthesized speech for Virtual Reality projects built with Amazon Sumerian, animation projects, real-time synthesis in applications, and more.
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers serve all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Amazon Polly and I will get back to you quickly.
1. Why should I use Amazon Polly?
ANS: – You can use Amazon Polly to enhance your app with high-quality spoken output. Amazon Polly is cost-effective, has very low response times, and is available for virtually any use case, with no restrictions on saving and reusing generated speech.
2. Which audio formats are supported by Amazon Polly?
ANS: – Amazon Polly supports audio formats such as MP3, Ogg Vorbis, and raw PCM audio streams.
3. Does Amazon Polly participate in the AWS Free Tier?
ANS: – Yes, as part of the AWS Free Usage Tier, you can get started with Amazon Polly for free.
WRITTEN BY Suresh Kumar Reddy
Yerraballi Suresh Kumar Reddy is working as a Research Associate - Data and AI/ML at CloudThat. He is a self-motivated and hard-working Cloud Data Science aspirant who is adept at using analytical tools for analyzing and extracting meaningful insights from data.