1. Overview
Audio-to-text is the process of converting spoken audio into written text. Software cannot easily search or analyze raw audio files to extract meaningful data from them. These files therefore need to be converted to text before they can be used for analysis.
Many software providers now offer speech-to-text as a service, built on their own models and algorithms. In this blog, we will go through one such service provided by AWS: Amazon Transcribe.
2. Introduction to AWS Transcribe
Amazon Transcribe is a fully managed and continuously trained automatic speech recognition service that automatically generates time-stamped text transcripts from audio files. Amazon Transcribe makes it easy for developers to add speech-to-text capabilities to their applications. Audio data is virtually impossible for computers to search and analyze. Therefore, recorded speech needs to be converted to text before being used in applications. Historically, customers had to work with transcription providers that required them to sign expensive contracts and were hard to integrate into their technology stacks to accomplish this task. Many of these providers use outdated technology that does not adapt well to different scenarios, like low-fidelity phone audio standards in contact centers, which results in poor accuracy.
3. AWS Transcribe
We will make use of S3 triggers that will make it possible to automate transcribing from start to end. Below is a detailed overview of what we will accomplish in this article.
- Create a Lambda role with access to S3, CloudWatch, and the AWS Transcribe service.
- Create an input S3 bucket and an output bucket for AWS Transcribe.
- Create a Lambda function using Python as the runtime to trigger AWS Transcribe whenever a new .mp3 file is uploaded to the input S3 bucket.
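As a rough illustration of the first step, the permissions such a role needs could be written as an IAM policy document. This is only a sketch: the bucket names are placeholders, and the exact set of actions is an assumption based on what the Lambda code in this article calls, so it may need adjusting for your account.

```python
import json


def build_lambda_policy(input_bucket: str, output_bucket: str) -> dict:
    """Return an IAM policy document for the Lambda execution role.

    Both bucket names are hypothetical values supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the uploaded audio file.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {
                # Write the resulting text file.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
            {
                # Start transcription jobs and read their status.
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                ],
                "Resource": "*",
            },
            {
                # Let the function write its logs to CloudWatch.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


policy = build_lambda_policy("my-audio-input", "test-bucket-transcribe")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the role.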
4. Setting up a Trigger on S3
Click the ‘Add Trigger’ option on the Lambda function, select ‘S3’ as the source, and select ‘PUT’ as the event type. The prefix filter restricts the trigger to a folder (a key prefix), and the suffix filter restricts it to a file type. We are considering only .mp3 files for the demo.
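The same trigger can also be described in code instead of through the console. The sketch below builds the notification configuration that boto3's `put_bucket_notification_configuration` accepts; the function ARN, bucket name, and `audio/` prefix are made-up values for illustration.

```python
def build_notification_config(lambda_arn, prefix="", suffix=".mp3"):
    """Build an S3 notification configuration that invokes a Lambda function
    for PUT uploads whose key matches the given prefix and suffix."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {"Key": {"FilterRules": rules}},
            }
        ]
    }


# Hypothetical ARN and prefix, used only for this example.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:transcribe-audio",
    prefix="audio/",
)

# With boto3 installed and credentials configured, it could be applied with:
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="my-audio-input", NotificationConfiguration=config)
```

When the trigger is added through the console, the permission that lets S3 invoke the function is created automatically; when configuring it through the API, that permission has to be granted separately.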
5. Lambda Code for Transcribing the Text and Storing the text file in S3
- First, we import the required libraries: boto3, requests, and json.
- Increase the Lambda timeout in the Configuration settings; it is set to 3 seconds by default.
- The code reads the event and fetches the bucket name and file name from it.
- Then we build the S3 location of the audio file, which we pass to the Transcribe job.
- We start the transcription job and then get the details of the transcription job.
- For starting the job and for getting its details, we call two helper functions.
- We fetch the transcript file URL and the other details we need from the JSON response.
- To fetch the transcribed data from the URL, we use requests and extract the desired data.
- Then we create a text file and upload it to S3.
- After the code runs, we see a text file in S3, and a transcription job is also created in the AWS Transcribe service.
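```python
import json
import os
import urllib.parse

import boto3
import requests  # not bundled with the Lambda Python runtime; package it or add a layer

s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

OUTPUT_BUCKET = "test-bucket-transcribe"


def lambda_handler(event, context):
    # Bucket and object key of the uploaded audio file, taken from the S3 event.
    record = event['Records'][0]['s3']
    file_bucket = record['bucket']['name']
    # Object keys arrive URL-encoded in S3 events (for example, spaces become '+').
    file_name = urllib.parse.unquote_plus(record['object']['key'])

    # S3 location of the media file to transcribe.
    object_url = 's3://{0}/{1}'.format(file_bucket, file_name)

    startTranscriptionJob(file_name, object_url)
    status = getTranscriptionJob(file_name)
    job = status['TranscriptionJob']
    if job['TranscriptionJobStatus'] == 'FAILED':
        raise RuntimeError(job.get('FailureReason', 'Transcription job failed'))

    # Download the transcript JSON and pull out the plain text.
    url = job['Transcript']['TranscriptFileUri']
    Text_Data = requests.get(url).json()['results']['transcripts'][0]['transcript']

    # Write the text to Lambda's writable /tmp directory, then upload it.
    # basename() drops any folder prefix so the local path is always valid.
    local_path = f"/tmp/{os.path.basename(file_name)}.txt"
    with open(local_path, "w") as file:
        file.write(Text_Data)

    s3.upload_file(Filename=local_path, Bucket=OUTPUT_BUCKET, Key=f"{file_name}.txt")
    return Text_Data
```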
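To make the `['results']['transcripts'][0]['transcript']` lookup in the handler easier to follow, here is a small sketch that runs against a hand-written sample of the transcript JSON. The sample is abbreviated and its values are invented; only the fields used here are shown, and real transcripts contain more.

```python
# Abbreviated, hand-written sample of a transcript file (values are invented).
sample_transcript = {
    "jobName": "example-job",
    "results": {
        "transcripts": [{"transcript": "Hello world."}],
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.5",
             "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
            {"type": "pronunciation", "start_time": "0.5", "end_time": "1.0",
             "alternatives": [{"confidence": "0.98", "content": "world"}]},
            # Punctuation items carry no timestamps.
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ],
    },
    "status": "COMPLETED",
}


def extract_transcript(doc: dict) -> str:
    """Return the full transcript text, as the handler does."""
    return doc["results"]["transcripts"][0]["transcript"]


def word_timings(doc: dict) -> list:
    """Return (word, start_seconds, end_seconds) for each spoken word."""
    timings = []
    for item in doc["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation, which has no start/end time
        word = item["alternatives"][0]["content"]
        timings.append((word, float(item["start_time"]), float(item["end_time"])))
    return timings


print(extract_transcript(sample_transcript))
print(word_timings(sample_transcript))
```

The per-word timings are what make the transcript "time-stamped"; the handler in this article keeps only the plain text.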
This function starts the transcription job. It calls the Transcribe API and passes IdentifyLanguage=True so that the language of the audio file is identified automatically.
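```python
def startTranscriptionJob(file_name, object_url):
    # Job names may not contain '/', so it is stripped from the object key.
    # Note: truncating to 10 characters means files that share a prefix get
    # the same job name, and Transcribe rejects duplicate job names.
    response = transcribe.start_transcription_job(
        TranscriptionJobName=file_name.replace('/', '')[:10],
        IdentifyLanguage=True,  # let Transcribe detect the spoken language
        MediaFormat='mp3',
        Media={
            'MediaFileUri': object_url
        })
    return response
```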
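Because the start function derives the job name by truncating the object key to 10 characters, two uploads with similar names can collide. One possible alternative, shown here as a sketch rather than as part of the original code, is a helper that sanitizes the key and appends a random suffix. The helper name `make_job_name` is hypothetical.

```python
import re
import uuid


def make_job_name(object_key: str, max_length: int = 200) -> str:
    """Build a job name from an S3 object key.

    Characters outside letters, digits, '.', '_' and '-' are replaced with
    '-', and a short random suffix keeps names unique across uploads.
    """
    safe = re.sub(r"[^0-9a-zA-Z._-]", "-", object_key)
    suffix = uuid.uuid4().hex[:8]
    # Leave room for the separator and the suffix within max_length.
    return f"{safe[: max_length - len(suffix) - 1]}-{suffix}"


name = make_job_name("audio/my file.mp3")
print(name)
```

Since the suffix is random, the name returned here would have to be stored and passed on to the status lookup, rather than being recomputed from the file name as the functions in this article do.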
This function gets the transcription job details. It returns a JSON response, from which we fetch the desired results.
```python
import time


def getTranscriptionJob(file_name):
    # Poll until the job reaches a terminal state.
    while True:
        status = transcribe.get_transcription_job(
            TranscriptionJobName=file_name.replace('/', '')[:10]
        )
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)  # pause between polls instead of calling the API in a tight loop
    return status
```
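A polling loop like the one above has no upper bound, while a Lambda invocation can run for at most 15 minutes. The following sketch adds a deadline. It is written against a generic `get_status` callable so it can be tried without AWS access; the name `wait_for_job` and the default timings are assumptions for illustration.

```python
import time


def wait_for_job(get_status, job_name, poll_seconds=5.0, timeout_seconds=840.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_name) until the job is COMPLETED or FAILED.

    get_status must return a dict shaped like the GetTranscriptionJob
    response. Raises TimeoutError if the deadline would be exceeded.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = get_status(job_name)
        state = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if state in ("COMPLETED", "FAILED"):
            return response
        if clock() + poll_seconds > deadline:
            raise TimeoutError(f"Job {job_name} still {state} after {timeout_seconds}s")
        sleep(poll_seconds)


# Inside the Lambda function this could be called as:
#   wait_for_job(
#       lambda name: transcribe.get_transcription_job(TranscriptionJobName=name),
#       job_name)
```

For long recordings, a design that does not wait inside the function at all (for example, reacting to the job's completion in a second function) may fit better, but that is outside the scope of this walkthrough.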
7. Conclusion
Now, if we upload an MP3 file to our input S3 bucket, our Lambda function is triggered, and once it finishes we can see a text file in S3 containing the text transcribed from the audio file. This text can be used for further processing and analysis as the business requires. The transcribed text can also be translated into different languages using the Amazon Translate service.
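As a sketch of that last idea, the transcribed text could be passed to Amazon Translate through boto3's `translate_text` call. The helper name `translate_transcript` and the target language are assumptions for illustration; the client is passed in as a parameter so the function can be exercised without AWS access.

```python
def translate_transcript(translate_client, text, target_language="es",
                         source_language="auto"):
    """Translate transcript text with an Amazon Translate client.

    translate_client is expected to be boto3.client("translate").
    source_language="auto" asks the service to detect the source language.
    """
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_language,
        TargetLanguageCode=target_language,
    )
    return response["TranslatedText"]


# Inside the Lambda function this could be called as:
#   translated = translate_transcript(boto3.client("translate"), Text_Data)
```

The service limits how much text a single request may contain, so long transcripts may need to be split into smaller pieces first, and the Lambda role would also need permission to call Translate.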
8. Use Cases
- Get insights from customer conversations
With Amazon Transcribe, we can quickly gather insights from customer conversations. In addition, AWS Contact Center Intelligence partners and Contact Lens for Amazon Connect offer solutions to improve customer engagement and increase agent productivity.
- Search and analyze media content
Content producers and media distributors can use Amazon Transcribe to automatically convert audio and video assets into fully searchable archives for content discovery, highlight generation, content moderation, and monetization.
- Create subtitles and meeting notes
Subtitle your on-demand and streaming content to increase reach and improve customer experience. Use Amazon Transcribe to improve productivity and accurately record the meetings and discussions that are important to you.
- Improve clinical documentation
Physicians and clinicians can use Amazon Transcribe Medical to quickly and efficiently document clinical conversations in electronic health record (EHR) systems for analysis. The service is HIPAA-eligible and trained to understand medical terminology.
FAQs
1. Does Amazon Transcribe support real-time transcriptions?
ANS: – Yes, Amazon Transcribe enables users to open a bidirectional stream over HTTP/2. Users can send an audio stream to the service while receiving a text stream in return in real time.
2. Are there size restrictions on the audio content that Amazon Transcribe can process?
ANS: – Amazon Transcribe service calls are limited to four hours (or 2 GB) per API call for the batch service. The streaming service can accommodate open connections up to four hours long.
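A Lambda function could check the size part of this limit before starting a batch job. The sketch below is illustrative only: the helper name `fits_batch_limit` is hypothetical, and interpreting 2 GB as 2 × 1024³ bytes is an assumption. The S3 client is passed in so the function can be tried without AWS access.

```python
# 2 GB batch limit quoted in the answer above; the exact byte count is an assumption.
MAX_BATCH_BYTES = 2 * 1024 ** 3


def fits_batch_limit(s3_client, bucket, key, max_bytes=MAX_BATCH_BYTES):
    """Return True if the S3 object is small enough for a batch job.

    s3_client is expected to be boto3.client("s3"); head_object returns the
    object's size in bytes as ContentLength without downloading it.
    """
    size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return size <= max_bytes
```

The duration limit cannot be checked this way, since the object metadata does not include the length of the recording.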
3. What languages can Amazon Transcribe automatically identify?
ANS: – Amazon Transcribe can identify any of the languages supported by the batch and streaming APIs.
4. Does Amazon Transcribe identify multiple languages in the same audio file?
ANS: – Amazon Transcribe only identifies the dominant language in an audio file.

WRITTEN BY Sanket Gaikwad