Audio-To-Text Automated Conversion Using AWS Transcribe

1. Overview

Audio-to-text conversion is the process of turning recorded speech into text. Software cannot easily search or analyze raw audio, so extracting meaningful data from it directly is close to impossible. Audio files therefore need to be converted to text before they can be analyzed.

Many software providers now offer speech-to-text as a service, built on their own models and algorithms. In this blog, we will go through one such service from AWS: Amazon Transcribe, often referred to as AWS Transcribe.

2. Introduction to AWS Transcribe

Amazon Transcribe is a fully managed and continuously trained automatic speech recognition service that automatically generates time-stamped text transcripts from audio files. Amazon Transcribe makes it easy for developers to add speech-to-text capabilities to their applications. Audio data is virtually impossible for computers to search and analyze. Therefore, recorded speech needs to be converted to text before being used in applications. Historically, customers had to work with transcription providers that required them to sign expensive contracts and were hard to integrate into their technology stacks to accomplish this task. Many of these providers use outdated technology that does not adapt well to different scenarios, like low-fidelity phone audio standards in contact centers, which results in poor accuracy.

3. AWS Transcribe

We will make use of S3 triggers that will make it possible to automate transcribing from start to end. Below is a detailed overview of what we will accomplish in this article.

Create a Lambda role with access to S3, CloudWatch, and AWS Transcribe.
Create an input S3 bucket and an output bucket for AWS Transcribe.
Create a Lambda function with Python as the runtime that triggers AWS Transcribe whenever a new .mp3 file is uploaded to the input S3 bucket.
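
The permissions needed by the role in the first step can be sketched as an IAM policy document. This is a minimal sketch under assumptions, not the policy used in this article: the bucket names `my-input-bucket` and `my-output-bucket` and the helper `build_lambda_policy` are hypothetical, and the resources should be narrowed to your own setup.

```python
import json

# Hypothetical bucket names; replace them with your own.
INPUT_BUCKET = "my-input-bucket"
OUTPUT_BUCKET = "my-output-bucket"


def build_lambda_policy(input_bucket: str, output_bucket: str) -> dict:
    """Return an IAM policy document covering S3, Transcribe, and CloudWatch Logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read the uploaded audio file
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {   # write the transcript text file
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
            {   # start and poll transcription jobs
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                ],
                "Resource": "*",
            },
            {   # let the function write its logs to CloudWatch
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


policy = build_lambda_policy(INPUT_BUCKET, OUTPUT_BUCKET)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor when creating the role.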

4. Setting up a Trigger on S3

Click the ‘Add trigger’ option on the Lambda function, select ‘S3’ as the source, and select ‘PUT’ as the event type. The prefix filter restricts the trigger to a folder, and the suffix filter restricts it to a file type. We consider only .mp3 files for this demo.
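
The same trigger can also be described in code. The sketch below builds the notification configuration that boto3's `put_bucket_notification_configuration` accepts; the function ARN, the `audio/` prefix, the bucket name in the comment, and the helper name `build_notification_config` are assumptions made for illustration. The actual API call is left as a comment because it needs real AWS resources.

```python
def build_notification_config(lambda_arn: str, prefix: str, suffix: str) -> dict:
    """Build an S3 notification configuration that invokes a Lambda on PUT uploads."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},  # folder
                            {"Name": "suffix", "Value": suffix},  # file type
                        ]
                    }
                },
            }
        ]
    }


# Hypothetical ARN and prefix, for illustration only.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:transcribe-audio",
    prefix="audio/",
    suffix=".mp3",
)
print(config)

# With real resources, the configuration would be applied like this:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-input-bucket", NotificationConfiguration=config)
```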

5. Lambda Code for Transcribing the Text and Storing the text file in S3

First, we import the required libraries: boto3, requests, and json. Note that requests is not included in the Lambda Python runtime, so it has to be packaged with the function or added as a layer.
Increase the Lambda timeout under Configuration settings; it is set to 3 seconds by default, and the function has to wait for the transcription job to finish.
The code reads the event and fetches the bucket name and file name from it.
Then we build the S3 location of the file, which is passed to the Transcribe job.
We start the transcription job and then get its details.
Starting the job and getting its details are each handled by a helper function.
We fetch the transcript file URL and the other details we need from the JSON response.
To fetch the transcribed data from the URL, we use requests.
Then we write the transcript to a text file and upload that file to S3.
After the code runs, we see a text file in S3, and a transcription job is created in the AWS Transcribe service.

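import json
import os
import urllib.parse

import boto3
import requests  # not part of the Lambda runtime; package it or add a layer

s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    # Bucket and key of the uploaded file; S3 URL-encodes keys in events
    file_bucket = event['Records'][0]['s3']['bucket']['name']
    file_name = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    # Location of the audio file, passed to the Transcribe job as an S3 URI
    object_url = 's3://{0}/{1}'.format(file_bucket, file_name)

    # Start the job, then wait until it has completed or failed
    startTranscriptionJob(file_name, object_url)
    status = getTranscriptionJob(file_name)
    job = status['TranscriptionJob']
    if job['TranscriptionJobStatus'] == 'FAILED':
        raise RuntimeError(job.get('FailureReason', 'Transcription job failed'))

    # Download the transcript JSON and keep only the plain text
    url = job['Transcript']['TranscriptFileUri']
    text_data = requests.get(url).json()['results']['transcripts'][0]['transcript']

    # /tmp is the writable directory in Lambda; drop any folder prefix from the key
    local_path = f"/tmp/{os.path.basename(file_name)}.txt"
    with open(local_path, "w") as file:
        file.write(text_data)

    # Upload the text file to the output bucket
    s3.upload_file(
        Filename=local_path,
        Bucket="test-bucket-transcribe",
        Key=f"{file_name}.txt"
    )
    return text_data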
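
The two pieces of parsing in the handler, reading the bucket and key from the S3 event and pulling the text out of the transcript JSON, can be tried locally without any AWS calls. This is a sketch under assumptions: the sample event and transcript below are trimmed, hand-written stand-ins for the real payloads, and the helper names `parse_s3_event` and `extract_transcript` are hypothetical. S3 URL-encodes object keys in event notifications, so the sketch decodes the key before using it.

```python
import urllib.parse


def parse_s3_event(event: dict) -> tuple:
    """Return (bucket, key) for the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Keys arrive URL-encoded, e.g. a space is sent as '+'
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return bucket, key


def extract_transcript(transcript_json: dict) -> str:
    """Return the plain transcript text from a Transcribe output document."""
    return transcript_json["results"]["transcripts"][0]["transcript"]


# Trimmed, hand-written samples for illustration only.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-input-bucket"},
                "object": {"key": "audio/team+meeting.mp3"}}}
    ]
}
sample_output = {
    "results": {"transcripts": [{"transcript": "Hello world."}], "items": []}
}

bucket, key = parse_s3_event(sample_event)
print(bucket, key)
print(extract_transcript(sample_output))
```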

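This function starts the transcription job. It calls the Transcribe API with IdentifyLanguage enabled so that the language of the audio file is identified automatically. The job name is the file name without slashes, cut to its first 10 characters; since job names must be unique, files whose names share those 10 characters will conflict.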

def startTranscriptionJob(file_name, object_url):
    # Job name: the file name without slashes, cut to 10 characters
    response = transcribe.start_transcription_job(
        TranscriptionJobName=file_name.replace('/','')[:10],
        IdentifyLanguage=True,  # let Transcribe detect the spoken language
        MediaFormat='mp3',
        Media={
            'MediaFileUri': object_url
        })
    return response
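
Because the job name above is cut to 10 characters, two uploads with similar names map to the same job name. One possible alternative is sketched below; it is not part of the original code. The helper `make_job_name` is hypothetical, and the sketch assumes that job names may contain only letters, digits, periods, underscores, and hyphens, with a maximum length of 200 characters.

```python
import re
import uuid

MAX_JOB_NAME_LENGTH = 200  # assumed service limit for job names


def make_job_name(object_key: str) -> str:
    """Build a unique job name from an S3 object key."""
    # Replace every character outside the assumed allowed set with '-'
    safe = re.sub(r"[^0-9a-zA-Z._-]", "-", object_key)
    # A short random suffix keeps repeated uploads from colliding
    suffix = uuid.uuid4().hex[:8]
    return f"{safe[:MAX_JOB_NAME_LENGTH - len(suffix) - 1]}-{suffix}"


name = make_job_name("audio/team meeting (1).mp3")
print(name)
```

Since the suffix is random, the generated name has to be kept and passed to the function that fetches the job details rather than being recomputed from the file name.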

This function gets the transcription job details. It polls until the job has either completed or failed and then returns the JSON response, from which we fetch the desired results.

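import time

def getTranscriptionJob(file_name):
    # Poll until the job reaches a terminal state
    while True:
        status = transcribe.get_transcription_job(
            TranscriptionJobName=file_name.replace('/','')[:10]
        )
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)  # pause between polls instead of calling the API in a tight loop
    return status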
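
A polling loop like the one above can run until the Lambda times out if the job never finishes. The sketch below shows one way to bound the wait with a deadline and a growing delay between polls. It is an illustration under assumptions rather than the article's code: `wait_for_job` and its parameters are hypothetical, and the status lookup is passed in as a function so that the loop can be tried with a fake instead of a real Transcribe client.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status: Callable[[], dict],
                 timeout_s: float = 600.0,
                 initial_delay_s: float = 2.0,
                 max_delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_status until the job is done or the timeout is reached."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        state = status["TranscriptionJob"]["TranscriptionJobStatus"]
        if state in TERMINAL_STATES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {state} after {timeout_s} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, up to a ceiling


# Fake status source standing in for transcribe.get_transcription_job
_states = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])

def fake_fetch() -> dict:
    return {"TranscriptionJob": {"TranscriptionJobStatus": next(_states)}}

delays = []  # record the requested delays instead of actually sleeping
result = wait_for_job(fake_fetch, sleep=delays.append)
print(result["TranscriptionJob"]["TranscriptionJobStatus"], delays)
```

With a real client, `fetch_status` would be a small function or lambda that calls `get_transcription_job` with the job name.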

6. Architecture Diagram

7. Conclusion

Now, if we upload an MP3 file to our S3 bucket, the Lambda function is triggered, and once it has finished running we can see a text file in our S3 bucket containing the text transcribed from the audio file. This text can be used for further processing and analysis as the business requires. The transcribed text can also be translated into other languages using the Amazon Translate service.
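
As an illustration of the translation step mentioned above, the sketch below sends a transcript to Amazon Translate in chunks through boto3's `translate_text` call. It is a sketch under assumptions: the helper names are hypothetical, the 5,000-byte chunk size is an arbitrary illustrative value (check the current Amazon Translate quota), and a fake client is used so that the example runs without AWS credentials.

```python
def split_text(text: str, max_bytes: int) -> list:
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word  # a single oversized word is kept whole
    if current:
        chunks.append(current)
    return chunks


def translate_transcript(client, text: str, target_language: str,
                         source_language: str = "auto",
                         max_bytes: int = 5000) -> str:
    """Translate text chunk by chunk and join the translated parts."""
    parts = []
    for chunk in split_text(text, max_bytes):
        response = client.translate_text(
            Text=chunk,
            SourceLanguageCode=source_language,
            TargetLanguageCode=target_language,
        )
        parts.append(response["TranslatedText"])
    return " ".join(parts)


class FakeTranslateClient:
    """Stand-in for boto3.client('translate'), for local experiments only."""

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        return {"TranslatedText": f"[{TargetLanguageCode}] {Text}"}


translated = translate_transcript(FakeTranslateClient(), "one two three", "es", max_bytes=7)
print(translated)
```

With real credentials, `boto3.client('translate')` would be passed in place of the fake client.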

8. Use Cases

Get insights from customer conversations

With Amazon Transcribe, we can quickly gather insights from customer conversations. Further, AWS Contact Center Intelligence partners and Contact Lens for Amazon Connect offer solutions to improve customer engagement and increase agent productivity.

Search and analyze media content

Content producers and media distributors can use Amazon Transcribe to automatically convert audio and video assets into a fully searchable archive for content discovery, highlight generation, content moderation, and monetization.

Create subtitles and meeting notes

Amazon Transcribe can subtitle on-demand and broadcast content to increase reach and improve the customer experience. It can also improve productivity by accurately recording the meetings and discussions that are important to you.

Improve clinical documentation

Physicians and clinicians can use Amazon Transcribe Medical to quickly and efficiently document clinical conversations in electronic health record (EHR) systems for analysis. The service is HIPAA-eligible and trained to understand medical terminology.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications winning global recognition for its training excellence including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. Does Amazon Transcribe support real-time transcriptions?

ANS: – Yes, Amazon Transcribe lets users open a bidirectional stream over HTTP/2. Users can send an audio stream to the service and receive a text stream in return in real time.

2. Are there size restrictions on the audio content that Amazon Transcribe can process?

ANS: – For the batch service, Amazon Transcribe calls are limited to four hours (or 2 GB) of audio per API call. The streaming service can accommodate open connections up to four hours long.

3. What languages can Amazon Transcribe automatically identify?

ANS: – Amazon Transcribe can identify any of the languages supported by the batch and streaming APIs.

4. Does Amazon Transcribe identify multiple languages in the same audio file?

ANS: – With the IdentifyLanguage setting used in this article, Amazon Transcribe identifies only the dominant language in an audio file.