
Customize Language Translation with a Machine Learning Tool: Amazon Translate - Part 2

TABLE OF CONTENTS

1. Overview
2. Translating the SRT Files in Different Languages
3. Setting up a Trigger on S3
4. Translating the SRT and Storing the files in S3
5. Lambda Code
6. Conclusion
7. About CloudThat
8. FAQs

 

Overview

Streaming video or audio content is a very effective way to share information, entertain, and engage users. Many organizations now have extensive collections of video or audio content with captions and subtitles. Translating those captions and subtitles into multiple languages makes the content available to more viewers. This blog shows how to use Amazon Translate to build an automated flow that translates captions and subtitles without losing context.

Captions and subtitles give people with hearing impairments access to video or audio, provide flexibility for users in noisy or quiet environments, and support non-native speakers. They are usually stored in SRT (.srt) or WebVTT (.vtt) format. SRT stands for SubRip Subtitle and is the most common file format for subtitles and captions. WebVTT stands for Web Video Text Tracks and is becoming a popular format for the same purpose. In this blog, we translate SRT files into different languages.

Translating the SRT Files in Different Languages

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. Neural machine translation is a form of automated language translation that uses machine learning models to deliver more accurate and more natural-sounding translations than traditional rule-based translation algorithms.

With Amazon Translate, you can localize content such as websites and apps for users in different regions, translate large volumes of text for analysis, and enable communication between users who speak different languages.

This article translates the caption text stored in an SRT file into different languages. We use an S3 trigger to automate the translation from start to end. In outline, uploading an .srt file to S3 invokes a Lambda function, which removes the timestamps, translates the remaining text with Amazon Translate, and stores the result in S3.

After translation, we rebuild the SRT files from the translated delimited text by adding the timestamps back.


Setting up a Trigger on S3

On the Lambda function, click the ‘Add Trigger’ option, select ‘S3’ as the source, and set the Event Type to ‘PUT’. The prefix is the folder and the suffix is the file type. For this demo we consider only .srt files, so the Lambda function is triggered whenever an .srt file is uploaded to the “input” folder.
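The same trigger can also be expressed in code. The following is a minimal sketch, assuming boto3 and a Lambda function that S3 is already permitted to invoke. The helper names `build_srt_trigger` and `apply_trigger`, and any bucket name or function ARN you pass in, are placeholders for illustration and are not part of the original setup.

```python
def build_srt_trigger(lambda_arn, prefix="input/", suffix=".srt"):
    """Build an S3 notification configuration for PUT events on .srt files."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Fire only when an object is created with a PUT request.
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},  # the folder
                            {"Name": "suffix", "Value": suffix},  # the file type
                        ]
                    }
                },
            }
        ]
    }


def apply_trigger(bucket, lambda_arn):
    """Attach the trigger to the bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_srt_trigger(lambda_arn),
    )
```

Note that `put_bucket_notification_configuration` replaces the bucket's existing notification configuration, so any other notifications on the bucket would need to be included in the same call.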


Translating the SRT and Storing the files in S3

Lambda Code

This code is invoked by the S3 event and fetches the file data. It then calls functions from the “srtCaptions” file, which remove the timestamps from the file and convert it into plain text for translation. Next, we translate the text into the required language and add the timestamps back to the translated text.
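The handler described above might look roughly like the sketch below. It is not the original Lambda code. It assumes an English source, a target language read from a hypothetical `TARGET_LANGUAGE` environment variable, and an “output/” folder in the same bucket. For brevity, the hypothetical `translate_srt` helper translates each caption line separately and leaves cue numbers and timing lines untouched, instead of joining the text into one delimited string first.

```python
import os
import re
import urllib.parse

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}")

TARGET_LANGUAGE = os.environ.get("TARGET_LANGUAGE", "es")  # assumed setting


def translate_srt(srt_text, translate_line):
    """Translate caption lines; keep cue numbers, timings, and blank lines.

    Limitation of this sketch: a caption line made only of digits is
    treated as a cue number and left untranslated.
    """
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or TIMING.match(stripped):
            out.append(stripped)
        else:
            out.append(translate_line(stripped))
    return "\n".join(out) + "\n"


def lambda_handler(event, context, s3=None, translate=None):
    """Entry point for the S3 PUT event; clients can be injected for testing."""
    if s3 is None or translate is None:
        import boto3  # imported lazily so the sketch can run with fake clients

        s3 = s3 or boto3.client("s3")
        translate = translate or boto3.client("translate")

    def translate_line(text):
        response = translate.translate_text(
            Text=text,
            SourceLanguageCode="en",  # assumed source language
            TargetLanguageCode=TARGET_LANGUAGE,
        )
        return response["TranslatedText"]

    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        body = raw.decode("utf-8-sig")  # also tolerates a leading BOM

        name = os.path.basename(key).rsplit(".", 1)[0]
        out_key = f"output/{name}.{TARGET_LANGUAGE}.srt"
        s3.put_object(
            Bucket=bucket,
            Key=out_key,
            Body=translate_srt(body, translate_line).encode("utf-8"),
        )
        written.append(out_key)
    return {"written": written}
```

Writing results under “output/” while the trigger listens only on “input/” keeps the function from re-triggering itself on its own output.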

Code in srtCaptions.py file

This file contains the code that removes the timestamps from the SRT file and converts the captions into plain text that we can pass to Amazon Translate. After translation, it adds the timestamps back to the translated text so that the result can be stored in S3 as an SRT file.
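The round trip described above can be sketched as follows. This is an illustration and not the contents of the original srtCaptions.py. The function names `parse_srt`, `to_delimited`, and `to_srt` are hypothetical, and the `<span>` delimiter is an arbitrary choice that is assumed to pass through translation unchanged.

```python
import re

DELIMITER = "<span>"  # assumed marker that survives translation unchanged


def parse_srt(srt_text):
    """Return a list of (timing, caption) tuples, one per cue."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed cues in this sketch
        caption = " ".join(part.strip() for part in lines[2:])
        cues.append((lines[1].strip(), caption))
    return cues


def to_delimited(cues):
    """Join the captions (timestamps removed) into one string to translate."""
    return DELIMITER.join(text for _, text in cues)


def to_srt(cues, translated_delimited):
    """Re-attach the original timings to the translated captions."""
    if not cues:
        return ""
    texts = [text.strip() for text in translated_delimited.split(DELIMITER)]
    if len(texts) != len(cues):
        raise ValueError("translated text does not line up with the cues")
    blocks = [
        f"{number}\n{timing}\n{text}\n"
        for number, ((timing, _), text) in enumerate(zip(cues, texts), start=1)
    ]
    return "\n".join(blocks)
```

Translating the captions as one delimited string lets the service see whole sentences that span several cues, which is the reason for stripping the timestamps first. The length check in `to_srt` guards against the case where the delimiter is altered or dropped during translation.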

Conclusion

When we upload an .srt file to the input folder of our S3 bucket, the Lambda function is triggered. Once it finishes, the translated SRT files appear in the output folder of the bucket. These SRT files can then be processed further according to the business requirements of the use case.

Refer to ‘Translate Text to Different Languages using Amazon Translate- Part 1’ for more information about Amazon Translate.

About CloudThat

CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, and Training Partner that helps people develop cloud knowledge and helps their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

If you have any queries about Amazon SageMaker, Natural Language Processing, Hugging Face, or anything related to AWS services, feel free to drop in a comment. We will get back to you quickly. Visit our Consulting Page for more updates on our customer offerings, expertise, and cloud services.

FAQs

  1. What are the different inputs which Amazon Translate supports?

Ans. Amazon Translate supports plain text input in UTF-8 format.

  2. What are the size limits on the Translate API?

Ans. Real-time Amazon Translate API calls are limited to 10,000 bytes of text per call. The asynchronous Batch Translation service accepts a batch of up to 5 GB in size per API call.

  3. Does Amazon Translate provide automatic source language detection?

Ans. Yes. If the source language is unknown, specify "auto" as the source language code, and Amazon Translate detects the source language using Amazon Comprehend behind the scenes.

  4. Are requests where the source language and the target language are the same charged?

Ans. No. Requests are not charged if the source language equals the target language.
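Because the per-call limit is measured in bytes and not characters, text in non-Latin scripts reaches it sooner than the character count suggests. A minimal sketch of splitting long text before sending it, where `split_by_utf8_bytes` is a hypothetical helper and `max_bytes` should be set to whatever per-call limit currently applies:

```python
def split_by_utf8_bytes(text, max_bytes):
    """Split text on whitespace into chunks that each fit in max_bytes of UTF-8."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(word.encode("utf-8")) > max_bytes:
            raise ValueError("a single word exceeds max_bytes")
        current = word
    if current:
        chunks.append(current)
    return chunks
```

Splitting on whitespace is the simplest option but can cut a sentence in half. Splitting on sentence boundaries would likely preserve more context for the translation.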

WRITTEN BY Sanket Gaikwad
