Calling GCP Text-to-Speech APIs from AWS Lambda (Python)

TABLE OF CONTENTS

1. Introduction

2. Key Features

3. Prerequisites for Lambda Configuration

4. Step-by-Step Guide to Configure AWS Lambda

5. Conclusion

6. About CloudThat

7. FAQs

1. Introduction

Google Cloud Text-to-Speech converts text into audio. Developers can use the Text-to-Speech API to integrate natural-sounding, synthetic human speech into their apps as playable audio. The API transforms plain text or Speech Synthesis Markup Language (SSML) input into audio data in formats such as MP3 or LINEAR16 (the encoding used in WAV files).
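To show what SSML input looks like, here is a minimal sketch of a hypothetical helper, `build_ssml`, that wraps plain sentences in a `<speak>` root element and separates them with `<break>` pauses, escaping characters such as `&` that would otherwise break the XML. `<speak>` and `<break time="..."/>` are standard SSML elements; the helper itself and its 300 ms default pause are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=300):
    """Join plain-text sentences into a single SSML <speak> document.

    Each sentence is XML-escaped so characters like "&" or "<" cannot break
    the markup, and consecutive sentences are separated by an SSML <break>.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(s) for s in sentences)
    ssml = f"<speak>{body}</speak>"
    ET.fromstring(ssml)  # raises ParseError if the result is not well-formed XML
    return ssml


print(build_ssml(["Hi, my name is Alex.", "Tom & Jerry say hello."]))
```

The resulting string can be passed as `texttospeech.SynthesisInput(ssml=...)` instead of `SynthesisInput(text=...)` in the Lambda code.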

Google's companion Speech-to-Text service performs the reverse conversion, using Google's most advanced deep learning neural network techniques for automatic speech recognition (ASR). Custom resources can be created, managed, and tested in the Speech-to-Text UI, and speech recognition can be deployed in the cloud through the API or on-premises with Speech-to-Text On-Prem.

2. Key Features

The following features belong to the Speech-to-Text side of the service:
Speech adaptation: Improve transcription accuracy for specific words or phrases by providing hints that help speech recognition transcribe domain-specific terms and rare words. Classes can convert spoken numbers into addresses, years, currencies, and so on.
Domain-specific models: Choose from a variety of trained models for voice control, phone calls, and video transcription, each tuned for domain-specific quality needs. The upgraded phone call model, for example, is tuned for telephony audio, such as calls recorded at an 8 kHz sample rate.
Easy quality comparison: Experiment with your own spoken audio in a simple user interface to compare quality, and try alternative configurations to improve quality and accuracy.
Speech-to-Text On-Prem: Keep full control over your infrastructure and protected speech data while running Google's speech recognition technology on-premises, in your own private data centers. Contact Google Cloud sales to get started.
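To make speech adaptation and domain-specific models more concrete, here is a minimal sketch that builds a JSON request body for the Speech-to-Text v1 `speech:recognize` REST method. The field names (`encoding`, `sampleRateHertz`, `languageCode`, `model`, `speechContexts`) follow the v1 `RecognitionConfig`; the helper name, the phrases, and the `en-US` language code are hypothetical choices, and the sketch only assembles the JSON without sending it.

```python
import base64
import json


def build_recognize_request(audio_bytes, phrases, language_code="en-US"):
    """Build a JSON-serializable body for Speech-to-Text v1 speech:recognize.

    Selects the phone_call model for 8 kHz telephony audio and passes
    phrase hints (speech adaptation) through speechContexts.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 8000,
            "languageCode": language_code,
            "model": "phone_call",  # domain-specific model for telephony audio
            "speechContexts": [{"phrases": phrases}],  # hints for rare or domain terms
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Hypothetical hints: a product name, plus a phrase using the $ADDRESSNUM class
# token to bias recognition of spoken numbers toward street addresses.
# One second of silent 8 kHz, 16-bit audio stands in for a real recording.
body = build_recognize_request(b"\x00\x00" * 8000, ["CloudThat", "$ADDRESSNUM Main Street"])
print(json.dumps(body["config"], indent=2))
```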

3. Prerequisites for Lambda Configuration

Before configuring AWS Lambda, enable the Text-to-Speech API in your GCP project and create a service account and a service account key so that Lambda can authenticate to the GCP API.
Create an IAM execution role for Lambda with write access to S3.
Install the Cloud Client Library for the Text-to-Speech API into a python folder with the following command; this folder becomes the content of the Lambda layer:
pip install --upgrade google-cloud-texttospeech --target python
Zip the python folder and create a Lambda layer from it (for Python runtimes, libraries must sit under a top-level python/ directory in the layer archive).
Once these steps are done, you can create a Lambda function that calls the APIs.
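The layer steps above can also be scripted. This is a minimal sketch using only the Python standard library; the helper names `install_into_layer` and `zip_layer` are hypothetical. It assumes the pip command from the prerequisites and Lambda's convention that a Python layer's libraries live under a top-level `python/` folder. Because `grpcio`, a dependency of the Google client, ships compiled binaries, the install step should run on Linux compatible with the Lambda runtime (for example Amazon Linux).

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def install_into_layer(build_dir, packages=("google-cloud-texttospeech",)):
    """pip-install packages into <build_dir>/python, the folder a Python layer uses.

    Run on Linux matching the Lambda runtime, because grpcio (pulled in by
    the Google client) contains compiled extensions.
    """
    target = Path(build_dir) / "python"
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--upgrade", "--target", str(target), *packages],
        check=True,
    )
    return target


def zip_layer(build_dir, zip_path):
    """Zip build_dir so that python/ sits at the archive root; return the archive names.

    Write zip_path outside build_dir so the archive does not include itself.
    """
    build_dir = Path(build_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(build_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(build_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Usage might look like `install_into_layer("build")` followed by `zip_layer("build", "gcp-texttospeech-layer.zip")`, after which the .zip file is uploaded as the GCP-texttospeech layer (the file names are hypothetical).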

4. Step-by-Step Guide to Configure AWS Lambda

AWS Lambda is a serverless, event-driven compute service that lets you run code for almost any type of application or backend service without provisioning or managing servers. Over 200 AWS services and software-as-a-service (SaaS) applications can trigger Lambda, and you pay only for what you use.

Step 1: Create a new Lambda function with a Python runtime.

Step 2: Attach the Lambda execution role created earlier (the one with S3 write access).

Step 3: Once the function is created, add the GCP-texttospeech layer so that the function can import the Text-to-Speech client library.
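Steps 1 to 3 can also be done programmatically. The sketch below assumes boto3's Lambda `create_function` call; the function name, runtime version, timeout, memory size, and ARNs are hypothetical values to replace with your own. The first helper only assembles the arguments, so it can be checked without AWS credentials.

```python
def build_create_function_kwargs(role_arn, layer_arn, function_name="gcp-text-to-speech"):
    """Arguments for boto3's Lambda create_function call covering Steps 1 to 3.

    All default values here are hypothetical; adjust them to your account.
    """
    return {
        "FunctionName": function_name,                # Step 1: the new function
        "Runtime": "python3.12",                      # assumption: match the layer's Python version
        "Handler": "lambda_function.lambda_handler",  # lambda_function.py, function lambda_handler
        "Role": role_arn,                             # Step 2: execution role with S3 write access
        "Layers": [layer_arn],                        # Step 3: the GCP-texttospeech layer
        "Timeout": 30,                                # assumption: the 3-second default may be too short
        "MemorySize": 256,                            # assumption
    }


def create_function(zip_bytes, **kwargs):
    """Create the function from a .zip holding lambda_function.py and Gcp_cred.json."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("lambda")
    return client.create_function(Code={"ZipFile": zip_bytes}, **kwargs)
```

A call might look like `create_function(open("function.zip", "rb").read(), **build_create_function_kwargs(role_arn, layer_arn))`, where `function.zip` is a hypothetical archive of the handler and key file.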

Step 4: In the function's code directory, create a new JSON file named Gcp_cred.json containing the GCP service account key.
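Because GOOGLE_APPLICATION_CREDENTIALS is pointed at this file, a quick sanity check can save a confusing authentication error. The sketch below is a hypothetical helper, not part of any Google library: it resolves the key file next to the handler using the `LAMBDA_TASK_ROOT` variable that Lambda sets to the function's code directory, and checks for fields that a GCP service account key contains.

```python
import json
import os

# Fields present in a GCP service account key file.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def credentials_path(filename="Gcp_cred.json"):
    """Absolute path of a key file deployed alongside the handler.

    Lambda sets LAMBDA_TASK_ROOT to the code directory (/var/task);
    fall back to the current directory when running locally.
    """
    return os.path.join(os.environ.get("LAMBDA_TASK_ROOT", os.getcwd()), filename)


def check_service_account_key(path):
    """Raise ValueError if the file is not a service account key; return its client_email."""
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in key]
    if missing:
        raise ValueError(f"{path} is missing {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key (type={key['type']!r})")
    return key["client_email"]
```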

Step 5: Replace the default code provided by Lambda with the following code:

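import json
import os

import boto3
from google.cloud import texttospeech

# Point the Google client library at the service account key deployed with the function.
# Lambda sets LAMBDA_TASK_ROOT to the code directory; this must run before the client is created.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(
    os.environ.get('LAMBDA_TASK_ROOT', '.'), 'Gcp_cred.json'
)

# Create both clients once per execution environment so warm invocations reuse them.
s3 = boto3.resource('s3')
client = texttospeech.TextToSpeechClient()

def lambda_handler(event, context):
    # Use the "text" field of the event if present, otherwise the default sentence.
    text = event.get('text', 'Hi my name is alex')
    response = text_speech(text, client)
    # Replace the placeholders with your bucket name and the object key for the MP3 file.
    s3.Object('<BucketName>', '<File Path>').put(
        Body=response.audio_content, ContentType='audio/mpeg'
    )
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

def text_speech(text, client):
    """Synthesize text to MP3 with the bn-IN WaveNet voice used in this example."""
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="bn-IN",
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        name="bn-IN-Wavenet-A",
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    return client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )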
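The handler above requests MP3 output. If you switch `AudioConfig` to another encoding, a small helper like the hypothetical `describe_audio` below can pick a matching file extension and S3 `ContentType` by inspecting the first bytes of `response.audio_content`. It relies on standard container signatures (RIFF/WAVE, an ID3 tag or MPEG frame sync, OggS) and on the API documentation's note that LINEAR16 output includes a WAV header; treat it as a sketch, not a complete format detector.

```python
def describe_audio(data):
    """Guess (file extension, content type) from the first bytes of audio data.

    Covers MP3, WAV (LINEAR16 with its RIFF header), and Ogg (OGG_OPUS);
    anything else falls back to a generic binary type.
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav", "audio/wav"
    # MP3 starts with an ID3 tag or an MPEG frame sync (11 set bits).
    if data[:3] == b"ID3" or (len(data) > 1 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return ".mp3", "audio/mpeg"
    if data[:4] == b"OggS":
        return ".ogg", "audio/ogg"
    return ".bin", "application/octet-stream"
```

In the handler this could be used as `ext, content_type = describe_audio(response.audio_content)` before calling `put(Body=..., ContentType=content_type)` with an object key ending in `ext`.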

Note: Don't forget to replace <BucketName> and <File Path> in the s3.Object(...) call with your bucket name and the object key for the audio file.

Now, test the function and check your S3 bucket: an audio file should have been created that says "Hi my name is Alex" aloud.

For more information about GCP's Text-to-Speech API, refer to the official Google Cloud Text-to-Speech documentation.

5. Conclusion

GCP Text-to-Speech is a deep learning service that converts written text into natural-sounding speech, so content can be consumed by listening instead of reading. Because authentication only requires a service account key, the API can be called or integrated from anywhere, including AWS Lambda, which can help improve customer relationships. You can get started at no cost, and with Dialogflow's voice bots you can provide a better speech experience for your customers. These services are designed with developers in mind.

6. About CloudThat

CloudThat is on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere to advance in their businesses.

As a pioneer in the cloud consulting realm, CloudThat is an AWS (Amazon Web Services) Advanced Consulting Partner, an AWS Authorized Training Partner, a Microsoft Gold Partner, and winner of the Microsoft Asia Superstar Campaign for India: 2021.

To get started, explore CloudThat's offerings on our Expert Advisory page and Managed Services Package. You can then get in touch with our highly accomplished team of experts to carry out your migration needs.

7. FAQs

Does Google use the text or audio I send to the Speech-to-Text API?

Ans: If you are not enrolled in the data logging opt-in program, Google does not use any of your content for any purpose other than providing you with the Speech-to-Text API service. Content includes, for example, the audio supplied to the API and any returned transcripts.

Does Google claim ownership of the content I send in the request to the Speech-to-Text API?

Ans: Google does not claim any ownership of any of the content (including the audio data and returned transcript) that you transmit to the Speech-to-Text API.

Can I resell the Speech-to-Text API?

Ans: No, you are not permitted to resell the Speech-to-Text API service. However, you can still integrate the Speech-to-Text API into applications of independent value.