Seamless Multi-Channel Audio Streaming to Amazon Transcribe via Web Audio API

Overview

This blog shows how to stream dual-channel audio from two microphones to Amazon Transcribe for real-time transcription in a web browser. Developers can use the Web Audio API to combine multiple audio inputs into a stereo stream in a scalable and cost-effective way, encode it appropriately, and use Amazon Transcribe’s multi-channel capabilities to differentiate between speakers.

The solution integrates two microphones and sends stereo audio to Amazon Transcribe for real-time transcription using the Web Audio API. We cover how to use AudioContext and AudioWorklet for low-latency, high-fidelity processing, how to list and access multiple input devices, and how to route each microphone to a specific audio channel (left or right). After PCM encoding, the audio is streamed to Amazon Transcribe over a WebSocket and transcribed with channel-level speaker separation.


Introduction

With browser technologies like the Web Audio API, real-time transcription is no longer limited to enterprise-grade desktop tools. A web application can now stream multi-channel audio to Amazon Transcribe and have it transcribed in real time. This capability is especially helpful in applications that need to accurately distinguish between speakers, such as call centers, interviews, or group meetings. This blog post explains how to use two microphones in a browser-based Vue.js application, combine their audio into stereo, encode it as PCM, and send it straight to Amazon Transcribe.

Use of Multiple Channels for Transcription

With Amazon Transcribe’s multi-channel transcription, developers can precisely identify the source by sending two distinct audio sources over the left and right stereo channels.

This approach provides the following advantages over speaker labeling on a single mixed channel:

  • Improved speaker separation.
  • Decreased possibility of mislabeling, particularly when speakers have similar-sounding voices.
  • Better management of simultaneous speakers’ overlapping speech.

When using separate microphones mapped to known channels, it is also no longer necessary to guess speaker identities using arbitrary labels like “spk_0” or “spk_1”.

Key Challenges

It’s important to note a few difficulties with this configuration before moving forward with implementation:

  • Voice overlap: Both voices may be audible when microphones are placed in close proximity to one another.
  • Hardware management: Multiple microphones must be accurately identified, accessed, and coordinated.
  • Latency: Processing and streaming in real-time must occur with the least amount of latency possible.

These problems can be lessened with directional microphones, appropriate gain controls, and channel-specific processing.

Technical Synopsis

To construct this solution, we’ll use the Web Audio API to combine the audio streams and Amazon Transcribe Streaming for the transcription. Here is a high-level summary:

  • Identify and access two microphones in the browser.
  • Combine them into a stereo stream with a ChannelMergerNode.
  • Encode the audio in 16-bit PCM format.
  • Stream the PCM chunks to Amazon Transcribe over a WebSocket.
  • Parse the results and map them back to their channels.

Let’s examine each of the above actions in more detail.

  • Finding Devices

Use the browser’s media-device APIs to locate all connected audio inputs. From this list, select the two microphones you want to stream. This ensures you know exactly which physical equipment will power your transcription pipeline.
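A minimal sketch of this discovery step, assuming a secure (HTTPS) context where the standard navigator.mediaDevices API is available; the helper name listMicrophones is illustrative:

```typescript
// List every audio input device so the user can pick the two microphones.
// Device labels are only populated after a prior getUserMedia permission grant.
async function listMicrophones(): Promise<MediaDeviceInfo[]> {
  const devices = await navigator.mediaDevices.enumerateDevices();
  // Keep only microphones; drop cameras and audio outputs.
  return devices.filter((device) => device.kind === "audioinput");
}

// Example: log each microphone's label and deviceId so it can be selected in the UI.
listMicrophones().then((mics) =>
  mics.forEach((mic) => console.log(`${mic.label} -> ${mic.deviceId}`))
);
```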

  • Acquisition of Streams

Request an audio stream for each of the chosen microphones; the browser will prompt the user for permission. To maximize audio clarity before any additional processing, also enable built-in browser features such as automatic gain control, noise suppression, and echo cancellation at this point.
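One way to request each stream with these features enabled is shown below; the deviceId values come from the discovery step, and the exact constraint set is an assumption you can tune to your hardware:

```typescript
// Acquire a dedicated mono stream for one selected microphone, with the
// browser's built-in clean-up features switched on.
async function getMicStream(deviceId: string): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      deviceId: { exact: deviceId }, // bind to this physical microphone only
      channelCount: 1,               // one mono stream per microphone
      echoCancellation: true,        // suppress speaker bleed-through
      noiseSuppression: true,        // reduce steady background noise
      autoGainControl: true,         // normalize input levels
    },
  });
}

// Usage (deviceIds chosen from listMicrophones()):
// const leftStream  = await getMicStream(leftDeviceId);
// const rightStream = await getMicStream(rightDeviceId);
```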

  • Channel Assignment and Merging

Conceptually, place one microphone on the “left” channel and the other on the “right” channel. Because each mono input is routed to its own channel of a stereo output, downstream components receive two distinct spatial streams rather than a mixed mono feed.
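A sketch of that routing with a ChannelMergerNode is shown below; which microphone you treat as left or right is an arbitrary but consistent choice:

```typescript
// Merge two mono microphone streams into one stereo signal.
// Merger input 0 becomes the left channel, input 1 the right channel.
function mergeToStereo(
  ctx: AudioContext,
  leftStream: MediaStream,
  rightStream: MediaStream
): ChannelMergerNode {
  const merger = ctx.createChannelMerger(2); // 2 inputs -> 1 stereo output

  const leftSource = ctx.createMediaStreamSource(leftStream);
  const rightSource = ctx.createMediaStreamSource(rightStream);

  // connect(destination, outputIndex, mergerInputIndex)
  leftSource.connect(merger, 0, 0);  // left mic  -> channel 0 (left)
  rightSource.connect(merger, 0, 1); // right mic -> channel 1 (right)

  return merger;
}
```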

  • Processing Audio in Real Time

Route the combined stereo stream into a low-latency audio processor, which is typically implemented as an AudioWorklet. This specialized processor runs on its own audio thread and continuously packages raw audio data into chunks for encoding.
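A minimal worklet processor along these lines is sketched below; the processor name “pcm-capture” and the message shape are illustrative assumptions, and the file needs AudioWorklet type declarations available (or can be shipped as plain JavaScript):

```typescript
// pcm-capture-processor.ts (loaded as an AudioWorklet module).
// Runs on the audio rendering thread and forwards each 128-frame stereo block
// to the main thread, where it is PCM-encoded and streamed.
class PcmCaptureProcessor extends AudioWorkletProcessor {
  process(inputs: Float32Array[][]): boolean {
    const input = inputs[0]; // [leftChannel, rightChannel] when stereo is connected
    if (input && input.length >= 2) {
      // Copy the data; the engine reuses the underlying buffers between calls.
      this.port.postMessage({ left: input[0].slice(), right: input[1].slice() });
    }
    return true; // keep the processor alive
  }
}
registerProcessor("pcm-capture", PcmCaptureProcessor);
```

On the main thread, load the module with audioContext.audioWorklet.addModule(), create an AudioWorkletNode for “pcm-capture”, connect the merger node to it, and handle port.onmessage by encoding and streaming each block.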

  • Encoding of PCM

Transform each floating-point audio sample into the 16-bit Pulse-Code Modulation (PCM) format that Amazon Transcribe expects. In practice, this entails scaling each sample to the signed 16-bit integer range, interleaving the left and right channel samples, and packing them into fixed-size buffers.
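A sketch of that conversion, assuming interleaved little-endian output (the layout Amazon Transcribe expects for stereo PCM):

```typescript
// Convert matching left/right Float32 blocks into interleaved 16-bit PCM.
function encodeStereoPcm16(left: Float32Array, right: Float32Array): ArrayBuffer {
  const frames = Math.min(left.length, right.length);
  const view = new DataView(new ArrayBuffer(frames * 2 * 2)); // 2 channels x 2 bytes

  for (let i = 0; i < frames; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit integer range.
    const l = Math.max(-1, Math.min(1, left[i]));
    const r = Math.max(-1, Math.min(1, right[i]));
    // Interleaved layout: L0 R0 L1 R1 ..., written little-endian.
    view.setInt16(i * 4, l < 0 ? l * 0x8000 : l * 0x7fff, true);
    view.setInt16(i * 4 + 2, r < 0 ? r * 0x8000 : r * 0x7fff, true);
  }
  return view.buffer;
}
```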

  • Streaming to Amazon Transcribe with WebSocket

Use Amazon Transcribe’s streaming API to establish a WebSocket-based transcription session. Configure the session to expect two channels and to identify them independently. As each PCM buffer is produced, send it over the open connection in real time.
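Signing the WebSocket URL and framing the event stream by hand is involved, so a common shortcut is the AWS SDK’s TranscribeStreamingClient, which handles both. The sketch below assumes browser credentials are already configured (for example via Amazon Cognito), that pcmChunks is an async iterable of the buffers produced above, and that the region and sample rate are placeholders to adjust:

```typescript
import {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
} from "@aws-sdk/client-transcribe-streaming";

// Start a streaming session that expects stereo PCM and transcribes each
// channel independently.
async function startTranscription(pcmChunks: AsyncIterable<ArrayBuffer>) {
  const client = new TranscribeStreamingClient({ region: "us-east-1" });

  const command = new StartStreamTranscriptionCommand({
    LanguageCode: "en-US",
    MediaEncoding: "pcm",
    MediaSampleRateHertz: 48000,       // must match the AudioContext sample rate
    NumberOfChannels: 2,               // stereo: one microphone per channel
    EnableChannelIdentification: true, // separate transcript per channel
    AudioStream: (async function* () {
      for await (const chunk of pcmChunks) {
        yield { AudioEvent: { AudioChunk: new Uint8Array(chunk) } };
      }
    })(),
  });

  return client.send(command);
}
```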

  • Parsing Results and Channel Mapping

Each utterance is assigned a channel identifier (such as “channel 0” or “channel 1”) as transcription results are received. To obtain two parallel transcripts precisely aligned with each speaker, your application merely routes those text segments back to the original microphone source.
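With the SDK-based session sketched above, results arrive as an async stream in which the channel identifiers are strings such as “ch_0” and “ch_1”. A minimal routing loop might look like this:

```typescript
import type { StartStreamTranscriptionCommandOutput } from "@aws-sdk/client-transcribe-streaming";

// Route each final transcript segment back to the microphone it came from.
async function handleResults(response: StartStreamTranscriptionCommandOutput) {
  for await (const event of response.TranscriptResultStream ?? []) {
    const results = event.TranscriptEvent?.Transcript?.Results ?? [];
    for (const result of results) {
      if (result.IsPartial) continue; // only act on finalized utterances
      const text = result.Alternatives?.[0]?.Transcript ?? "";
      if (result.ChannelId === "ch_0") {
        console.log(`Left microphone: ${text}`);  // left-channel speaker
      } else if (result.ChannelId === "ch_1") {
        console.log(`Right microphone: ${text}`); // right-channel speaker
      }
    }
  }
}
```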


Real-World Applications

This browser-based configuration is very flexible:

  • Call centers: Record conversations from two separate lines in a single stream.
  • Podcasting tools: Real-time transcription with speaker separation.
  • Consumer input kiosks: Record several voices automatically.

Additionally, using a single transcription session for two speakers, rather than two separate sessions, significantly lowers AWS costs.

Conclusion

Streaming dual-channel audio from a browser to Amazon Transcribe via the Web Audio API offers a powerful and flexible real-time speech recognition solution. You can create accurate, low-latency, cost-effective transcripts with the right hardware and encoding. The approach outlined in this blog can offer a strong foundation for developing advanced audio transcription tools right within the browser without the need for native apps or plugins.

Drop a query if you have any questions regarding Amazon Transcribe and we will get back to you quickly.


About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 850k+ professionals in 600+ cloud certifications and completed 500+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partner, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, AWS Config, Amazon EMR, and many more.

FAQs

1. Is it possible to use more than two microphones with this configuration?

ANS: – Yes, in theory, but Amazon Transcribe’s streaming channel identification supports at most two channels. If you need more, you must run a separate transcription session for each pair or pre-process (mix down) the audio externally.

2. Which browsers are compatible?

ANS: – This implementation was tested on Google Chrome 135+. Other Chromium-based browsers are likely to work as well, but Firefox and Safari support for multiple simultaneous audio streams and AudioWorklets is less dependable.

WRITTEN BY Balaji M

Balaji works as a Research Intern at CloudThat, specializing in cloud technologies and AI-driven solutions. He is passionate about leveraging advanced technologies to solve complex problems and drive innovation.
