Real-time Data Streaming: Amazon RDS MySQL CDC with Apache Kafka on Amazon EC2, Debezium, and AWS Lambda – Part 2


Debezium is an open-source platform for Change Data Capture (CDC). It captures data changes from databases in real time and turns them into a stream of change events. Debezium supports popular databases such as MySQL, PostgreSQL, and MongoDB and integrates with Apache Kafka for data streaming. With Debezium, organizations can build real-time analytics, data integration, and event-driven architectures on top of database changes, with low latency and good scalability.

This entire solution is explained in three parts. In Part 1, we created the VPC and launched a private Amazon EC2 instance that can be reached over SSH without a bastion host, using an Amazon EC2 Instance Connect (EIC) Endpoint. In this second part, we launch Amazon RDS, install Apache Kafka, and configure Debezium on the private Amazon EC2 instance.

Apache Kafka

Apache Kafka is an open-source distributed streaming platform.

It excels at handling high-volume, real-time data streams by enabling efficient and reliable data transportation between systems and applications. With its unique architecture and design principles, Apache Kafka offers scalability, fault tolerance, and the ability to retain large volumes of data.

It has become the standard choice for building scalable, real-time data pipelines and event-driven architectures. Debezium will be integrated with Apache Kafka for efficient data streaming.


Amazon RDS MySQL

Amazon RDS MySQL is a managed relational database service provided by Amazon Web Services (AWS). It offers a simple, scalable way to deploy and manage MySQL databases in the cloud. Amazon RDS MySQL automates time-consuming administrative tasks such as hardware provisioning, software patching, and database backups, and it offers replication options for high availability. Its reliability, performance, and ease of use make it a good choice for applications that need a robust and scalable MySQL database. We will use Amazon RDS MySQL as the source database for the Debezium connector.

Steps to Launch Amazon RDS MySQL

Step 1: Create a parameter group for Amazon RDS MySQL

Go to the Amazon RDS service -> Parameter Groups -> Create.


Step 2: Once the parameter group is created, open it and click Edit parameters.

Step 3: Find binlog_format, select ROW, and click Save changes.
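If you would rather script this setting than click through the console, the following is a minimal sketch in Python. It only builds the request for the RDS ModifyDBParameterGroup operation and prints it. The function name is hypothetical, and the group name "logbin" is an assumption taken from the name used later in this post.

```python
def binlog_parameter_request(group_name: str) -> dict:
    """Build keyword arguments for an RDS ModifyDBParameterGroup call."""
    return {
        "DBParameterGroupName": group_name,
        "Parameters": [
            {
                "ParameterName": "binlog_format",
                "ParameterValue": "ROW",
                # Applied at the next reboot; an instance created with this
                # group attached simply starts with the value.
                "ApplyMethod": "pending-reboot",
            }
        ],
    }


# "logbin" is an assumed parameter group name; use your own.
request = binlog_parameter_request("logbin")
print(request)

# With boto3 installed and AWS credentials configured (both assumptions),
# the request could be sent like this:
#   import boto3
#   boto3.client("rds").modify_db_parameter_group(**request)
```

Keeping the request as plain data makes it easy to review before anything is changed in your account.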


Step 4: Now let’s create the Amazon RDS MySQL database





Step 5: Select the VPC that was created earlier (in Part 1)



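Step 6: Enter the initial database name, select the DB parameter group that was created earlier (logbin), and enable automated backups with a retention period of at least 1 day. Amazon RDS MySQL only writes binary logs when automated backups are enabled, so CDC will not work without this.


Keep the other settings at their defaults (edit them if required) and click Create. The database takes a few minutes to create.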

Steps to Install Apache Kafka and configure Debezium on Private Amazon EC2

Step 1: Let’s install Apache Kafka and configure Debezium on Private Amazon EC2

In Part 1, we connected to the Amazon EC2 instance over SSH using the Amazon EC2 Instance Connect Endpoint. Now let’s run the installation commands on it (the NAT Gateway gives the private instance the outbound internet access needed for the downloads).

Step 2: Connect to MySQL

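Password: enter the password you set while creating the database.

Then run the following statements to verify the binary log settings:

SHOW GLOBAL VARIABLES LIKE 'log_bin';
SHOW GLOBAL VARIABLES LIKE 'binlog_format';

They should report log_bin as ON and binlog_format as ROW.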



Keep this tab open and duplicate it for the next steps.
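The binlog check in Step 2 can also be expressed as a small script. The sketch below is illustrative only: it assumes you have already fetched the (name, value) rows from MySQL with a client of your choice, and the function name is hypothetical.

```python
def cdc_prerequisite_problems(variables: dict) -> list:
    """Return a list of problems that would stop binlog-based CDC."""
    expected = {"log_bin": "ON", "binlog_format": "ROW"}
    problems = []
    for name, wanted in expected.items():
        actual = variables.get(name)
        if actual is None:
            problems.append(f"{name} was not returned by the server")
        elif actual.upper() != wanted:
            problems.append(f"{name} is {actual}, expected {wanted}")
    return problems


# Example rows as a MySQL client might return them (made-up values).
rows = [("log_bin", "ON"), ("binlog_format", "MIXED")]
print(cdc_prerequisite_problems(dict(rows)))
# ['binlog_format is MIXED, expected ROW']
```

An empty list means both settings are in place for the connector.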

Step 3: Debezium configuration
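As a rough illustration of what a Debezium MySQL connector configuration can contain, here is a sketch that renders a properties file from a Python dictionary. It is not the exact configuration used in this post. The property names follow Debezium 2.x (1.x uses database.server.name and database.history.* instead of topic.prefix and schema.history.internal.*). The connector name, topic prefix, database and table names, server id, and the angle-bracket placeholders are all assumptions that you must replace with your own values.

```python
def render_properties(config: dict) -> str:
    """Render a dict as key=value lines for a Kafka Connect properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items()) + "\n"


connector_config = {
    "name": "rds-mysql-connector",              # hypothetical connector name
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "<rds-endpoint>",      # placeholder
    "database.port": "3306",
    "database.user": "<username>",              # placeholder
    "database.password": "<password>",          # placeholder
    "database.server.id": "184054",             # any id unique among replicas
    "topic.prefix": "rdsmysql",                 # hypothetical prefix
    "database.include.list": "inventory",       # hypothetical database
    "table.include.list": "inventory.customers,inventory.orders",
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.rdsmysql",
}

print(render_properties(connector_config))
```

The rendered text can be saved to a file and passed to Kafka Connect in standalone mode. Note that this simple renderer does not escape special characters, so a password containing a backslash would need manual escaping.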

Step 4: Topic creation with Debezium and listing

The above command doesn’t display any topics yet because we haven’t started the Debezium connector.

Open a duplicate tab and check whether the topics have been created.
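Topics only appear once the connector is actually running, so it can help to check the connector's state as well. The sketch below uses the Kafka Connect REST API status endpoint, which listens on port 8083 by default. The connector name and the sample response are assumptions for illustration, and the fetch function is defined but not called here.

```python
import json
import urllib.request


def fetch_status(connector_name: str, base_url: str = "http://localhost:8083") -> dict:
    """Fetch a connector's status from the Kafka Connect REST API."""
    url = f"{base_url}/connectors/{connector_name}/status"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def connector_is_healthy(status: dict) -> bool:
    """True when the connector and every one of its tasks report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    tasks_ok = bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)
    return connector_ok and tasks_ok


# Made-up sample in the shape returned by the status endpoint.
sample_status = json.loads("""
{
  "name": "rds-mysql-connector",
  "connector": {"state": "RUNNING", "worker_id": "127.0.0.1:8083"},
  "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "127.0.0.1:8083"}]
}
""")
print(connector_is_healthy(sample_status))  # True
```

On the Amazon EC2 instance you would call fetch_status with your own connector name and pass the result to connector_is_healthy.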


We will continue in Part 3, where we will run CRUD operations on the MySQL database to check that Debezium is working.


In the above process, we installed Apache Kafka on Amazon EC2; its Kafka topics store the CDC data produced by the Debezium connector. Multiple tables, from one or more databases, can be listed in the Debezium configuration file. Topics are created based on the tables specified in the configuration file, and the CDC data for each table is sent to its respective topic.
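To make the table-to-topic mapping concrete: the Debezium MySQL connector names each change topic as prefix.database.table. The sketch below works out the topic names you should expect for a given table list. The prefix and table names are made up, the function name is hypothetical, and the sketch assumes the list holds literal database.table names rather than regular expressions.

```python
def expected_topics(topic_prefix: str, table_include_list: str) -> list:
    """List the change topics Debezium would create for the given tables."""
    topics = []
    for entry in table_include_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        database, table = entry.split(".", 1)
        topics.append(f"{topic_prefix}.{database}.{table}")
    return topics


print(expected_topics("rdsmysql", "inventory.customers, inventory.orders, sales.invoices"))
# ['rdsmysql.inventory.customers', 'rdsmysql.inventory.orders', 'rdsmysql.sales.invoices']
```

Comparing this list with the output of the topic listing is a quick way to spot a table that was left out of the configuration.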



1. What if I get a port 8083 error?

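ANS: – Port 8083 is the default Kafka Connect REST port, so this error usually means an earlier Connect process is still running. Use lsof -i :8083 to find the process, kill it, and start the connector again.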

2. Can we have more than one table?

ANS: – Yes. List multiple tables in the Debezium configuration file, and a topic is created for each one.
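For the port 8083 question, you can check whether the port is busy before starting the connector. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it only detects a listener on the local machine.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if port_in_use(8083):
    print("Port 8083 is busy: find the process with lsof -i :8083 and stop it")
else:
    print("Port 8083 is free")
```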

WRITTEN BY Suresh Kumar Reddy

Yerraballi Suresh Kumar Reddy is working as a Research Associate - Data and AI/ML at CloudThat. He is a self-motivated and hard-working Cloud Data Science aspirant who is adept at using analytical tools for analyzing and extracting meaningful insights from data.


