Real-time Data Streaming: Amazon RDS MySQL CDC with Apache Kafka on Amazon EC2, Debezium, and AWS Lambda – Part 2

Overview

Debezium is an open-source platform for Change Data Capture (CDC). It captures row-level changes from databases in real time and turns them into a stream of change events. Debezium supports popular databases such as MySQL, PostgreSQL, and MongoDB, and it runs on Apache Kafka Connect, so change events flow directly into Apache Kafka topics. This makes it a good fit for real-time analytics, data integration, and event-driven architectures that need low-latency, resilient capture of database changes.

This solution is explained in three parts. In Part 1, we created the VPC and launched a private Amazon EC2 instance that can be reached over SSH without a bastion host, using an Amazon EC2 Instance Connect (EIC) Endpoint. In this second part, we launch Amazon RDS MySQL, install Apache Kafka, and configure Debezium on the private Amazon EC2 instance.

Apache Kafka

Apache Kafka is an open-source distributed event streaming platform.

It excels at handling high-volume, real-time data streams by enabling efficient and reliable data transportation between systems and applications. With its unique architecture and design principles, Apache Kafka offers scalability, fault tolerance, and the ability to retain large volumes of data.

It has become the standard choice for building scalable, real-time data pipelines and event-driven architectures. Debezium will be integrated with Apache Kafka for efficient data streaming.

Amazon RDS MySQL

Amazon RDS MySQL is a managed relational database service provided by Amazon Web Services (AWS). It simplifies deploying and managing MySQL databases in the cloud by automating time-consuming administrative tasks such as hardware provisioning, software patching, and database backups. It supports high availability through Multi-AZ deployments and replication options such as read replicas. Its reliability, performance, and ease of use make it a good choice for applications that need a robust and scalable MySQL database. We will use Amazon RDS MySQL as the source database for the Debezium connector.

Steps to Launch Amazon RDS MySQL

Step 1: Create a DB parameter group for Amazon RDS MySQL

Go to the Amazon RDS service -> Parameter groups -> Create parameter group.

Step 2: Once the parameter group is created, open it and click Edit parameters.

Step 3: Find binlog_format, select ROW, and click Save changes. Debezium needs row-based binary logging to capture row-level changes.
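
The same two console actions can also be scripted with the AWS CLI. The sketch below only builds the command lines and prints them, it does not call AWS. The group name logbin follows the name used in Step 6, while the parameter group family mysql8.0 and the description text are assumptions that must match your engine version.

```python
import shlex


def parameter_group_commands(group_name, family):
    """Build AWS CLI calls that create a parameter group and set binlog_format=ROW."""
    create = [
        "aws", "rds", "create-db-parameter-group",
        "--db-parameter-group-name", group_name,
        "--db-parameter-group-family", family,  # assumption: must match the MySQL version
        "--description", "Row-based binary logging for Debezium CDC",
    ]
    modify = [
        "aws", "rds", "modify-db-parameter-group",
        "--db-parameter-group-name", group_name,
        "--parameters",
        "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=immediate",
    ]
    return [create, modify]


commands = parameter_group_commands("logbin", "mysql8.0")
for argv in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell
    print(shlex.join(argv))
```

Review the printed commands before running them in a shell that has AWS credentials configured.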

Step 4: Now create the Amazon RDS MySQL database.

Step 5: Select the VPC that was created in Part 1.

Step 6: Enter the initial database name, select the DB parameter group (logbin) created above, and enable automated backups with a retention period of at least 1 day. The backup setting is required for binary logging; without it, CDC will not work.

Keep the other settings at their defaults (edit them if required) and click Create database. Creation takes a few minutes.
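
For readers who prefer the AWS CLI, the settings from Step 6 map to options of aws rds create-db-instance. The sketch below is a partial illustration, not the full command: it covers only the CDC-relevant options and rejects a backup retention of 0 days. The identifier cdc-mysql and database name inventory are hypothetical, and the remaining required options (instance class, storage, credentials, networking) have to be supplied through extra_args.

```python
def create_instance_command(identifier, db_name, parameter_group,
                            backup_days=1, extra_args=()):
    """Build the CDC-relevant part of an 'aws rds create-db-instance' call."""
    if backup_days < 1:
        # Step 6: binary logging depends on automated backups being enabled
        raise ValueError("backup retention must be at least 1 day for binlog-based CDC")
    command = [
        "aws", "rds", "create-db-instance",
        "--db-instance-identifier", identifier,
        "--engine", "mysql",
        "--db-name", db_name,
        "--db-parameter-group-name", parameter_group,
        "--backup-retention-period", str(backup_days),
    ]
    # Instance class, storage, credentials, and VPC options are passed in by the caller
    return command + list(extra_args)


command = create_instance_command("cdc-mysql", "inventory", "logbin")
print(" ".join(command))
```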

Steps to Install Apache Kafka and configure Debezium on Private Amazon EC2

Step 1: Install Apache Kafka and set up Debezium on the private Amazon EC2 instance

In Part 1, we connected to the Amazon EC2 instance over SSH using the Amazon EC2 Instance Connect Endpoint. Run the installation commands from that session. The NAT Gateway gives the private instance the outbound internet access needed to download the packages.

Step 2: Connect to MySQL

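Password: enter the password you set while creating the database.

Then check that binary logging is enabled and row-based. The expected values are ON and ROW:

SHOW GLOBAL VARIABLES LIKE 'log_bin';
SHOW GLOBAL VARIABLES LIKE 'binlog_format';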
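
The check on those two variables can be expressed as a small helper. This is a minimal sketch that assumes the values have already been read from MySQL into a dictionary; the function name binlog_problems is hypothetical, and it does not connect to the database itself.

```python
def binlog_problems(variables):
    """Return a list of settings that would stop Debezium from reading the binlog."""
    expected = {"log_bin": "ON", "binlog_format": "ROW"}
    problems = []
    for name, wanted in expected.items():
        actual = variables.get(name, "<missing>")
        if actual.upper() != wanted:
            problems.append(f"{name} is {actual}, expected {wanted}")
    return problems


# Values as they would appear in the SHOW GLOBAL VARIABLES output
print(binlog_problems({"log_bin": "ON", "binlog_format": "ROW"}))
print(binlog_problems({"log_bin": "OFF", "binlog_format": "MIXED"}))
```

An empty list means both settings match what the parameter group and backup configuration were meant to produce.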

Keep this terminal tab open and duplicate it for the next steps.

Step 3: Debezium configuration
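
As an illustration of what such a configuration can contain, the sketch below generates a connector properties file for the Debezium MySQL connector. It assumes Debezium 2.x property names (topic.prefix and schema.history.internal.*; 1.x releases use different names) and a Kafka broker on localhost:9092. The endpoint, credentials, database inventory, table names, prefix rdscdc, and server id are all hypothetical placeholders, not values from this walkthrough.

```python
def debezium_mysql_properties(host, user, password, database, tables,
                              prefix="rdscdc", bootstrap="localhost:9092"):
    """Render Debezium MySQL connector settings as .properties text."""
    settings = {
        "name": f"{prefix}-mysql-connector",
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": host,
        "database.port": "3306",
        "database.user": user,
        "database.password": password,
        "database.server.id": "184054",  # any numeric id unique among replication clients
        "topic.prefix": prefix,
        "database.include.list": database,
        # Tables are listed as database.table, separated by commas
        "table.include.list": ",".join(f"{database}.{t}" for t in tables),
        "schema.history.internal.kafka.bootstrap.servers": bootstrap,
        "schema.history.internal.kafka.topic": f"schema-changes.{prefix}",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


props = debezium_mysql_properties(
    host="<your-rds-endpoint>",
    user="<db-user>",
    password="<db-password>",
    database="inventory",
    tables=["customers", "orders"],
)
print(props)
```

Writing the returned text to a file gives a configuration that can be passed to Kafka Connect. Values containing characters that are special in .properties files would need escaping, which this sketch does not handle.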

Step 4: Topic creation with Debezium and listing

Listing the topics at this point does not display any, because the Debezium connector has not been started yet.

Once the connector is running, open a duplicate tab and check whether the topics have been created.
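
The two actions in this step, starting the connector and listing topics, could look like the commands built below. This is a sketch under assumptions: it presumes Kafka Connect is run in standalone mode with the scripts shipped in the Kafka distribution, that Kafka is unpacked under the hypothetical path /opt/kafka, that the connector file is named debezium-mysql.properties, and that the Debezium plugin directory is already on plugin.path in the worker configuration.

```python
from pathlib import PurePosixPath


def kafka_commands(kafka_home, connector_file, bootstrap="localhost:9092"):
    """Build the commands that start Kafka Connect standalone and list topics."""
    home = PurePosixPath(kafka_home)
    return {
        "start_connector": [
            str(home / "bin" / "connect-standalone.sh"),
            str(home / "config" / "connect-standalone.properties"),  # worker settings
            connector_file,                                           # connector settings
        ],
        "list_topics": [
            str(home / "bin" / "kafka-topics.sh"),
            "--bootstrap-server", bootstrap,
            "--list",
        ],
    }


cmds = kafka_commands("/opt/kafka", "debezium-mysql.properties")
for name, argv in cmds.items():
    print(name, "->", " ".join(argv))
```

Running the list command before and after the connector starts shows the difference described in this step.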

We will continue in Part 3, where we run CRUD operations on the MySQL database to check that Debezium is working.

Conclusion

In this part, we launched Amazon RDS MySQL with row-based binary logging and installed Apache Kafka on Amazon EC2, where Kafka topics store the CDC data produced by the Debezium connector. Multiple tables, across one or more databases, can be listed in the Debezium configuration file. Topics are created based on the tables specified in that file, and the CDC data for each table is sent to its own topic.
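
To make the table-to-topic mapping concrete: Debezium names each data change topic as prefix.database.table. The sketch below applies that rule to a hypothetical prefix rdscdc and hypothetical tables. It lists only the per-table topics, not the additional schema-related topics the connector also uses.

```python
def expected_topics(prefix, tables):
    """Map 'database.table' entries to the Debezium data change topic names."""
    return [f"{prefix}.{table}" for table in tables]


topics = expected_topics("rdscdc", ["inventory.customers", "inventory.orders"])
for topic in topics:
    print(topic)
```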

Drop a query if you have any questions regarding Apache Kafka and we will get back to you quickly.

FAQs

1. What if I get a port 8083 error?

ANS: – The port is already in use. Find the process holding it with lsof -i :8083, kill that process, and start the connector again.

2. Can we have more than one table?

ANS: – Yes. Multiple tables can be listed in the Debezium configuration file, and each table gets its own topic.
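
For the port 8083 question above, a quick way to see whether something is already listening before starting the connector is a local connection attempt. This is a minimal sketch using only the Python standard library; the function name port_in_use is hypothetical, and it does not identify or kill the process, which is what lsof is for.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP listener accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return probe.connect_ex((host, port)) == 0


print("port 8083 in use:", port_in_use(8083))
```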

WRITTEN BY Suresh Kumar Reddy

Yerraballi Suresh Kumar Reddy is working as a Research Associate - Data and AI/ML at CloudThat. He is a self-motivated and hard-working Cloud Data Science aspirant who is adept at using analytical tools for analyzing and extracting meaningful insights from data.
