Overview
Real-time data streaming is crucial for building scalable, responsive, and intelligent applications in today’s data-driven world. Apache Kafka, a powerful distributed streaming platform, enables organizations to process massive volumes of data in real time. However, integrating Kafka with systems such as Amazon S3 often requires considerable setup, custom coding, and infrastructure management.
Introduction
This is where Amazon MSK Connect steps in: a fully managed feature of Amazon MSK that simplifies Kafka Connect operations. The Amazon MSK Connect S3 Sink Connector enables seamless data streaming from Kafka topics to Amazon S3. This blog explores how it works, discusses its benefits, and walks through a practical step-by-step implementation using the AWS Management Console.
Amazon MSK Connect
Amazon MSK Connect is a managed service that allows developers to run Kafka Connect clusters directly in AWS with minimal operational overhead. Kafka Connect is an open-source framework that simplifies moving large-scale data between Apache Kafka and other systems.
With Amazon MSK Connect, you can easily deploy source connectors to bring data into Kafka topics or sink connectors to push data from Kafka topics to external destinations like Amazon S3, Amazon OpenSearch Service, and databases.
Key Features of Amazon MSK Connect:
- Managed Infrastructure: Automatically handles provisioning, patching, and scaling of Connect clusters.
- Auto-Scaling: Supports horizontal and vertical scaling of connector tasks.
- Resiliency: Automatically restarts failed tasks to maintain stream continuity.
- VPC Connectivity: Integrates with AWS PrivateLink for secure, private data transfer.
- Supports Open Source Plugins: You can use existing Kafka Connect-compatible plugins or build custom ones.
Why Use the Amazon S3 Sink Connector?
The Amazon S3 Sink Connector is a plugin that exports Kafka topic data to Amazon S3. This is especially valuable for use cases such as:
- Long-term archival and backup
- Offline analytics and reporting
- Feeding data lakes or machine learning pipelines
- Storing raw logs or structured event data
Amazon S3 is an ideal destination because of its scalability, cost-efficiency, and durability. With the Amazon S3 Sink Connector, you can write data from Kafka to Amazon S3 in JSON, Avro, or Parquet formats, organized using various partitioning strategies.
Step-by-Step: Sending Kafka Data to Amazon S3
Below is a practical guide to configuring MSK Connect with an Amazon S3 Sink Connector using the AWS Console.
Step 1: Set Up MSK and Required Resources
- Create an Amazon MSK cluster if you haven’t already.
- Set up a client Amazon EC2 instance to send messages to a Kafka topic.
- Prepare an Amazon S3 bucket to receive data exported by the connector.
- Create an AWS IAM role (e.g., mkc-tutorial-role) with Amazon MSK Connect and Amazon S3 permissions (a sample policy sketch follows).
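The exact permissions depend on your environment, but at a minimum the role must let the connector write objects (including multipart uploads) to the destination bucket. The snippet below is a minimal sketch, assuming the role already exists and using placeholder bucket and policy names:

```bash
# Hypothetical inline policy for mkc-tutorial-role; replace the bucket name.
# The S3 sink connector needs write and multipart-upload access to the bucket.
cat > s3-sink-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<your-s3-bucket-name>",
        "arn:aws:s3:::<your-s3-bucket-name>/*"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name mkc-tutorial-role \
  --policy-name mkc-tutorial-s3-access \
  --policy-document file://s3-sink-policy.json
```

The role also needs a trust policy allowing the kafkaconnect.amazonaws.com service principal to assume it.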
Step 2: Upload the Amazon S3 Sink Connector as a Plugin
- Download the Confluent S3 Sink Connector (usually a .zip file).
- Upload the .zip to an accessible Amazon S3 bucket.
- In the AWS Console, navigate to Amazon MSK Connect > Custom plugins.
- Choose Create custom plugin.
- Browse to your uploaded ZIP file in Amazon S3 and select it.
- Name the plugin (e.g., mkc-tutorial-plugin) and complete creation.
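If you prefer the CLI, the same plugin registration can be done through the kafkaconnect API. A sketch, assuming the connector archive was uploaded as kafka-connect-s3.zip (bucket ARN and file key are placeholders):

```bash
# Register the uploaded ZIP as an MSK Connect custom plugin.
aws kafkaconnect create-custom-plugin \
  --name mkc-tutorial-plugin \
  --content-type ZIP \
  --location '{"s3Location": {"bucketArn": "arn:aws:s3:::<plugin-bucket>", "fileKey": "kafka-connect-s3.zip"}}'
```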
Step 3: Create the Kafka Topic
Use your Amazon EC2 client machine to create a Kafka topic for the connector to consume:
```bash
kafka-topics.sh --create --topic mkc-tutorial-topic \
  --bootstrap-server <broker-endpoint> \
  --partitions 1 --replication-factor 1
```
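Once the topic exists, you can produce a few test messages from the same client machine so the connector has data to export, for example with the standard console producer:

```bash
# Produce test records; each line you type becomes one Kafka record.
kafka-console-producer.sh --topic mkc-tutorial-topic \
  --bootstrap-server <broker-endpoint>
# Type a few messages, e.g. {"event":"signup","user":"alice"}, then press Ctrl+C.
```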
Step 4: Create the Connector
- Go to Amazon MSK Connect > Connectors and click Create connector.
- Choose the plugin you just created.
- Name the connector (e.g., mkc-tutorial-connector).
- Select your Amazon MSK cluster.
- Paste the following example configuration (update region and bucket name):
```properties
connector.class=io.confluent.connect.s3.S3SinkConnector
s3.region=us-east-1
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1
schema.compatibility=NONE
tasks.max=2
topics=mkc-tutorial-topic
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
storage.class=io.confluent.connect.s3.storage.S3Storage
s3.bucket.name=<your-s3-bucket-name>
topics.dir=tutorial
```
- Assign the AWS IAM role mkc-tutorial-role.
- Review and create the connector.
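Once the connector reaches the Running state, you can verify the export from the CLI. With flush.size=1, each Kafka record is flushed to Amazon S3 as its own object, and with the DefaultPartitioner and topics.dir=tutorial the objects should land under a path like the one below (a sketch; the exact object names encode the topic, partition, and starting offset):

```bash
# List exported objects for partition 0 of the tutorial topic.
aws s3 ls s3://<your-s3-bucket-name>/tutorial/mkc-tutorial-topic/partition=0/ --recursive
```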
Partitioning Strategies for Amazon S3 Sink Connector
The way your data is organized in Amazon S3 can be customized using different partitioners:
- DefaultPartitioner
Stores each topic and its partitions under a simple directory structure:
```properties
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
topics.dir=events-environment/development/frontend
```
Example Amazon S3 path:
s3://bucketname/events-environment/development/frontend/events/partition=0/
- TimeBasedPartitioner
Organizes data by timestamp, which is helpful for time-series data. Note that TimeBasedPartitioner also requires locale and timezone settings:

```properties
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
path.format=YYYY/MM/dd/HH
timestamp.extractor=Record
partition.duration.ms=3600000
locale=en-US
timezone=UTC
```
Example S3 path:
s3://bucketname/events-environment/development/frontend/events/2025/05/14/13/
Benefits of Using MSK Connect with Amazon S3 Sink
- No Code Changes: Deploy existing Kafka Connect plugins as-is.
- Managed Experience: Offloads operational complexity.
- Elastic Scaling: Scales automatically based on throughput.
- High Availability: Automatically recovers from task failures.
- Security & Privacy: Uses private networking via AWS PrivateLink.
Conclusion
Whether building a real-time data lake, enabling historical analytics, or simply archiving event logs, the S3 Sink Connector offers a reliable and scalable path from streaming data to long-term storage.
By leveraging Amazon MSK Connect, you can focus more on your application’s core functionality and less on infrastructure management, a key win for modern data engineering teams.
Drop a query if you have any questions regarding Amazon MSK and we will get back to you quickly.
About CloudThat
CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.
CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, AWS Config, Amazon EMR and many more.
FAQs
1. Can I use custom connectors with Amazon MSK Connect?
ANS: – Yes, Amazon MSK Connect allows you to upload and use custom connector plugins. You can package your connector as a ZIP file, upload it to Amazon S3, and register it as a custom plugin in the Amazon MSK Connect console.
2. What data formats are supported by the Amazon S3 Sink Connector?
ANS: – The connector supports multiple formats, including JSON, Avro, and Parquet. You can specify the format using the format.class configuration property.
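For example, switching the connector from JSON to Parquet output is a one-line configuration change. A sketch (note that ParquetFormat requires schema-bearing records, e.g. Avro with a schema registry):

```properties
# Hypothetical alternative to JsonFormat; requires records with schemas.
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat
```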

WRITTEN BY Suresh Kumar Reddy
Yerraballi Suresh Kumar Reddy is working as a Research Associate - Data and AI/ML at CloudThat. He is a self-motivated and hard-working Cloud Data Science aspirant who is adept at using analytical tools for analyzing and extracting meaningful insights from data.