
Streaming Data to Amazon S3 Using the Amazon MSK Connect S3 Sink Connector

Overview

Real-time data streaming is crucial for building scalable, responsive, and intelligent applications in today’s data-driven world. Apache Kafka, a powerful distributed streaming platform, enables organizations to process massive volumes of data in real time. However, integrating Kafka with systems such as Amazon S3 often requires considerable setup, custom coding, and infrastructure management.

Introduction

This is where Amazon MSK Connect steps in: a fully managed feature of Amazon MSK that simplifies Kafka Connect operations. The Amazon MSK Connect S3 Sink Connector enables seamless data streaming from Kafka topics to Amazon S3. This post discusses its benefits and walks through a practical step-by-step implementation using the AWS Management Console.

Amazon MSK Connect

Amazon MSK Connect is a managed service that allows developers to run Kafka Connect clusters directly in AWS with minimal operational overhead. Kafka Connect is an open-source framework that simplifies moving large-scale data between Apache Kafka and other systems.

With Amazon MSK Connect, you can easily deploy source connectors to bring data into Kafka topics or sink connectors to push data from Kafka topics to external destinations like Amazon S3, Amazon OpenSearch Service, and databases.

Key Features of Amazon MSK Connect:

  • Managed Infrastructure: Automatically handles provisioning, patching, and scaling of Connect clusters.
  • Auto-Scaling: Supports horizontal and vertical scaling of connector tasks.
  • Resiliency: Automatically restarts failed tasks to maintain stream continuity.
  • VPC Connectivity: Integrates with AWS PrivateLink for secure, private data transfer.
  • Supports Open Source Plugins: You can use existing Kafka Connect-compatible plugins or build custom ones.

Why Use the Amazon S3 Sink Connector?

The Amazon S3 Sink Connector is a plugin that exports Kafka topic data to Amazon S3. This is especially valuable for use cases such as:

  • Long-term archival and backup
  • Offline analytics and reporting
  • Feeding data lakes or machine learning pipelines
  • Storing raw logs or structured event data

Amazon S3 is an ideal destination because of its scalability, cost-efficiency, and durability. With the Amazon S3 Sink Connector, you can write data from Kafka to Amazon S3 in JSON, Avro, or Parquet formats, organized using various partitioning strategies.

Step-by-Step: Sending Kafka Data to Amazon S3

Below is a practical guide to configuring MSK Connect with an Amazon S3 Sink Connector using the AWS Console.

Step 1: Set Up MSK and Required Resources

Create an Amazon MSK cluster if you haven’t already.

Set up a client Amazon EC2 instance to send messages to a Kafka topic.

Prepare an Amazon S3 bucket to receive data exported by the connector.

Create an AWS IAM role (e.g., mkc-tutorial-role) with Amazon MSK Connect and S3 permissions.
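The exact permissions depend on your cluster's authentication mode, but a minimal permissions policy for such a role might look like the following sketch (the bucket name is a placeholder, and the `kafka-cluster:*` statement is only needed if your cluster uses IAM authentication; the role's trust policy should allow the `kafkaconnect.amazonaws.com` service principal to assume it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<your-s3-bucket-name>",
        "arn:aws:s3:::<your-s3-bucket-name>/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeCluster",
        "kafka-cluster:ReadData",
        "kafka-cluster:DescribeTopic",
        "kafka-cluster:DescribeGroup",
        "kafka-cluster:AlterGroup"
      ],
      "Resource": "*"
    }
  ]
}
```

Scope the `Resource` entries down to your specific cluster, topic, and group ARNs for production use.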

Step 2: Upload the Amazon S3 Sink Connector as a Plugin

  • Download the Confluent S3 Sink Connector (usually a .zip file).
  • Upload the .zip to an accessible Amazon S3 bucket.
  • In the AWS Console, navigate to Amazon MSK Connect > Custom plugins.
  • Choose Create custom plugin.
  • Browse to your uploaded ZIP file in Amazon S3 and select it.
  • Name the plugin (e.g., mkc-tutorial-plugin) and complete creation.

Step 3: Create the Kafka Topic

Use your Amazon EC2 client machine to create a Kafka topic for the connector to consume:
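A typical command, assuming the Apache Kafka client tools are installed on the instance and `<bootstrap-servers>` is replaced with your cluster's bootstrap broker string:

```shell
# Create the topic the S3 Sink Connector will consume from
bin/kafka-topics.sh --create \
  --bootstrap-server <bootstrap-servers> \
  --replication-factor 3 \
  --partitions 1 \
  --topic mkc-tutorial-topic
```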

Step 4: Create the Connector

  • Go to Amazon MSK Connect > Connectors and click Create connector.
  • Choose the plugin you just created.
  • Name the connector (e.g., mkc-tutorial-connector).
  • Select your Amazon MSK cluster.
  • Paste the following example configuration (update region and bucket name):


properties

connector.class=io.confluent.connect.s3.S3SinkConnector
s3.region=us-east-1
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1
schema.compatibility=NONE
tasks.max=2
topics=mkc-tutorial-topic
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
storage.class=io.confluent.connect.s3.storage.S3Storage
s3.bucket.name=<your-s3-bucket-name>
topics.dir=tutorial

Assign the AWS IAM role mkc-tutorial-role.

Review and create the connector.
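Once the connector reaches the Running state, you can send a few test messages and confirm that objects appear in the bucket. A sketch, assuming the Kafka console producer on the Amazon EC2 client and the AWS CLI:

```shell
# Produce a few test records to the topic
bin/kafka-console-producer.sh \
  --bootstrap-server <bootstrap-servers> \
  --topic mkc-tutorial-topic
# (type a few messages, then press Ctrl+C)

# List the exported objects under the configured topics.dir prefix
aws s3 ls s3://<your-s3-bucket-name>/tutorial/ --recursive
```

Note that `flush.size=1` in the example configuration writes one Amazon S3 object per record, which makes results visible immediately in a tutorial but is costly at scale; raise it for production workloads.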

Partitioning Strategies for Amazon S3 Sink Connector

The way your data is organized in Amazon S3 can be customized using different partitioners:

  1. DefaultPartitioner

Stores each topic's partitions under a simple directory structure:

properties

partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
topics.dir=events-environment/development/frontend

Example Amazon S3 path:

s3://bucketname/events-environment/development/frontend/events/partition=0/

  2. TimeBasedPartitioner

Organizes data by timestamp, helpful for time-series data:

properties

partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
path.format=YYYY/MM/dd/HH
timestamp.extractor=Record
partition.duration.ms=3600000
locale=en-US
timezone=UTC

Example S3 path:

s3://bucketname/events-environment/development/frontend/events/2025/05/14/13/
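To build intuition for how the TimeBasedPartitioner maps a record to an S3 prefix, here is a minimal, illustrative sketch in Python (an assumption for explanation only, not the connector's actual code): the record timestamp is rounded down to the start of its partition.duration.ms window, then formatted with path.format.

```python
from datetime import datetime, timezone

def s3_prefix(topics_dir: str, topic: str, timestamp_ms: int,
              partition_duration_ms: int = 3600000) -> str:
    """Sketch of TimeBasedPartitioner prefix derivation,
    assuming path.format=YYYY/MM/dd/HH and timezone=UTC."""
    # Round the record timestamp down to the start of its partition window
    window_start_ms = timestamp_ms - (timestamp_ms % partition_duration_ms)
    dt = datetime.fromtimestamp(window_start_ms / 1000, tz=timezone.utc)
    return f"{topics_dir}/{topic}/{dt:%Y/%m/%d/%H}/"

# A record produced at 2025-05-14T13:42:10Z falls in the 13:00 hourly window:
ts = int(datetime(2025, 5, 14, 13, 42, 10, tzinfo=timezone.utc).timestamp() * 1000)
print(s3_prefix("events-environment/development/frontend", "events", ts))
```

With a one-hour `partition.duration.ms`, every record from 13:00:00 to 13:59:59 UTC lands under the same `.../2025/05/14/13/` prefix.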

Benefits of Using MSK Connect with Amazon S3 Sink

  • No Code Changes: Deploy existing Kafka Connect plugins as-is.
  • Managed Experience: Offloads operational complexity.
  • Elastic Scaling: Scales automatically based on throughput.
  • High Availability: Automatically recovers from task failures.
  • Security & Privacy: Uses private networking via AWS PrivateLink.

Conclusion

The Amazon MSK Connect S3 Sink Connector provides a powerful, managed solution for exporting Kafka data to Amazon S3 without the traditional operational complexity. It seamlessly integrates with your existing Kafka pipelines, offering data formatting and partitioning flexibility.

Whether building a real-time data lake, enabling historical analytics, or simply archiving event logs, the S3 Sink Connector offers a reliable and scalable path from streaming data to long-term storage.

By leveraging Amazon MSK Connect, you can focus more on your application’s core functionality and less on infrastructure management, a key win for modern data engineering teams.

Drop a query if you have any questions regarding Amazon MSK and we will get back to you quickly.


About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partner, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, AWS Config, Amazon EMR and many more.

FAQs

1. Can I use custom connectors with Amazon MSK Connect?

ANS: – Yes, Amazon MSK Connect allows you to upload and use custom connector plugins. You can package your connector as a ZIP file, upload it to Amazon S3, and register it as a custom plugin in the Amazon MSK Connect console.

2. What data formats are supported by the Amazon S3 Sink Connector?

ANS: – The connector supports multiple formats, including JSON, Avro, and Parquet. You can specify the format using the format.class configuration property.
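For example, switching the earlier configuration from JSON to Parquet output might look like the following sketch (note that Parquet output additionally requires schema-aware records, e.g. Avro records with a schema registry, so treat this as an assumption-laden fragment rather than a drop-in change):

```properties
format.class=io.confluent.connect.s3.format.parquet.ParquetFormat
parquet.codec=snappy
```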

WRITTEN BY Suresh Kumar Reddy

Yerraballi Suresh Kumar Reddy is working as a Research Associate - Data and AI/ML at CloudThat. He is a self-motivated and hard-working Cloud Data Science aspirant who is adept at using analytical tools for analyzing and extracting meaningful insights from data.
