Streamlining Analytics with Amazon's Zero-ETL Integration for Amazon DynamoDB and Amazon Redshift

Introduction

AWS recently announced the general availability (GA) of its zero-ETL integration between Amazon DynamoDB and Amazon Redshift. This integration allows users to run analytics on Amazon DynamoDB data within Amazon Redshift without building and maintaining complex data pipelines. With zero-ETL (ETL stands for extract, transform, load), data written to an Amazon DynamoDB table automatically becomes available in Amazon Redshift, facilitating analytics with minimal impact on Amazon DynamoDB’s performance.

Zero-ETL Integration

Zero-ETL integration transfers data directly from one system to another without requiring the traditional ETL process. In the case of Amazon DynamoDB and Amazon Redshift, this integration automates the movement of data from Amazon DynamoDB tables to Amazon Redshift for analytics.

Once the data is in Amazon Redshift, it can be used with high-performance SQL queries, machine learning, data sharing, and cross-database joins. Because there is no ETL pipeline to build or operate, analytics become more efficient and less prone to operational issues.

Benefits of Zero-ETL Integration

This integration replicates data from Amazon DynamoDB to Amazon Redshift without manually built data pipelines. The initial data transfer is a full load; subsequent changes are captured incrementally, with updates applied roughly every 15-30 minutes. This point-to-point data movement does not affect Amazon DynamoDB performance. Multiple Amazon DynamoDB tables can be integrated into a single Redshift cluster or serverless workgroup, providing a unified view of data from various sources.

How It Works

Data replication happens with little to no performance impact on Amazon DynamoDB, and no additional read capacity units are consumed. As the integration is fully managed, users can continue using Amazon DynamoDB for operational workloads while the data is replicated to Amazon Redshift for analytics. The integration can be configured through the AWS CLI, SDKs, APIs, or the AWS Management Console.

Prerequisites for Setting Up the Integration

Before setting up zero-ETL integration, certain prerequisites must be met:

Enable Point-in-Time Recovery (PITR): The source Amazon DynamoDB table needs PITR enabled for data consistency and backups.
Enable Case Sensitivity for Amazon Redshift: The target Amazon Redshift data warehouse must have case-sensitive identifiers enabled (the enable_case_sensitive_identifier parameter).
Configure AWS IAM Policies: Attach necessary resource-based policies for both Amazon DynamoDB and Amazon Redshift, ensuring proper permissions for data replication.
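
The first two prerequisites can also be scripted. The sketch below only builds the request parameters, so it runs without AWS credentials. The table name orders and the workgroup name analytics-wg are hypothetical, and the commented-out boto3 calls assume a Redshift Serverless target; a provisioned cluster would set the same parameter through its parameter group instead.

```python
def pitr_request(table_name: str) -> dict:
    """Keyword arguments for DynamoDB's UpdateContinuousBackups call."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def case_sensitivity_request(workgroup_name: str) -> dict:
    """Keyword arguments for Redshift Serverless's UpdateWorkgroup call."""
    return {
        "workgroupName": workgroup_name,
        "configParameters": [
            {
                "parameterKey": "enable_case_sensitive_identifier",
                "parameterValue": "true",
            }
        ],
    }


# Hypothetical resource names, for illustration only.
pitr_kwargs = pitr_request("orders")
case_kwargs = case_sensitivity_request("analytics-wg")

# With credentials configured, the requests could be sent like this:
#   import boto3
#   boto3.client("dynamodb").update_continuous_backups(**pitr_kwargs)
#   boto3.client("redshift-serverless").update_workgroup(**case_kwargs)
```

Keeping the parameter construction separate from the API call makes the settings easy to review before anything is changed in the account.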

Creating the Integration

The integration can be created from either the Amazon DynamoDB console or the Amazon Redshift console. The steps are:

Selecting a Source Table: Choose the Amazon DynamoDB table for replication. Each table requires a separate integration.
Configuring Amazon Redshift as the Target: Select the target Amazon Redshift data warehouse, which can be in the same or a different AWS account.
Handling Prerequisite Configurations Automatically: The console provides options to enable PITR or update resource policies if they are not already configured.
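
Outside the console, the same steps can be approximated with the SDK. The sketch below is an assumption-laden illustration rather than the documented procedure: it assumes the Amazon Redshift CreateIntegration API is the one used for this integration, and the account ID, region, table names, name prefix, and target namespace ARN are all made up. It builds one request per table, reflecting the rule that each table needs its own integration.

```python
def dynamodb_table_arn(region: str, account_id: str, table_name: str) -> str:
    """Build the ARN of a DynamoDB table from its parts."""
    return f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"


def integration_requests(name_prefix, region, account_id, table_names, target_arn):
    """Return one CreateIntegration-style request per source table."""
    return [
        {
            "IntegrationName": f"{name_prefix}-{table}",
            "SourceArn": dynamodb_table_arn(region, account_id, table),
            "TargetArn": target_arn,
        }
        for table in table_names
    ]


# Hypothetical values, for illustration only.
TARGET_ARN = "arn:aws:redshift-serverless:us-east-1:123456789012:namespace/EXAMPLE-ID"
integration_kwargs = integration_requests(
    "zetl", "us-east-1", "123456789012", ["orders", "customers"], TARGET_ARN
)

# With credentials configured, each request could be sent like this
# (assumed API; check the current AWS documentation before relying on it):
#   import boto3
#   client = boto3.client("redshift")
#   for kwargs in integration_kwargs:
#       client.create_integration(**kwargs)
```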

Data Structure in Amazon Redshift

Once the integration is active, the source table is replicated into a new database in Amazon Redshift, under the default schema. The replicated table follows Amazon DynamoDB’s structure, with one column for the partition key, one for the sort key, and a SUPER column that contains all other attributes in Amazon DynamoDB JSON format. The partition key serves as the distribution key, and the combination of partition and sort keys is used as the sort key in Amazon Redshift. Users can change the sort key settings as needed.
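
To make the row shape concrete, the sketch below models one replicated row as a Python dictionary and unwraps the DynamoDB JSON type tags (S, N, BOOL, and so on) into plain values. The column names customer_id, order_date, and value, along with the sample attributes, are assumptions chosen for illustration; only the general layout comes from the description above.

```python
from decimal import Decimal


def unwrap(attr: dict):
    """Convert one DynamoDB-JSON attribute value into a plain Python value."""
    type_tag, raw = next(iter(attr.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # DynamoDB JSON carries numbers as strings
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unwrap(item) for item in raw]
    if type_tag == "M":
        return {key: unwrap(val) for key, val in raw.items()}
    if type_tag == "SS":
        return set(raw)
    if type_tag == "NS":
        return {Decimal(n) for n in raw}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


# Hypothetical replicated row: two key columns plus the SUPER column.
replicated_row = {
    "customer_id": "C-100",      # partition key column
    "order_date": "2024-10-15",  # sort key column
    "value": {                   # SUPER column (name assumed), DynamoDB JSON
        "total": {"N": "42.50"},
        "items": {"L": [{"M": {"sku": {"S": "A1"}, "qty": {"N": "2"}}}]},
        "gift": {"BOOL": False},
    },
}

plain = {name: unwrap(attr) for name, attr in replicated_row["value"].items()}
```

The extra nesting level introduced by the type tags is why queries against the SUPER column have to navigate one step deeper than the attribute name.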

Querying and Validating Data

Replicated data can be queried in Amazon Redshift using SQL. The SUPER data type holds semi-structured data, so specific attributes can be extracted using Amazon Redshift’s PartiQL support. Inserts, updates, and deletes made in Amazon DynamoDB are automatically reflected in Amazon Redshift, so incremental changes can be validated by re-running a query shortly after the next replication cycle.
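
As a rough illustration of attribute extraction, the helper below assembles a PartiQL-style SELECT that walks from the SUPER column to an attribute and then to its DynamoDB type tag. It assumes the SUPER column is named value and the schema is public, and the table and attribute names are hypothetical. Every attribute is cast to varchar to keep the sketch conservative; numeric attributes may need a further cast.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier so its case is preserved."""
    return '"' + name.replace('"', '""') + '"'


def attribute_query(table, key_columns, attributes,
                    schema="public", super_column="value"):
    """Build a SELECT that extracts attributes from the SUPER column.

    `attributes` maps an attribute name to its DynamoDB type tag, e.g. "S".
    """
    select_items = [f"t.{quote_ident(col)}" for col in key_columns]
    for name, type_tag in attributes.items():
        path = ".".join(quote_ident(p) for p in (super_column, name, type_tag))
        select_items.append(f"t.{path}::varchar AS {quote_ident(name)}")
    source = f"{quote_ident(schema)}.{quote_ident(table)}"
    return "SELECT " + ", ".join(select_items) + f" FROM {source} AS t"


# Hypothetical table and attribute names.
query = attribute_query(
    "orders",
    ["customer_id", "order_date"],
    {"status": "S", "total": "N"},
)
print(query)
```

Quoting every identifier matters here because the target has case-sensitive identifiers enabled, and DynamoDB attribute names often use mixed case.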

Materialized Views for Analytics

For analytics, materialized views can be created on the replicated tables. These views provide optimized data access by automatically refreshing with changes in the underlying data, thus reducing query execution times. They are particularly useful for dashboards and reports that require frequent data aggregation or transformation.
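
A materialized view over a replicated table might look like the statement generated below. This is a sketch under several assumptions: the view lives in a separate local database and reads the replicated table through a three-part name, the SUPER column is named value, and the names order_counts, integration_db, orders, and status are hypothetical. Whether AUTO REFRESH is accepted for a given view should be checked against current Amazon Redshift documentation; a scheduled REFRESH MATERIALIZED VIEW is the fallback.

```python
def materialized_view_sql(view_name, source_db, table, group_attr, type_tag="S"):
    """Build a CREATE MATERIALIZED VIEW statement that counts items per value."""
    source = f'"{source_db}"."public"."{table}"'
    path = f't."value"."{group_attr}"."{type_tag}"::varchar'
    return "\n".join(
        [
            f'CREATE MATERIALIZED VIEW "{view_name}"',
            "AUTO REFRESH YES",
            "AS",
            f'SELECT {path} AS "{group_attr}", COUNT(*) AS "item_count"',
            f"FROM {source} AS t",
            "GROUP BY 1;",
        ]
    )


# Hypothetical names, for illustration only.
mv_sql = materialized_view_sql("order_counts", "integration_db", "orders", "status")
print(mv_sql)
```

Pre-aggregating in the view means a dashboard reads a small summary table instead of unpacking the SUPER column on every request.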

Monitoring and Metrics

Users can monitor the integration’s performance through the Amazon Redshift console or Amazon CloudWatch. Available metrics include data transfer rates, lag times, and table statistics. System views such as SVV_INTEGRATION and SYS_INTEGRATION_ACTIVITY provide detailed insights into the integration’s configuration and activity.
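
A minimal monitoring sketch follows. It keeps the two system-view queries as plain strings, selecting all columns rather than assuming column names, and adds a small staleness check. The 30-minute threshold is an assumption based on the 15-30 minute replication cadence mentioned earlier, and the timestamp is expected to come from whichever metric or view the reader uses.

```python
from datetime import datetime, timedelta, timezone

# Queries against the system views named above; run them in Amazon Redshift.
MONITORING_QUERIES = {
    "configuration": "SELECT * FROM svv_integration;",
    "activity": "SELECT * FROM sys_integration_activity;",
}


def is_stale(last_refresh: datetime, now: datetime,
             max_lag: timedelta = timedelta(minutes=30)) -> bool:
    """Return True when the last successful refresh is older than max_lag."""
    return now - last_refresh > max_lag


# Hypothetical timestamps, for illustration only.
last_refresh = datetime(2024, 10, 15, 12, 0, tzinfo=timezone.utc)
check_time = last_refresh + timedelta(minutes=45)
print(is_stale(last_refresh, check_time))
```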

Pricing Considerations

There are no additional charges specifically for the zero-ETL integration. However, costs associated with Amazon DynamoDB PITR, data exports, Amazon Redshift storage, and compute resources still apply.

Cleaning Up

Users can delete the zero-ETL integration from the Amazon Redshift console to stop data replication. This action stops future data transfers but does not remove existing data from Amazon DynamoDB or Amazon Redshift.

Conclusion

The zero-ETL integration simplifies data analytics by automating data transfer from Amazon DynamoDB to Amazon Redshift, eliminating traditional ETL complexities. This streamlined approach allows organizations to gain insights across multiple applications and reduce operational overhead while improving cost efficiency.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. How does the Zero-ETL integration benefit data engineers and analysts?

ANS: – Zero-ETL integration saves time and effort by automating data replication between Amazon DynamoDB and Amazon Redshift. It allows data engineers to focus on building analytics solutions rather than managing complex ETL workflows. For data analysts, it provides timely access to current data, enabling more accurate, near real-time analysis.

2. Can Zero-ETL integration handle large-scale data replication?

ANS: – Yes, Zero-ETL integration is designed to handle large-scale data replication. It supports automatic scaling to manage high volumes of data and frequent updates, ensuring that even large Amazon DynamoDB tables can be efficiently synchronized with Amazon Redshift.

WRITTEN BY Rachana Kampli

Rachana Kampli works as an AWS Data Engineer at CloudThat with expertise in designing and building scalable data pipeline solutions. She is skilled in a broad range of AWS services, including Amazon S3, AWS Glue, Amazon Redshift, AWS Lambda, Amazon Kinesis, AWS DMS, and Amazon QuickSight. With a strong foundation in data engineering principles, Rachana focuses on developing efficient, reliable, and cost-effective data processing and analytics solutions. In her free time, she keeps up with the latest advancements in cloud and data technologies and enjoys exploring new tools and frameworks in the data ecosystem.