Introduction
As cloud architectures evolve, efficiently moving data from transactional databases to scalable storage like Amazon S3 becomes increasingly critical. Amazon RDS for PostgreSQL offers a built-in extension called aws_s3, which enables direct exports of query results to Amazon S3 through a secure, server-side operation.
One of the standout features of this extension is the aws_s3.query_export_to_s3 function, which greatly simplifies and accelerates data transfers by removing the need for intermediate storage or complex scripts.
In this blog post, we will walk you through the setup, permissions, usage, and best practices for using aws_s3.query_export_to_s3 to export data from Amazon RDS PostgreSQL to Amazon S3.
Why use aws_s3.query_export_to_s3?
This function provides several key advantages:
- Direct export to Amazon S3: Eliminates the need to first stage data locally before uploading.
- Performance optimization: Leverages the Amazon RDS engine’s capabilities for efficient data transfer.
- Simplified workflows: Reduces the need for external scripts, manual exports, or tools like pg_dump.
Prerequisites
Before proceeding, ensure you meet the following requirements:
- Running Amazon RDS for PostgreSQL on a version that supports export to Amazon S3 (PostgreSQL 10.14, 11.9, 12.4, and later releases); a quick version check is shown after this list.
- The aws_s3 extension must be enabled on your database.
- An AWS IAM role attached to your Amazon RDS instance with permission to write to the target Amazon S3 bucket.
- An Amazon S3 bucket where the exported data will be stored.
- (Recommended) A VPC endpoint for Amazon S3 to enable secure, fast, private transfers.
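A quick way to verify the first two prerequisites from a psql session is to check the engine version and confirm that the extension is available for installation. This is a minimal check using standard PostgreSQL catalogs:

```sql
-- Check the running PostgreSQL engine version
SHOW server_version;

-- Confirm aws_s3 (and its aws_commons dependency) can be installed
SELECT name, default_version
FROM pg_available_extensions
WHERE name IN ('aws_s3', 'aws_commons');
```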
Step-by-Step Guide
Step 1: Create an AWS IAM Role
You will first need an AWS IAM role that Amazon RDS can assume to access your Amazon S3 bucket.
1. Go to the AWS IAM Console and create a new role.
2. Select AWS service -> Amazon RDS as the trusted entity.
3. Attach a policy like the one below:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
4. After creating the role, attach it to your Amazon RDS instance using the AWS CLI:
```bash
aws rds add-role-to-db-instance \
  --db-instance-identifier your-db-instance-id \
  --role-arn arn:aws:iam::<your-account-id>:role/<your-role-name> \
  --feature-name s3Export
```
Step 2: Enable the aws_s3 Extension
Connect to your Amazon RDS PostgreSQL instance and run:
```sql
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
```
This ensures the necessary functionality is available.
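To confirm that the extension and its aws_commons dependency (installed via CASCADE) were created successfully, you can query the pg_extension catalog:

```sql
-- List the installed export-related extensions and their versions
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');
```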
Step 3: Export Query Results to Amazon S3
Use the aws_s3.query_export_to_s3 function to export your data directly to Amazon S3:
```sql
SELECT aws_s3.query_export_to_s3(
    'SELECT * FROM your_table WHERE created_at >= current_date - interval ''7 days''',
    aws_commons.create_s3_uri('your-bucket-name', 'exports/data_export.csv', 'us-west-2'),
    options := 'format csv, HEADER'
);
```
Parameters:
- query: The SQL query whose result you want to export.
- create_s3_uri: Builds the target Amazon S3 URI from the bucket name, object key, and region (an alternative form that passes these values directly is sketched after this list).
- options: Additional options like format csv, HEADER, and DELIMITER for formatting.
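Besides the aws_commons.create_s3_uri form shown above, the extension also provides an overload of aws_s3.query_export_to_s3 that accepts the bucket, object key, and region as separate text arguments. Exact signatures can vary by extension version, so treat the following as a sketch and verify with \df aws_s3.* on your instance:

```sql
-- Same export, passing the S3 target as individual text arguments
-- (bucket, key, and region values are placeholders)
SELECT aws_s3.query_export_to_s3(
    'SELECT * FROM your_table WHERE created_at >= current_date - interval ''7 days''',
    'your-bucket-name',
    'exports/data_export.csv',
    'us-west-2',
    'format csv, HEADER'
);
```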
Example Use Case
Suppose you want to export the last seven days of order data from an e-commerce platform:
```sql
SELECT aws_s3.query_export_to_s3(
    'SELECT order_id, customer_id, total_amount, order_date
       FROM orders
      WHERE order_date >= now() - interval ''7 days''',
    aws_commons.create_s3_uri('ecommerce-data-bucket', 'weekly_exports/orders_weekly.csv', 'us-east-1'),
    options := 'format csv, HEADER'
);
```
Once complete, the CSV file will appear in the specified Amazon S3 bucket, ready for use with Amazon Athena, Amazon Redshift Spectrum, or other analytics tools.
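The function also returns a summary row with rows_uploaded, files_uploaded, and bytes_uploaded columns, which is useful for logging or validation. For example, the export above can be written as:

```sql
-- Inspect how many rows, files, and bytes the export produced
SELECT rows_uploaded, files_uploaded, bytes_uploaded
FROM aws_s3.query_export_to_s3(
    'SELECT order_id, customer_id, total_amount, order_date
       FROM orders
      WHERE order_date >= now() - interval ''7 days''',
    aws_commons.create_s3_uri('ecommerce-data-bucket', 'weekly_exports/orders_weekly.csv', 'us-east-1'),
    options := 'format csv, HEADER'
);
```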
Security Tips
- Restrict bucket access with precise bucket policies.
- Enable encryption using SSE-S3 or SSE-KMS to protect your data at rest.
- Monitor access by enabling Amazon S3 server access logs and auditing activities through AWS CloudTrail.
Best Practices for Implementation
- Partitioned Exports: Use query filters (e.g., WHERE timestamp >= now() - interval '1 day') to export incremental data and avoid full-table exports; a sketch of such an incremental export is shown after this list.
- File Format Optimization: Export to compressed or columnar formats like Parquet using external transformation jobs post-export.
- Amazon S3 Bucket Lifecycle Rules: Set up retention policies to automatically archive or delete older export files.
- Monitoring and Alerts: Log export failures and metrics via PostgreSQL logs or integrate with Amazon CloudWatch and AWS CloudTrail for visibility.
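To illustrate the partitioned-export practice above, here is a minimal sketch of a daily incremental export. The orders table, bucket name, and key prefix are carried over from the earlier example and are placeholders for your own schema and bucket:

```sql
-- Hypothetical daily incremental export; table, bucket, and prefix are placeholders
DO $$
DECLARE
    s3_key  text := 'daily_exports/orders_' || to_char(current_date, 'YYYY_MM_DD') || '.csv';
    summary record;
BEGIN
    SELECT * INTO summary
    FROM aws_s3.query_export_to_s3(
        'SELECT * FROM orders WHERE order_date >= current_date - interval ''1 day''',
        aws_commons.create_s3_uri('ecommerce-data-bucket', s3_key, 'us-east-1'),
        options := 'format csv, HEADER'
    );
    RAISE NOTICE 'Exported % rows (% bytes) to %',
        summary.rows_uploaded, summary.bytes_uploaded, s3_key;
END
$$;
```

A block like this can be wrapped in a function and invoked from whatever scheduler you already use for database jobs.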
Use Cases in Production
- Daily/Hourly Data Archival: Automate exports of transactional data to S3 for audit trails, compliance, or long-term storage.
- Data Lake Integration: Move OLTP data to Amazon S3 for downstream analytics using Amazon Athena, Amazon Redshift Spectrum, or Amazon EMR.
- Backup for Analytics: Export query-based snapshots of critical tables (e.g., user logs, financial transactions) to Amazon S3 for backup or analysis.
- Hybrid Workflows: Combine on-demand exports with AWS Glue or Amazon SageMaker pipelines for machine learning and reporting.
Conclusion
The aws_s3.query_export_to_s3 function provides a highly efficient and straightforward method for exporting data from Amazon RDS PostgreSQL to Amazon S3. By eliminating the need for intermediate storage or external tools, it enables faster, safer, and more cost-effective data transfers.
Drop a query if you have any questions regarding Amazon RDS PostgreSQL or Amazon S3 and we will get back to you quickly.
About CloudThat
CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.
CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, AWS Config, Amazon EMR and many more.
FAQs
1. Can I export very large datasets with aws_s3.query_export_to_s3?
ANS: – Yes, but for extremely large datasets, exporting data in smaller chunks (such as by date range) is advisable to avoid timeout errors or memory issues.
2. Can I export to formats other than CSV?
ANS: – Currently, the function only supports the CSV format. You can adjust delimiters, headers, and other CSV-specific options via the options parameter.
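For example, the delimiter can be changed through the options string, which follows PostgreSQL COPY option syntax (note the doubled single quotes needed inside the SQL literal). A minimal sketch reusing the earlier placeholder bucket:

```sql
-- Pipe-delimited export using COPY-style options
SELECT aws_s3.query_export_to_s3(
    'SELECT * FROM your_table',
    aws_commons.create_s3_uri('your-bucket-name', 'exports/data_export.txt', 'us-west-2'),
    options := 'format csv, HEADER true, DELIMITER ''|'''
);
```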

WRITTEN BY Sunil H G
Sunil H G is a highly skilled and motivated Research Associate at CloudThat. He is an expert in working with popular data analysis and visualization libraries such as Pandas, Numpy, Matplotlib, and Seaborn. He has a strong background in data science and can effectively communicate complex data insights to both technical and non-technical audiences. Sunil's dedication to continuous learning, problem-solving skills, and passion for data-driven solutions make him a valuable asset to any team.