Seamlessly Import Data from Amazon S3 to Amazon RDS PostgreSQL Using aws_s3 Extension

Introduction

Amazon S3 is the primary storage repository for raw, semi-structured, or structured data in modern cloud environments. Meanwhile, Amazon RDS for PostgreSQL is a reliable, fully managed relational database system for transactional and analytical operations. Moving data from Amazon S3 to Amazon RDS PostgreSQL is routine for data engineers, especially for use cases like log ingestion, ETL pipeline outputs, or report generation.

AWS provides the aws_s3 extension in RDS PostgreSQL to simplify this data movement, enabling direct data import from Amazon S3 into a database table using a SQL command. This blog will walk you through the setup, usage, and best practices for leveraging the aws_s3.table_import_from_s3 function.

Prerequisites

Before beginning the import process, ensure the following conditions are met:

  • Amazon RDS for PostgreSQL: An instance running PostgreSQL version 11 or higher.
  • Amazon S3 Bucket: The source file (CSV format preferred) must exist in a bucket in the same AWS region as your RDS instance.
  • AWS IAM Role: Create an AWS IAM role with the AmazonS3ReadOnlyAccess policy to allow access to the Amazon S3 bucket.
  • Role Association: Attach the AWS IAM role to your Amazon RDS instance for the s3Import feature.
  • Extension Enabled: Enable the aws_s3 extension on your PostgreSQL database.

Creating and Associating an AWS IAM Role

For Amazon RDS to read from your Amazon S3 bucket, you will need an AWS IAM role configured with appropriate permissions.

Step 1: Create a new AWS IAM role with the following trust policy:
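
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```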

Step 2: Assign the AmazonS3ReadOnlyAccess policy to the created AWS IAM role.

Step 3: Use the AWS CLI or Console to link the AWS IAM role to your Amazon RDS instance:
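
Using the AWS CLI (replace the instance identifier and role ARN with your own values):

```bash
aws rds add-role-to-db-instance \
    --db-instance-identifier my-rds-instance \
    --feature-name s3Import \
    --role-arn arn:aws:iam::123456789012:role/rds-s3-import-role
```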

This grants your Amazon RDS instance permission to fetch files from the Amazon S3 bucket.

Enabling the aws_s3 Extension

Connect to your Amazon RDS PostgreSQL instance using a SQL client and run the following command to enable the extension:
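
```sql
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
```

The CASCADE clause also installs the aws_commons extension, which aws_s3 depends on.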

This command enables the necessary functions for importing data.

Creating the Destination Table

Before importing, create a table that matches the schema of your source data. For example:
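
The sketch below assumes a simple users table whose columns mirror the sample CSV; adjust the names and types to your own data:

```sql
CREATE TABLE users (
    id         INTEGER,
    name       TEXT,
    email      TEXT,
    created_at TIMESTAMP
);
```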

Important: The order of columns and their data types should match the CSV file exactly to avoid loading errors.

Importing Data Using aws_s3.table_import_from_s3

Here’s how to perform the import:
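
A typical call looks like this; the bucket name, object key, and region below are placeholders for your own values:

```sql
SELECT aws_s3.table_import_from_s3(
    'users',                          -- destination table
    '',                               -- column list; '' maps all columns
    '(format csv, header true)',      -- CSV options
    aws_commons.create_s3_uri(
        'my-data-bucket',             -- S3 bucket (placeholder)
        'data/users.csv',             -- object key (placeholder)
        'us-east-1'                   -- bucket region (placeholder)
    )
);
```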

Parameters Breakdown:

  • 'users': The destination table.
  • '': An empty string means all columns are mapped automatically in order.
  • '(format csv, header true)': Specifies CSV file format with a header row.
  • aws_commons.create_s3_uri(…): Constructs the URI to the Amazon S3 object.

Best Practices & Tips

Follow these tips to ensure smooth imports:

  • Use Staging Tables: Import into temporary tables first for data validation before moving to production tables (see the sketch after this list).
  • Maintain Format Consistency: Ensure your CSV files are consistently formatted with matching headers and delimiters.
  • Split Large Files: Divide huge datasets into smaller chunks to simplify troubleshooting.
  • Schedule Wisely: Run imports during non-peak hours to minimize locking and performance issues.
  • Enable Query Logging: Monitor slow queries and errors by enabling detailed PostgreSQL logs.
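
As a minimal sketch of the staging-table pattern, reusing the illustrative users table and placeholder bucket from earlier (the validation rule is hypothetical):

```sql
-- Stage the raw file in a table that mirrors the target schema
CREATE TABLE users_staging (LIKE users INCLUDING ALL);

SELECT aws_s3.table_import_from_s3(
    'users_staging', '', '(format csv, header true)',
    aws_commons.create_s3_uri('my-data-bucket', 'data/users.csv', 'us-east-1')
);

-- Promote only rows that pass a basic validation check
INSERT INTO users
SELECT * FROM users_staging
WHERE email IS NOT NULL;

DROP TABLE users_staging;
```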

Real-World Use Case Example

Suppose you work at a logistics firm where daily shipment logs are saved in an Amazon S3 bucket. Each CSV contains shipment IDs, locations, and timestamps. You can set up a nightly import job into your Amazon RDS PostgreSQL reporting database like this:
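
A nightly job might run a statement like the following; the table, bucket, and key names are illustrative:

```sql
SELECT aws_s3.table_import_from_s3(
    'shipment_logs',                      -- reporting table (illustrative)
    '',
    '(format csv, header true)',
    aws_commons.create_s3_uri(
        'logistics-shipment-logs',        -- hypothetical bucket
        'daily/shipments-2024-06-01.csv', -- hypothetical daily key
        'us-east-1'
    )
);
```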

This approach minimizes manual effort, saves transformation tool costs, and delivers timely data insights.

Performance Considerations

When dealing with large datasets, keep these performance tips in mind:

  • Disable Indexes: Drop or disable indexes during imports and recreate them afterward to speed up loading (see the sketch after this list).
  • Parallel Loads: Use multiple temporary tables to load parts of your data in parallel.
  • CloudWatch Monitoring: Monitor CPU, memory, and disk metrics to catch bottlenecks early.
  • Data Validation: Always validate your imported data to catch any anomalies early.
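
Here is a minimal sketch of the drop-and-recreate pattern, again using the hypothetical users table and an illustrative index name:

```sql
-- Drop the index before the bulk load
DROP INDEX IF EXISTS idx_users_email;

SELECT aws_s3.table_import_from_s3(
    'users', '', '(format csv, header true)',
    aws_commons.create_s3_uri('my-data-bucket', 'data/users.csv', 'us-east-1')
);

-- Recreate the index once the data is loaded
CREATE INDEX idx_users_email ON users (email);
```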

When to Use vs. Other AWS Services

Use aws_s3.table_import_from_s3 when:

  • Your source data is already in S3.
  • You don’t need a complex transformation.
  • You prefer a lightweight, serverless solution.

Use AWS Glue or AWS DMS when:

  • You need advanced transformations.
  • You’re syncing data between heterogeneous databases.
  • Your import involves large-scale orchestration.

Conclusion

The aws_s3.table_import_from_s3 function provides a lightweight, serverless option to load data from Amazon S3 into Amazon RDS PostgreSQL with minimal overhead. It significantly simplifies the data ingestion process, especially when dealing with reporting and analytical use cases that require timely data availability.

With basic setup (enabling the extension, configuring the AWS IAM role, and preparing your data), you can automate and streamline data transfer without needing separate ETL pipelines.

Drop a query if you have any questions regarding Amazon RDS PostgreSQL, and we will get back to you quickly.

About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partner, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, and many more.

FAQs

1. Can I use aws_s3.table_import_from_s3 with file formats other than CSV?

ANS: – Currently, the function primarily supports the CSV format. If you have JSON, Parquet, or other formats, convert them into CSV before using this function, or consider using AWS Glue for more flexible ETL operations.

2. Is it possible to automate the import process on a schedule?

ANS: – Yes, you can automate the import using AWS Lambda and Amazon EventBridge (formerly Amazon CloudWatch Events). Schedule an AWS Lambda function that connects to your Amazon RDS PostgreSQL instance and executes the import SQL statement at regular intervals (e.g., daily or hourly), as sketched below.
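
As a minimal sketch, assuming psycopg2 is packaged with the function (for example, via a Lambda layer) and connection details arrive through environment variables; the table, bucket, and key names are the same placeholders used earlier:

```python
import os

import psycopg2


def lambda_handler(event, context):
    # Connect to the RDS PostgreSQL instance using environment variables
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    try:
        # The connection context manager commits the transaction on success
        with conn, conn.cursor() as cur:
            cur.execute("""
                SELECT aws_s3.table_import_from_s3(
                    'users', '', '(format csv, header true)',
                    aws_commons.create_s3_uri('my-data-bucket', 'data/users.csv', 'us-east-1')
                );
            """)
    finally:
        conn.close()
    return {"status": "import triggered"}
```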

WRITTEN BY Sunil H G

Sunil H G is a highly skilled and motivated Research Associate at CloudThat. He is an expert in working with popular data analysis and visualization libraries such as Pandas, NumPy, Matplotlib, and Seaborn. He has a strong background in data science and can effectively communicate complex data insights to both technical and non-technical audiences. Sunil's dedication to continuous learning, problem-solving skills, and passion for data-driven solutions make him a valuable asset to any team.
