Introduction
Amazon S3 is the primary storage repository for raw, semi-structured, or structured data in modern cloud environments. Meanwhile, Amazon RDS for PostgreSQL is a reliable, fully managed relational database system for transactional and analytical operations. Moving data from Amazon S3 to Amazon RDS PostgreSQL is routine for data engineers, especially for use cases like log ingestion, ETL pipeline outputs, or report generation.
AWS provides the aws_s3 extension in RDS PostgreSQL to simplify this data movement, enabling direct data import from Amazon S3 into a database table using a SQL command. This blog will walk you through the setup, usage, and best practices for leveraging the aws_s3.table_import_from_s3 function.
Prerequisites
Before beginning the import process, ensure the following conditions are met:
- Amazon RDS for PostgreSQL: An instance running PostgreSQL version 11 or higher.
- Amazon S3 Bucket: The source file (CSV format preferred) must exist in a bucket in the same AWS region as your RDS instance.
- AWS IAM Role: Create an AWS IAM role with the AmazonS3ReadOnlyAccess policy to allow access to the Amazon S3 bucket.
- Role Association: Attach the AWS IAM role to your Amazon RDS instance for the S3_IMPORT feature.
- Extension Enabled: Enable the aws_s3 extension on your PostgreSQL database.
Creating and Associating an AWS IAM Role
For Amazon RDS to read from your Amazon S3 bucket, you will need an AWS IAM role configured with appropriate permissions.
Step 1: Create a new AWS IAM role with the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Step 2: Assign the AmazonS3ReadOnlyAccess policy to the created AWS IAM role.
Step 3: Use the AWS CLI or Console to link the AWS IAM role to your Amazon RDS instance:
aws rds add-role-to-db-instance \
    --db-instance-identifier <your-db-instance> \
    --role-arn arn:aws:iam::<account-id>:role/<role-name> \
    --feature-name S3_IMPORT
This grants your Amazon RDS instance permission to fetch files from the Amazon S3 bucket.
Enabling the aws_s3 Extension
Connect to your Amazon RDS PostgreSQL instance using a SQL client and run the following command to enable the extension:
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
The CASCADE option also installs the required aws_commons extension, which provides helper functions such as aws_commons.create_s3_uri used during import.
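To confirm that both extensions are installed, you can query the pg_extension catalog:

SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');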
Creating the Destination Table
Before importing, create a table that matches the schema of your source data. For example:
CREATE TABLE users (
    id    INT,
    name  TEXT,
    email TEXT
);
Important: The order of columns and their data types should match the CSV file exactly to avoid loading errors.
Importing Data Using aws_s3.table_import_from_s3
Here’s how to perform the import:
SELECT aws_s3.table_import_from_s3(
    'users',
    '',
    '(format csv, header true)',
    aws_commons.create_s3_uri('your-bucket-name', 'path/to/file.csv', 'us-west-2')
);
Parameters Breakdown:
- 'users': The destination table.
- '' (empty string): All columns of the destination table are used, and the file's columns are mapped to them in order (an example with an explicit column list follows below).
- '(format csv, header true)': Options passed to the PostgreSQL COPY command; here, a CSV file with a header row.
- aws_commons.create_s3_uri(...): Builds the structure identifying the Amazon S3 bucket name, object key, and AWS Region.
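If the file contains only a subset of the table's columns, you can name the target columns explicitly. The following is a minimal sketch; the column names and object key are illustrative and assume the users table shown earlier:

SELECT aws_s3.table_import_from_s3(
    'users',
    'id, email',                     -- load only these columns; the rest stay NULL or default
    '(format csv, header true)',
    aws_commons.create_s3_uri('your-bucket-name', 'path/to/ids_and_emails.csv', 'us-west-2')
);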
Best Practices & Tips
Follow these tips to ensure smooth imports:
- Use Staging Tables: Import into temporary or staging tables first for data validation before moving rows into production tables (see the sketch after this list).
- Maintain Format Consistency: Ensure your CSV files are consistently formatted with matching headers and delimiters.
- Split Large Files: Divide huge datasets into smaller chunks to simplify troubleshooting.
- Schedule Wisely: Run imports during non-peak hours to minimize locking and performance issues.
- Enable Query Logging: Monitor slow queries and errors by enabling detailed PostgreSQL logs.
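A minimal sketch of the staging-table pattern, assuming the users table from earlier; the staging table name and the validation rule are illustrative:

-- Load into a staging table that mirrors the target schema
CREATE TABLE users_staging (LIKE users);

SELECT aws_s3.table_import_from_s3(
    'users_staging',
    '',
    '(format csv, header true)',
    aws_commons.create_s3_uri('your-bucket-name', 'path/to/file.csv', 'us-west-2')
);

-- Promote only rows that pass basic validation
INSERT INTO users
SELECT * FROM users_staging
WHERE email IS NOT NULL;

DROP TABLE users_staging;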
Real-World Use Case Example
Suppose you work at a logistics firm where daily shipment logs are saved in an Amazon S3 bucket. Each CSV contains shipment IDs, locations, and timestamps. You can set up a nightly import job into your Amazon RDS PostgreSQL reporting database like this:
SELECT aws_s3.table_import_from_s3(
    'shipments',
    '',
    '(format csv, header true)',
    aws_commons.create_s3_uri('my-logistics-bucket', 'daily_logs/2024-04-22.csv', 'us-east-1')
);
This approach minimizes manual effort, saves transformation tool costs, and delivers timely data insights.
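For reference, the destination table for such a feed might look like the sketch below; the column names and types are assumptions based on the fields described above:

CREATE TABLE shipments (
    shipment_id TEXT,
    location    TEXT,
    event_time  TIMESTAMPTZ
);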
Performance Considerations
When dealing with large datasets, keep these performance tricks in mind:
- Disable Indexes: Drop or disable indexes during imports and recreate them afterward to speed up loading (a sketch follows this list).
- Parallel Loads: Use multiple temporary tables to load parts of your data in parallel.
- CloudWatch Monitoring: Monitor CPU, memory, and disk metrics to catch bottlenecks early.
- Data Validation: Always validate your imported data to catch any anomalies early.
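A minimal sketch of the drop-and-rebuild pattern, assuming a hypothetical index on the users table's email column:

-- Drop the index so the load does not pay index-maintenance costs
DROP INDEX IF EXISTS idx_users_email;

SELECT aws_s3.table_import_from_s3(
    'users',
    '',
    '(format csv, header true)',
    aws_commons.create_s3_uri('your-bucket-name', 'path/to/file.csv', 'us-west-2')
);

-- Rebuild the index once the data is loaded
CREATE INDEX idx_users_email ON users (email);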
When to Use vs. Other AWS Services
Use aws_s3.table_import_from_s3 when:
- Your source data is already in S3.
- You don’t need a complex transformation.
- You prefer a lightweight, serverless solution.
Use AWS Glue or AWS DMS when:
- You need advanced transformations.
- You’re syncing data between heterogeneous databases.
- Your import involves large-scale orchestration.
Conclusion
With a basic setup of enabling the aws_s3 extension, associating an AWS IAM role, and preparing your destination table, you can automate and streamline data transfer from Amazon S3 into Amazon RDS for PostgreSQL without building a separate ETL pipeline.
Drop a query if you have any questions regarding Amazon RDS PostgreSQL and we will get back to you quickly.
About CloudThat
CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.
CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner and many more.
FAQs
1. Can I use aws_s3.table_import_from_s3 with file formats other than CSV?
ANS: – The function passes its options string to the PostgreSQL COPY command, so it supports the formats that COPY understands, such as CSV and plain text. For JSON, Parquet, or other formats, convert the files to CSV before importing, or consider AWS Glue for more flexible ETL operations.
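As an illustration, a tab-delimited text file could be imported by passing the corresponding COPY options; the object key below is hypothetical:

SELECT aws_s3.table_import_from_s3(
    'users',
    '',
    '(format text)',                 -- COPY text format uses tab as the default delimiter
    aws_commons.create_s3_uri('your-bucket-name', 'path/to/file.tsv', 'us-west-2')
);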
2. Is it possible to automate the import process on a schedule?
ANS: – Yes. You can automate the import using AWS Lambda and Amazon EventBridge (formerly Amazon CloudWatch Events). Create a scheduled rule that triggers an AWS Lambda function, which connects to your Amazon RDS for PostgreSQL instance and executes the import SQL statement at the desired interval (e.g., daily or hourly).
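The SQL such a scheduled job executes can build the object key from the current date. The sketch below reuses the logistics example; the bucket name and key pattern are assumptions:

SELECT aws_s3.table_import_from_s3(
    'shipments',
    '',
    '(format csv, header true)',
    aws_commons.create_s3_uri(
        'my-logistics-bucket',
        'daily_logs/' || to_char(current_date, 'YYYY-MM-DD') || '.csv',   -- today's file
        'us-east-1'
    )
);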

WRITTEN BY Sunil H G
Sunil H G is a highly skilled and motivated Research Associate at CloudThat. He is an expert in working with popular data analysis and visualization libraries such as Pandas, Numpy, Matplotlib, and Seaborn. He has a strong background in data science and can effectively communicate complex data insights to both technical and non-technical audiences. Sunil's dedication to continuous learning, problem-solving skills, and passion for data-driven solutions make him a valuable asset to any team.