Amazon Web Services (AWS) has announced Amazon S3 Tables, a new storage option designed specifically for analytics workloads. S3 Tables are built to store tabular data such as daily transactions, sensor data, and ad impressions efficiently. They offer better query performance, higher transactions per second, and lower storage costs, along with the operational ease of a fully managed service.
Key Highlights of Amazon S3 Tables:
- Purpose-Built Storage for Analytics: S3 Tables work with analytics engines such as Amazon Athena, Amazon EMR, and Apache Spark. Compared with self-managed Iceberg tables in S3 general-purpose buckets, AWS reports up to 3x faster query performance and up to 10x more transactions per second.
- Built-In Apache Iceberg Support: S3 Tables leverage the Apache Iceberg format, a widely adopted table format for large-scale analytics.
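- Automated Table Optimization: Amazon S3 Tables continuously maintain your tables through data compaction (combining small files into larger ones), snapshot management (expiring old snapshots), and unreferenced file cleanup (deleting files that no snapshot references).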
- Access Management and Security: S3 Tables use AWS Identity and Access Management (IAM) for fine-grained access control.
- Seamless Integration with AWS Analytics Services: Amazon S3 Tables integrate with AWS services, so you can analyze data without complex configuration: Amazon Athena, AWS Glue, Amazon EMR, Amazon Redshift, Amazon QuickSight, and AWS Lake Formation.
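Once a table exists, the automatic maintenance described above can also be tuned per table. The following is a minimal sketch using boto3's `s3tables` client, assuming a boto3 release that includes that client and configured AWS credentials. The bucket ARN, namespace, and table name are placeholders, and the numeric settings are illustrative rather than recommended values. Unreferenced file cleanup is configured on the table bucket rather than per table, so it is left out here.

```python
# Sketch: tune S3 Tables automatic maintenance for one table with boto3.
# Assumptions: a boto3 release that includes the "s3tables" client, configured
# AWS credentials, and placeholder ARN/namespace/table names. The numbers are
# illustrative, not recommended values.


def compaction_config(target_file_size_mb: int = 512) -> dict:
    # Value for the "icebergCompaction" maintenance type
    return {
        "status": "enabled",
        "settings": {"icebergCompaction": {"targetFileSizeMB": target_file_size_mb}},
    }


def snapshot_config(min_snapshots: int = 1, max_age_hours: int = 120) -> dict:
    # Value for the "icebergSnapshotManagement" maintenance type
    return {
        "status": "enabled",
        "settings": {
            "icebergSnapshotManagement": {
                "minSnapshotsToKeep": min_snapshots,
                "maxSnapshotAgeHours": max_age_hours,
            }
        },
    }


def apply_maintenance(table_bucket_arn: str, namespace: str, table: str,
                      region: str = "us-east-1") -> None:
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("s3tables", region_name=region)
    for maintenance_type, value in (
        ("icebergCompaction", compaction_config()),
        ("icebergSnapshotManagement", snapshot_config()),
    ):
        client.put_table_maintenance_configuration(
            tableBucketARN=table_bucket_arn,
            namespace=namespace,
            name=table,
            type=maintenance_type,
            value=value,
        )
```

Keeping the payloads in small helper functions makes them easy to inspect or reuse with the equivalent `aws s3tables put-table-maintenance-configuration` CLI call.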
Steps to Use S3 Tables for Analytics
- Create an S3 table bucket.
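Table buckets can be created in the Amazon S3 console or programmatically. Below is a minimal sketch with boto3, assuming a boto3 release that ships the `s3tables` client and configured AWS credentials; the bucket name, region, and account ID are placeholders. The ARN it returns is the value later used as the Spark catalog's `warehouse` setting.

```python
# Sketch: create an S3 table bucket with boto3 and build its ARN.
# Assumptions: a boto3 release with the "s3tables" client, configured AWS
# credentials; bucket name, region, and account ID are placeholders.


def table_bucket_arn(region: str, account_id: str, bucket_name: str) -> str:
    # Table bucket ARNs have the form arn:aws:s3tables:<region>:<account>:bucket/<name>
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"


def create_table_bucket(bucket_name: str, region: str = "us-east-1") -> str:
    import boto3  # imported here so table_bucket_arn works without boto3

    client = boto3.client("s3tables", region_name=region)
    response = client.create_table_bucket(name=bucket_name)
    return response["arn"]  # ARN of the newly created table bucket
```

For example, `table_bucket_arn("us-east-1", "111122223333", "demo-s3-iceberg-table")` returns `arn:aws:s3tables:us-east-1:111122223333:bucket/demo-s3-iceberg-table`. The AWS CLI equivalent of `create_table_bucket` is `aws s3tables create-table-bucket --name <name> --region <region>`.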
- Connect to the S3 table bucket with Spark on an Amazon EMR cluster that has Iceberg enabled. The following AWS CLI command creates the cluster:
```shell
aws emr create-cluster \
  --release-label emr-7.5.0 \
  --applications Name=Spark \
  --configurations file://configurations.json \
  --region us-east-1 \
  --name My_cluster_Iceberg_demo \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --log-uri s3://iceberg-logs-8122024/ \
  --service-role EMR_DefaultRole \
  --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole
```
The configurations.json file passed to --configurations is as follows:
```json
[
  {
    "Classification": "iceberg-defaults",
    "Properties": {
      "iceberg.enabled": "true"
    }
  }
]
```
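If you prefer to launch the cluster from Python, the CLI command and configuration file above can be expressed as a single boto3 EMR `run_job_flow` request. This is a minimal sketch assuming configured AWS credentials and the default EMR roles; the name and log bucket are the placeholders from the CLI example. It also waits for the cluster to come up and returns the primary node's public DNS name, which you need for the next step.

```python
# Sketch: the create-cluster command above as a boto3 EMR request.
# Assumptions: configured AWS credentials, default EMR roles, and the same
# placeholder cluster name and log bucket as the CLI example.

ICEBERG_CONFIG = [
    {"Classification": "iceberg-defaults", "Properties": {"iceberg.enabled": "true"}}
]


def cluster_request(name: str = "My_cluster_Iceberg_demo",
                    log_uri: str = "s3://iceberg-logs-8122024/") -> dict:
    # Mirrors the CLI flags: release label, Spark, 3 x m5.xlarge, default roles
    return {
        "Name": name,
        "ReleaseLabel": "emr-7.5.0",
        "Applications": [{"Name": "Spark"}],
        "Configurations": ICEBERG_CONFIG,
        "LogUri": log_uri,
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,  # 1 primary node + 2 core nodes
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster up for interactive use
        },
    }


def launch_and_get_primary_dns(region: str = "us-east-1") -> str:
    import boto3  # imported here so cluster_request works without boto3

    emr = boto3.client("emr", region_name=region)
    cluster_id = emr.run_job_flow(**cluster_request())["JobFlowId"]
    # Poll with boto3's built-in "cluster_running" waiter until the cluster is up
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    return emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["MasterPublicDnsName"]
```

Like the CLI command, this request sets no EC2 key pair; add `"Ec2KeyName"` under `Instances` if you plan to SSH into the primary node.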
- Once the EMR cluster is running, connect to its primary node.
- Next, start a PySpark shell with the S3 Tables catalog and Iceberg runtime packages, then initialize a Spark session whose Iceberg catalog points to your table bucket:
```shell
# EMR 7.5.0 ships Spark 3.5, so use the Spark 3.5 build of the Iceberg runtime
pyspark \
  --packages 'software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.0'
```
```python
from pyspark.sql import SparkSession

# Register an Iceberg catalog named "s3tablesbucket" backed by the S3 Tables catalog
spark = (
    SparkSession.builder.appName("iceberg_lab")
    .config("spark.sql.catalog.s3tablesbucket", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tablesbucket.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog")
    # The warehouse is the ARN of your S3 table bucket
    .config("spark.sql.catalog.s3tablesbucket.warehouse", "arn:aws:s3tables:us-east-1:862820731644:bucket/demo-s3-iceberg-table")
    .config("spark.sql.catalog.s3tablesbucket.client.region", "us-east-1")
    # Make this catalog the default for unqualified table names
    .config("spark.sql.defaultCatalog", "s3tablesbucket")
    .getOrCreate()
)
```
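The catalog properties can also be generated once and supplied either to the session builder or as `--conf` flags when launching `pyspark` or `spark-submit`, which keeps them in one place for interactive and batch jobs. This is a minimal sketch; the helper names are hypothetical, the catalog name `s3tablesbucket` follows the session code above, and the ARN passed in is a placeholder for your table bucket's ARN.

```python
# Sketch: build the S3 Tables catalog settings once, then render them as
# SparkSession builder options or as --conf flags. Helper names are hypothetical.


def s3tables_catalog_conf(table_bucket_arn: str, catalog: str = "s3tablesbucket",
                          region: str = "us-east-1") -> dict:
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"{prefix}.warehouse": table_bucket_arn,
        f"{prefix}.client.region": region,
        "spark.sql.defaultCatalog": catalog,
    }


def as_conf_flags(conf: dict) -> list:
    # Each pair becomes "--conf key=value", the form pyspark and spark-submit accept
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


def apply_to_builder(builder, conf: dict):
    # builder is a SparkSession.Builder; .config() returns the builder, so calls chain
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

For a launch command, join the flags with spaces and append them after the `--packages` option; inside Python, `apply_to_builder(SparkSession.builder.appName("iceberg_lab"), conf).getOrCreate()` applies the same settings.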
- You can now create namespaces and tables in the table bucket with Spark SQL:
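```python
# Qualify names with the registered catalog ("s3tablesbucket"). S3 Tables namespaces
# are single-level, and names use lowercase letters, numbers, and underscores.
spark.sql("CREATE NAMESPACE IF NOT EXISTS s3tablesbucket.cloudthat_db")

spark.sql("""
    CREATE TABLE IF NOT EXISTS s3tablesbucket.cloudthat_db.orders (
        order_id BIGINT,
        customer_id BIGINT,
        order_date DATE,
        total_amount DOUBLE
    ) USING iceberg
""")
```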
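To check that the table works end to end, you can write a few rows and read them back with ordinary Spark SQL. This is a minimal sketch: `load_sample_orders` is a hypothetical helper, `spark` is the session configured above, the table argument is the fully qualified name of the table you created (catalog, namespace, table), and the sample rows are made up for illustration.

```python
# Sketch: insert sample rows into the orders table and aggregate them.
# The helper name and sample rows are hypothetical; pass the table's fully
# qualified name (catalog.namespace.table).


def load_sample_orders(spark, table: str):
    # Iceberg tables accept standard INSERT INTO ... VALUES statements
    spark.sql(f"""
        INSERT INTO {table} VALUES
            (1, 101, DATE '2024-12-01', 250.0),
            (2, 102, DATE '2024-12-02', 99.5)
    """)
    # Total spend per customer, to confirm the rows are readable
    return spark.sql(f"""
        SELECT customer_id, SUM(total_amount) AS total_spent
        FROM {table}
        GROUP BY customer_id
        ORDER BY customer_id
    """)
```

In the PySpark shell, call `load_sample_orders(spark, "<catalog>.<namespace>.orders").show()` to print the result.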
Conclusion
Amazon S3 Tables offer purpose-built, fully managed storage for analytics that integrates with popular AWS analytics services. With built-in Apache Iceberg support, automated table maintenance, and access control through AWS IAM, S3 Tables improve query performance, scalability, and cost efficiency. By removing much of the configuration and maintenance work that self-managed tables require, Amazon S3 Tables let businesses gain insights from their analytics workloads more easily and reliably.
About CloudThat
CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

WRITTEN BY Swati Mathur