Unlock the full potential of your analytics data with Amazon S3 Tables

Amazon Web Services (AWS) has announced Amazon S3 Tables, a new storage solution designed specifically for analytics workloads. Built to store tabular data such as daily transactions, sensor data, and ad impressions efficiently, S3 Tables offer faster query performance, higher transactions per second, and reduced storage costs, all with the operational ease of a fully managed service.

Key Highlights of Amazon S3 Tables:

Purpose-Built Storage for Analytics: S3 Tables integrate with analytics engines such as Amazon Athena, Amazon EMR, and Apache Spark. Compared to self-managed tables in S3 general-purpose buckets, they deliver up to 3x faster queries and up to 10x more transactions per second.
Built-In Apache Iceberg Support: S3 Tables leverage the Apache Iceberg format, a widely adopted table format for large-scale analytics.
Automated Table Optimization: Amazon S3 Tables continuously optimize storage through data compaction, snapshot management, and unreferenced file cleanup.
Access Management and Security: S3 Tables use AWS Identity and Access Management (IAM) for fine-grained access control.
Seamless Integration with AWS Analytics Services: Amazon S3 Tables integrate with AWS services such as Amazon Athena, AWS Glue, Amazon EMR, Amazon Redshift, Amazon QuickSight, and AWS Lake Formation, enabling data analytics without complex configuration.
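As a rough illustration of the IAM point above, the sketch below builds an identity-based policy that grants read and write access to the tables in a single table bucket. The function name, the chosen action list, and the `bucket/<name>/table/*` resource pattern are assumptions for illustration only; check the `s3tables` actions and resource types in the AWS Service Authorization Reference before attaching anything like this to a role.

```python
def build_s3tables_policy(region: str, account_id: str, bucket_name: str) -> dict:
    """Return an identity-based IAM policy document scoped to one table bucket.

    Hypothetical helper: the action names and the table resource pattern are
    assumptions and should be verified against the Service Authorization Reference.
    """
    bucket_arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWriteTablesInOneBucket",
                "Effect": "Allow",
                "Action": [
                    "s3tables:GetTableBucket",
                    "s3tables:CreateNamespace",
                    "s3tables:GetNamespace",
                    "s3tables:ListNamespaces",
                    "s3tables:CreateTable",
                    "s3tables:GetTable",
                    "s3tables:ListTables",
                    "s3tables:GetTableMetadataLocation",
                    "s3tables:UpdateTableMetadataLocation",
                    "s3tables:GetTableData",
                    "s3tables:PutTableData",
                ],
                # Cover the bucket itself and every table inside it (assumed ARN pattern).
                "Resource": [bucket_arn, f"{bucket_arn}/table/*"],
            }
        ],
    }
```

Serialize the result with `json.dumps(build_s3tables_policy(...), indent=2)` to paste it into the IAM console or pass it to the IAM API.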

Steps to Use S3 Tables for Analytics

Create S3 Table Bucket
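A table bucket can also be created programmatically. The minimal sketch below assumes a boto3 release recent enough to include the `s3tables` client and its `create_table_bucket` operation. The helper names are hypothetical, and the name check is an assumption modelled on general-purpose bucket rules, so confirm the exact naming constraints in the S3 Tables documentation. The returned ARN is the value later passed to Spark as the catalog `warehouse`.

```python
import re

# Assumed naming rule: 3-63 characters, lowercase letters, digits, and hyphens,
# starting and ending with a letter or digit. Verify against the S3 Tables docs.
_TABLE_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def table_bucket_arn(region: str, account_id: str, name: str) -> str:
    """Build a table bucket ARN in the form used as the Spark catalog warehouse."""
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{name}"


def create_table_bucket(client, name: str) -> str:
    """Create a table bucket with a boto3 's3tables' client and return its ARN."""
    if not _TABLE_BUCKET_NAME.match(name):
        raise ValueError(f"invalid table bucket name: {name!r}")
    response = client.create_table_bucket(name=name)
    return response["arn"]
```

Usage might look like `create_table_bucket(boto3.client("s3tables", region_name="us-east-1"), "demo-s3-iceberg-table")`. Passing the client in as a parameter keeps the helper testable without AWS credentials.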

Connecting to S3 table buckets with Spark on an Amazon EMR Iceberg cluster

aws emr create-cluster \
  --release-label emr-7.5.0 \
  --applications Name=Spark \
  --configurations file://configurations.json \
  --region us-east-1 \
  --name My_cluster_Iceberg_demo \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --log-uri s3://iceberg-logs-8122024/ \
  --service-role EMR_DefaultRole \
  --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole

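If you prefer Python to the AWS CLI, the same cluster can be requested through the EMR `RunJobFlow` API. This sketch maps each CLI flag to the corresponding boto3 `run_job_flow` parameter: `--ec2-attributes InstanceProfile=...` becomes `JobFlowRole`, and `--instance-type` with `--instance-count 3` becomes one primary node and two core nodes. The helper name is hypothetical, and the region comes from however you construct the client, for example `boto3.client("emr", region_name="us-east-1")`.

```python
def launch_iceberg_cluster(emr_client, configurations: list, log_uri: str) -> str:
    """Request an EMR cluster equivalent to the CLI command and return its cluster ID."""
    response = emr_client.run_job_flow(
        Name="My_cluster_Iceberg_demo",
        ReleaseLabel="emr-7.5.0",
        Applications=[{"Name": "Spark"}],
        Configurations=configurations,  # same content as configurations.json
        LogUri=log_uri,
        ServiceRole="EMR_DefaultRole",
        JobFlowRole="EMR_EC2_DefaultRole",  # the EC2 instance profile
        Instances={
            # One primary node plus two core nodes, all m5.xlarge (instance count 3).
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Keep the cluster running after startup so you can connect to it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    )
    return response["JobFlowId"]
```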

The configurations.json file referenced in the command enables Iceberg on the cluster:

[
  {
    "Classification": "iceberg-defaults",
    "Properties": { "iceberg.enabled": "true" }
  }
]

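The configuration file can also be generated instead of typed by hand. This sketch (the helper name is hypothetical) writes the same single `iceberg-defaults` classification. EMR configuration properties are string-to-string maps, which is why `"true"` is quoted rather than written as a JSON boolean.

```python
import json
from pathlib import Path

# Same content as configurations.json: enable Iceberg on the EMR cluster.
ICEBERG_CONFIGURATIONS = [
    {"Classification": "iceberg-defaults", "Properties": {"iceberg.enabled": "true"}}
]


def write_configurations(path: str = "configurations.json") -> Path:
    """Write the EMR configuration list to a JSON file and return its path."""
    target = Path(path)
    target.write_text(json.dumps(ICEBERG_CONFIGURATIONS, indent=2) + "\n")
    return target
```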

Once the EMR cluster is running, connect to its primary node (for example, over SSH).
Next, start PySpark with the S3 Tables catalog and Iceberg Spark runtime packages, then initialize a Spark session that connects to your table bucket.

pyspark \
  --packages 'software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.0,org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.0'

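The Iceberg runtime artifact name encodes the Spark and Scala versions it was built for (`3.4_2.12` means Spark 3.4 and Scala 2.12), and it should match the Spark version actually installed on the cluster; you can check that on the primary node with `spark-submit --version`. The hypothetical helper below assembles the `--packages` value from those versions. Its defaults simply reproduce the artifact versions used in the command above and are not a recommendation.

```python
def iceberg_packages(spark_version: str, scala_version: str = "2.12",
                     iceberg_version: str = "1.6.0",
                     s3tables_catalog_version: str = "0.1.0") -> str:
    """Build the comma-separated --packages value for the pyspark command."""
    # Keep only major.minor of the Spark version, e.g. "3.5.2" -> "3.5".
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    return ",".join([
        f"software.amazon.s3tables:s3-tables-catalog-for-iceberg:{s3tables_catalog_version}",
        f"org.apache.iceberg:iceberg-spark-runtime-{spark_major_minor}_{scala_version}:{iceberg_version}",
    ])
```

For a cluster running Spark 3.5, `iceberg_packages("3.5.2")` would select the `iceberg-spark-runtime-3.5_2.12` artifact instead.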

from pyspark.sql import SparkSession

# Register an Iceberg catalog named "s3tablesbucket" backed by the S3 table bucket,
# and make it the default catalog via Spark's spark.sql.defaultCatalog setting.
spark = SparkSession.builder.appName("iceberg_lab") \
    .config("spark.sql.catalog.s3tablesbucket", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.client.region", "us-east-1") \
    .config("spark.sql.defaultCatalog", "s3tablesbucket") \
    .config("spark.sql.catalog.s3tablesbucket.warehouse", "arn:aws:s3tables:us-east-1:862820731644:bucket/demo-s3-iceberg-table") \
    .config("spark.sql.catalog.s3tablesbucket.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .getOrCreate()

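The catalog settings can also be kept in one place and applied in a loop, which makes them easier to reuse for another table bucket or Region. In this sketch (function names are hypothetical), the property keys mirror those in the session above, with the default catalog set through Spark's `spark.sql.defaultCatalog` key.

```python
def s3tables_catalog_conf(catalog: str, warehouse_arn: str, region: str) -> dict:
    """Return Spark settings that register an S3 Tables-backed Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"{prefix}.warehouse": warehouse_arn,
        f"{prefix}.client.region": region,
        # Resolve unqualified table names against this catalog.
        "spark.sql.defaultCatalog": catalog,
    }


def apply_conf(builder, conf: dict):
    """Call builder.config(key, value) for every setting and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

Usage might look like `spark = apply_conf(SparkSession.builder.appName("iceberg_lab"), s3tables_catalog_conf("s3tablesbucket", "arn:aws:s3tables:us-east-1:862820731644:bucket/demo-s3-iceberg-table", "us-east-1")).getOrCreate()`.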

You can now create a namespace and a table using Spark SQL:

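# Qualify names with the catalog configured above ("s3tablesbucket").
# S3 Tables namespace and table names use lowercase letters, numbers, and underscores.
spark.sql("CREATE NAMESPACE IF NOT EXISTS s3tablesbucket.cloudthat_db")

spark.sql("""
CREATE TABLE IF NOT EXISTS s3tablesbucket.cloudthat_db.orders (
  order_id BIGINT,
  customer_id BIGINT,
  order_date DATE,
  total_amount DOUBLE
) USING iceberg
""")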

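To confirm that the table accepts writes, you can insert a couple of rows and count them. The sketch below is hypothetical: the helper names and sample values are made up, and the SQL is built with f-strings, which is acceptable only for trusted, typed values like these. For real data, a DataFrame write such as `df.writeTo(table).append()` avoids string-built SQL.

```python
from datetime import date

# Hypothetical sample rows: (order_id, customer_id, order_date, total_amount).
SAMPLE_ORDERS = [
    (1, 101, date(2024, 12, 1), 250.00),
    (2, 102, date(2024, 12, 2), 99.50),
]


def orders_insert_sql(table: str, rows) -> str:
    """Render an INSERT INTO ... VALUES statement for the orders rows."""
    values = ",\n".join(
        f"({order_id}, {customer_id}, DATE '{order_date.isoformat()}', {amount})"
        for order_id, customer_id, order_date, amount in rows
    )
    return f"INSERT INTO {table} VALUES\n{values}"


def load_and_count(spark, table: str, rows) -> int:
    """Insert the rows with Spark SQL, then return the table's total row count."""
    spark.sql(orders_insert_sql(table, rows))
    return spark.sql(f"SELECT COUNT(*) AS n FROM {table}").collect()[0]["n"]
```

Call `load_and_count(spark, table, SAMPLE_ORDERS)` with the fully qualified name of the orders table. Spark runs INSERT statements eagerly, so the returned count includes the newly inserted rows.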

Conclusion

Amazon S3 Tables revolutionize data storage for analytics by offering a purpose-built, fully managed solution that seamlessly integrates with popular AWS analytics services. With features like built-in Apache Iceberg support, automated table optimization, and robust security through AWS IAM, S3 Tables deliver enhanced query performance, scalability, and cost efficiency. By simplifying complex configurations and enabling faster data processing, Amazon S3 Tables empower businesses to unlock deeper insights from their analytics workloads with ease and reliability.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As an AWS Premier Tier Services Partner, AWS Advanced Training Partner, Microsoft Solutions Partner, and Google Cloud Platform Partner, CloudThat has empowered over 1.1 million professionals through 1000+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 14 awards in the last 9 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, Security, IoT, and advanced technologies like Gen AI & AI/ML. It has delivered over 750 consulting projects for 850+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

WRITTEN BY Swati Mathur

Swati Mathur is a Subject Matter Expert at CloudThat, specializing in Cloud Computing and ML/GenAI. With more than 15 years of experience in IT training and consulting, she has trained over 1,000 professionals and students to upskill in multiple technologies. Known for simplifying complex concepts and delivering interactive, hands-on sessions, she brings deep technical knowledge and practical application into every learning experience. Swati's passion for public speaking and continuous learning is reflected in her unique approach to learning and development.