Amazon Web Services (AWS) has announced Amazon S3 Tables, a new storage option designed specifically for analytics workloads. Built to store tabular data such as daily transactions, sensor readings, and ad impressions, S3 Tables offer faster query performance, higher transactions per second, and reduced storage costs, all with the operational ease of a fully managed service.
Key Highlights of Amazon S3 Tables:
- Purpose-Built Storage for Analytics: S3 Tables work with analytics engines such as Amazon Athena, Amazon EMR, and Apache Spark, and deliver up to 3x faster queries and up to 10x more transactions per second than self-managed tables in S3 general-purpose buckets.
- Built-In Apache Iceberg Support: S3 Tables leverage the Apache Iceberg format, a widely adopted table format for large-scale analytics.
- Automated Table Optimization: Amazon S3 Tables continuously optimize storage through data compaction, snapshot management, and unreferenced file cleanup.
- Access Management and Security: S3 Tables use AWS Identity and Access Management (IAM) for fine-grained access control.
- Seamless Integration with AWS Analytics Services: Amazon S3 Tables integrate with Amazon Athena, AWS Glue, Amazon EMR, Amazon Redshift, Amazon QuickSight, and AWS Lake Formation, enabling analytics without complex configuration.
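To make the IAM point concrete, here is a minimal sketch of a read-only identity policy scoped to the tables in one table bucket. The account ID, bucket name, and the helper `read_only_table_policy` are placeholders invented for illustration, and the two `s3tables:` actions shown are an assumption about the smallest useful read set; check them against the IAM reference for S3 Tables before use.

```python
import json


def read_only_table_policy(table_bucket_arn: str) -> dict:
    """Build an identity-based policy that allows reading every table in one table bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTablesInBucket",
                "Effect": "Allow",
                # Assumed minimal read set; verify against the S3 Tables IAM reference.
                "Action": [
                    "s3tables:GetTableData",
                    "s3tables:GetTableMetadataLocation",
                ],
                # Tables are addressed underneath the table bucket ARN.
                "Resource": f"{table_bucket_arn}/table/*",
            }
        ],
    }


# Placeholder ARN (hypothetical account ID and bucket name).
policy = read_only_table_policy(
    "arn:aws:s3tables:us-east-1:111122223333:bucket/example-table-bucket"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached to a role or user; narrowing `Resource` to a single table ARN instead of `table/*` restricts access further.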
Steps to Use S3 Tables for Analytics
- Create an S3 table bucket
- Connect to the S3 table bucket with Spark on an Amazon EMR cluster that has Iceberg enabled
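The table bucket from the first step has to exist before the cluster can use it. One way to create it is with the boto3 `s3tables` client, sketched below. The helper `create_table_bucket` and the simplified name check are assumptions made for illustration; the client is passed in as an argument so the function can be tried without AWS credentials.

```python
import re

# Simplified check: 3-63 characters, lowercase letters, digits, and hyphens,
# starting and ending with a letter or digit. The service has the final say.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def create_table_bucket(client, name: str) -> str:
    """Create a table bucket and return its ARN.

    `client` is expected to behave like boto3.client("s3tables").
    """
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"invalid table bucket name: {name!r}")
    response = client.create_table_bucket(name=name)
    return response["arn"]


# Usage on a machine with AWS credentials configured:
#   import boto3
#   client = boto3.client("s3tables", region_name="us-east-1")
#   print(create_table_bucket(client, "demo-s3-iceberg-table"))
```

The returned ARN is the value used later as the catalog's `warehouse` setting in the Spark session.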
    aws emr create-cluster \
      --release-label emr-7.5.0 \
      --applications Name=Spark \
      --configurations file://configurations.json \
      --region us-east-1 \
      --name My_cluster_Iceberg_demo \
      --instance-type m5.xlarge \
      --instance-count 3 \
      --log-uri s3://iceberg-logs-8122024/ \
      --service-role EMR_DefaultRole \
      --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole
The configurations.json file passed to the command is as follows:
    [
      {
        "Classification": "iceberg-defaults",
        "Properties": {
          "iceberg.enabled": "true"
        }
      }
    ]
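If you prefer to generate this file rather than write it by hand, a small script can render it and catch one easy mistake: EMR expects property values as strings, so `"true"` must not become a JSON boolean. The helper names below are invented for this sketch.

```python
import json
from pathlib import Path

# Same content as the configurations.json shown above.
EMR_CONFIGURATIONS = [
    {"Classification": "iceberg-defaults", "Properties": {"iceberg.enabled": "true"}}
]


def render_configurations(configurations) -> str:
    """Return the configurations as JSON text, rejecting non-string property values."""
    for entry in configurations:
        for key, value in entry["Properties"].items():
            if not isinstance(value, str):
                raise TypeError(f"{key} must be a string, got {type(value).__name__}")
    return json.dumps(configurations, indent=2) + "\n"


def write_configurations(path="configurations.json") -> Path:
    """Write the rendered JSON to `path` and return the path."""
    target = Path(path)
    target.write_text(render_configurations(EMR_CONFIGURATIONS))
    return target


print(render_configurations(EMR_CONFIGURATIONS))
```

Calling `write_configurations()` in the directory where you run `aws emr create-cluster` produces the file that `file://configurations.json` refers to.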
- Once the EMR cluster has been created, connect to its primary node
- Next, start PySpark with the required packages and initialize a Spark session for Iceberg that connects to your table bucket
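    # EMR 7.5.0 ships Spark 3.5; if this 3.4 build of the Iceberg runtime
    # fails to load, try iceberg-spark-runtime-3.5_2.12 instead.
    pyspark \
      --packages 'software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.0,org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.6.0'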
    from pyspark.sql import SparkSession

    # Register the table bucket as an Iceberg catalog named "s3tablesbucket"
    spark = (
        SparkSession.builder.appName("iceberg_lab")
        .config("spark.sql.catalog.s3tablesbucket", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.s3tablesbucket.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog")
        .config("spark.sql.catalog.s3tablesbucket.warehouse", "arn:aws:s3tables:us-east-1:862820731644:bucket/demo-s3-iceberg-table")
        .config("spark.sql.catalog.s3tablesbucket.client.region", "us-east-1")
        # The default catalog is set with spark.sql.defaultCatalog
        # (spark.sql.catalog.defaultCatalog would define a catalog named "defaultCatalog")
        .config("spark.sql.defaultCatalog", "s3tablesbucket")
        .getOrCreate()
    )
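The catalog settings all follow one pattern: a `spark.sql.catalog.<name>` prefix plus the table bucket ARN. The sketch below derives them from the ARN so the Region cannot drift out of sync with the warehouse. The helper `s3tables_catalog_conf`, the ARN pattern, and the placeholder account ID are assumptions made for illustration, not part of any library.

```python
import re

# Assumed shape of a table bucket ARN: arn:aws:s3tables:<region>:<account>:bucket/<name>
TABLE_BUCKET_ARN_RE = re.compile(
    r"^arn:aws[a-z-]*:s3tables:([a-z0-9-]+):(\d{12}):bucket/([a-z0-9-]+)$"
)


def s3tables_catalog_conf(catalog: str, table_bucket_arn: str) -> dict:
    """Return the Spark settings that register a table bucket as an Iceberg catalog."""
    match = TABLE_BUCKET_ARN_RE.match(table_bucket_arn)
    if match is None:
        raise ValueError(f"not a table bucket ARN: {table_bucket_arn!r}")
    region = match.group(1)
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "software.amazon.s3tables.iceberg.S3TablesCatalog",
        f"{prefix}.warehouse": table_bucket_arn,
        f"{prefix}.client.region": region,
    }


# Placeholder ARN (hypothetical account ID and bucket name).
conf = s3tables_catalog_conf(
    "s3tablesbucket",
    "arn:aws:s3tables:us-east-1:111122223333:bucket/example-table-bucket",
)
for key, value in conf.items():
    print(f"{key} = {value}")
```

To apply the result, loop over the dictionary and call `builder = builder.config(key, value)` on a `SparkSession.builder` before `getOrCreate()`.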
- You can now create namespaces and tables in the table bucket using Spark SQL
    # The catalog prefix must match the catalog configured on the Spark session (s3tablesbucket).
    # S3 Tables namespace and table names must be lowercase.
    spark.sql("CREATE NAMESPACE IF NOT EXISTS s3tablesbucket.cloudthat_db")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS s3tablesbucket.cloudthat_db.orders (
            order_id BIGINT,
            customer_id BIGINT,
            order_date DATE,
            total_amount DOUBLE
        )
        USING iceberg
    """)
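Once the table exists, rows can be written and read back with ordinary Spark SQL. The sketch below assumes the `spark` session configured earlier and takes the fully qualified name of the `orders` table as a parameter. The function name and the sample rows are made up for illustration.

```python
def load_and_summarize(spark, table: str):
    """Insert a few sample orders into `table` and return a per-customer summary."""
    # Sample rows match the orders schema:
    # order_id BIGINT, customer_id BIGINT, order_date DATE, total_amount DOUBLE
    spark.sql(f"""
        INSERT INTO {table} VALUES
            (1, 101, DATE '2024-12-01', 250.00),
            (2, 102, DATE '2024-12-02', 99.50),
            (3, 101, DATE '2024-12-03', 40.25)
    """)
    # Aggregate the inserted rows per customer.
    return spark.sql(f"""
        SELECT customer_id,
               COUNT(*) AS orders,
               SUM(total_amount) AS total_spent
        FROM {table}
        GROUP BY customer_id
        ORDER BY customer_id
    """)


# Usage in the PySpark shell, with the catalog and namespace you created:
#   load_and_summarize(spark, "<catalog>.<namespace>.orders").show()
```

Each `INSERT` commits a new Iceberg snapshot, so running the function twice appends the sample rows twice.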
Conclusion
Amazon S3 Tables revolutionize data storage for analytics by offering a purpose-built, fully managed solution that seamlessly integrates with popular AWS analytics services. With features like built-in Apache Iceberg support, automated table optimization, and robust security through AWS IAM, S3 Tables deliver enhanced query performance, scalability, and cost efficiency. By simplifying complex configurations and enabling faster data processing, Amazon S3 Tables empower businesses to unlock deeper insights from their analytics workloads with ease and reliability.
WRITTEN BY Swati Mathur
Swati Mathur is a Subject Matter Expert at CloudThat, specializing in Cloud Computing and ML/GenAI. With more than 15 years of experience in IT training and consulting, she has trained more than 1,000 professionals and students to upskill in multiple technologies. Known for simplifying complex concepts and delivering interactive, hands-on sessions, she brings deep technical knowledge and practical application into every learning experience. Swati's passion for public speaking and continuous learning is reflected in her unique approach to learning and development.
December 23, 2024