Overview
Large volumes of data are generated every day and stored across multiple data stores, where they can later be analyzed for insights. When you need details about a specific person, whether to personalize an experience or to run a promotion, you only need a small slice of that data, and this is where querying comes into the picture. For this purpose, AWS provides Athena, a serverless, SQL-based query service.
Introduction to Amazon Athena
Amazon Athena is a serverless, interactive query service that uses standard SQL to analyze data directly in Amazon S3. Using the AWS Management Console, we can point Athena at the data in S3, run queries, and get results quickly. Athena also scales automatically by running queries in parallel, so it can handle large amounts of data.
Need for Amazon Athena
With Athena, you can start querying right away without managing a separate data store or loading data into one. It can query structured, semi-structured, and unstructured data stored in S3. Athena integrates with Amazon QuickSight to generate reports from the queried data and gain insights from it. Athena also integrates with the AWS Glue Data Catalog, which is the metadata store for data in S3. This lets you create tables and query data from a central metadata store.
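To make the idea of a table over S3 data more concrete, here is a minimal sketch in Python that builds the kind of DDL statement Athena accepts for JSON files. The helper name `build_json_table_ddl`, the database, table, and column names, and the bucket path are all hypothetical, and the OpenX JSON SerDe is just one reasonable choice for JSON data.

```python
def build_json_table_ddl(database, table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for JSON files under `location`.

    `columns` is a list of (name, type) pairs, e.g. ("id", "int").
    All names passed in are illustrative; nothing here talks to AWS.
    """
    column_sql = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example: a small customer table with one array column.
ddl = build_json_table_ddl(
    "demo_db",
    "customers",
    [("id", "int"), ("name", "string"), ("tags", "array<string>")],
    "s3://my-data-bucket/customers/",
)
print(ddl)
```

Running a statement like this in the Athena query editor registers the table in the Glue Data Catalog without moving the data. The walkthrough in this post uses a Glue crawler instead, which infers the columns for you.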
Steps to query the data present in Amazon S3 using Athena and Glue Data Catalog
Step 1: Create two buckets in S3: one for your data and one to store the query results.
Step 2: Upload the JSON file to the data bucket.
- Note: When you open Athena, go to the settings in the right corner and make sure the query result location is set to the exact path of the results bucket.
Step 3: Open the Athena query editor and connect to a data source. Here, the data source is S3 with the AWS Glue Data Catalog. Select it and click Next.
Step 4: Choose the option to set up a crawler in AWS Glue to retrieve the schema information, then click Connect to AWS Glue.
Step 5: Create a crawler in AWS Glue by providing the S3 bucket where the file is stored and the database in which the table should be created. Choose to run it on demand, and once the crawler is created, click Run crawler.
Step 6: Once the crawler run succeeds, the table appears in Athena. You can now run any query against it, and the results are stored in the S3 results bucket.
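The same flow can also be scripted. The following is a rough sketch, not a definitive implementation: it assumes the crawler from Step 5 already exists and that `glue` and `athena` behave like `boto3.client("glue")` and `boto3.client("athena")`. The helper names, the crawler, database, and table names, and the bucket path are hypothetical.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=5.0):
    """Start an existing Glue crawler and block until it is idle again.

    This only waits for the READY state; it does not check whether
    the crawl itself succeeded.
    """
    glue.start_crawler(Name=crawler_name)
    while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run one query and return the first page of rows as lists of strings."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended as {status['State']}: {reason}")

    # Only the first page is read; large result sets need pagination.
    page = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]


def example_usage():
    """Hypothetical wiring; requires boto3 and AWS credentials to run."""
    import boto3

    run_crawler_and_wait(boto3.client("glue"), "my-json-crawler")
    return run_athena_query(
        boto3.client("athena"),
        "SELECT * FROM my_table LIMIT 10",
        "my_database",
        "s3://my-results-bucket/athena/",
    )
```

For a `SELECT` query, the first returned row holds the column names. The `output_location` argument plays the same role as the result location from the note under Step 2.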
Limitations of Amazon Athena
- Athena treats source files whose names start with an underscore or a dot as hidden
- The size of a single row or its columns in Athena should not exceed 32 MB
- Athena cannot query data in the S3 Glacier and S3 Glacier Deep Archive storage classes
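The hidden-file rule is easy to trip over, so it can help to check object keys before uploading. Here is a small sketch of such a check; the helper names and sample keys are hypothetical, and it assumes only the file name (the last segment of the key) matters.

```python
import posixpath


def hidden_from_athena(key):
    """Return True if the object's file name starts with '_' or '.'.

    Assumption: only the last path segment of the S3 key is checked.
    """
    name = posixpath.basename(key)
    return name.startswith(("_", "."))


def split_visible(keys):
    """Split S3 keys into (visible, hidden) lists, preserving order."""
    visible = [k for k in keys if not hidden_from_athena(k)]
    hidden = [k for k in keys if hidden_from_athena(k)]
    return visible, hidden


# Hypothetical keys from a data bucket.
sample_keys = [
    "customers/part-0001.json",
    "customers/_SUCCESS",
    "customers/.tmp-upload.json",
]
visible_keys, hidden_keys = split_visible(sample_keys)
print("visible:", visible_keys)
print("hidden:", hidden_keys)
```

Renaming any file in the hidden list before uploading keeps it visible to queries.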
Conclusion
Amazon Athena is a serverless service that queries data using SQL. It is easy to use and flexible enough to run multiple queries at the same time. With Athena, we pay only for the queries we run. Athena uses IAM for security and can integrate with other AWS services. Because it runs queries in parallel, Athena keeps even complex queries on large data sets fast.
FAQs
1. What are the data formats supported by Athena?
ANS: – Amazon Athena supports data formats such as CSV, TSV, JSON, and text files, as well as open-source columnar formats such as Apache ORC and Apache Parquet. Athena also supports data compressed in Snappy, Zlib, LZO, and GZIP formats.
2. What kind of data types does Amazon Athena support?
ANS: – Amazon Athena supports both simple data types such as INTEGER, DOUBLE, and VARCHAR and complex data types such as MAP, ARRAY, and STRUCT.
3. What are the data sources Athena can connect to?
ANS: – Athena provides built-in connectors for several data stores, including Amazon Redshift, Amazon DynamoDB, and Amazon DocumentDB, as well as non-AWS sources such as Google BigQuery. You can use these connectors to enable SQL analytics use cases on structured, semi-structured, object, graph, time series, and other data storage types.
WRITTEN BY Lakshmi P Vardhini