Derive the Data Swiftly by Querying using Amazon Athena

Overview

In this fast-moving world, there is a lot of data being generated and stored in multiple data stores which can later be used to get insights into it. But when you want to know the details of a person either to personalize or to promote we need to access only specific information from our entire data, so this is where querying comes into the picture. Here we have a query service called Athena provided by AWS which is a serverless and SQL-based service.

Customized Cloud Solutions to Drive your Business Success

Cloud Migration
Devops
AIML & IoT

Know More

Introduction to Amazon Athena

Amazon Athena is a serverless and interactive query service that uses standard SQL to query data directly from Amazon S3 to analyze it. We can directly point Amazon Athena to the data in S3 using the AWS management console to run queries and get results in a very less period also Athena scales parallelly by running the queries on a huge amount of data.

Need for Amazon Athena

Athena works directly with querying without worrying about the data store and data load. This can query data in different formats like structured, semi-structured, and unstructured data which is stored in S3. Athena can integrate with AWS QuickSight to generate reports from the queried data to get insights into the data. Athena also integrates with Glue Data Catalog which is the metadata store for data present in S3. This helps to create tables and query data from the central metadata store available.

If you want to know more about the Top 5 Data Analytics Tools in AWS here.

Steps to query the data present in Amazon S3 using Athena and Glue Data Catalog

Step 1: Create two buckets in S3, one bucket is for your data and the other bucket is to store the results of queries.

Step 2: We need to store the JSON file in the data bucket.

Note: When you open Athena in the settings which is in the right corner make sure you mention the results in the exact bucket path.

Step 3: After you open the Athena query editor, first we need to connect to data sources, here our data sources are S3 and Glue Data Catalog and click next.

Step 4: Now, choose an option to set up a crawler in AWS Glue to retrieve the schema information and click on connect to AWS Glue.

Step 5: Create a crawler in the AWS Glue by providing the S3 bucket details where the file is present as well as the database where this data should be present also, choose run it on demand, and once the crawler is created click on run crawler.

Step 6: Once the crawler run is successful, the data will be shown in the Athena. Now you can perform any query, and your result will be stored on the S3 result bucket.

Limitations of Amazon Athena

If your source files start with an underscore or a dot then it will treat them as hidden
The rows and column size in Athena should not exceed 32 MB
Athena cannot query data in S3 Glacier and S3 Glacier Deep archive

Conclusion

Amazon Athena is a serverless service that uses SQL to query. This is easy to use and also flexible to run multiple queries at the same time. In Athena, we pay only for the queries we run. Athena uses IAM for security and also can integrate with other AWS services. Hence, Athena can run queries parallelly for large data sets by making complex queries fast.

Get your new hires billable within 1-60 days. Experience our Capability Development Framework today.

Cloud Training
Customized Training
Experiential Learning

About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

FAQs

1. What are the data formats supported by Athena?

ANS: – Amazon Athena supports data formats like CSV, TSV, JSON, or text files and also supports open-source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats.

2. What kind of data types does Amazon Athena support?

ANS: – Amazon Athena supports both simple data types such as INTEGER, DOUBLE, and VARCHAR and complex data types such as MAPS, ARRAY and STRUCT.

3. What are the AWS data sources Athena can connect to?

ANS: – Athena provides built-in connectors for several data stores including Amazon Redshift, Amazon DynamoDB, Amazon DocumentDB, and BigQuery. You can use these connectors to enable SQL analytics use cases on structured, semi-structured, object, graph, time series, and other data storage types.