Overview
AWS Glue is a serverless AWS data integration service that makes it easy to discover, prepare, move, and integrate data from many sources for analytics, machine learning, and application development. It also includes productivity and data-operations tooling for authoring jobs, running them, and implementing business workflows.
With AWS Glue, you can connect to more than 70 data sources and manage your data in a centralized Data Catalog. You can visually build, run, and monitor extract, transform, and load (ETL) pipelines that load data into your data lakes, and you can quickly search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
AWS Glue consolidates the major data integration capabilities into a single service: data discovery, data cleansing, modern ETL, and cataloging. Additionally, because it is serverless, there is no infrastructure to maintain. AWS Glue supports multiple types of users and workloads, including ETL, ELT, and streaming, in a single service.
In this blog, we will convert CSV files to Parquet format without writing any code.
Step-by-Step Guide
Step 1: The CSV files to be converted to Parquet format reside in an Amazon S3 bucket.
Go to the Amazon S3 bucket from the AWS Console.
Step 2: Go to the AWS Glue console and select ETL Jobs from the left menu pane.
Select the “Visual with a source and target” option, then click on “Create”.
Select the data source node, edit the “Data source properties – S3” tab, and provide the S3 bucket and path where your CSV files reside.
Select CSV as the data format.
Now select the “Data target – S3 bucket” node, set the format to Parquet and the compression type to Uncompressed.
Now select the output file location.
Save the job you created.
Click on “Run” to start the job.
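Clicking “Run” in the console is equivalent to starting the job programmatically with the Glue client's start_job_run API. Below is a minimal sketch (the job name is hypothetical); the client is passed in as a parameter so the helper can be exercised without an AWS account:

```python
def start_glue_job(glue, job_name, arguments=None):
    """Start a Glue job run and return its JobRunId.

    `glue` is a boto3 Glue client, e.g. boto3.client("glue").
    """
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=arguments or {},
    )
    return response["JobRunId"]

# Usage sketch (requires AWS credentials; the job name is hypothetical):
# import boto3
# run_id = start_glue_job(boto3.client("glue"), "csv-to-parquet-job")
```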
Step 3:
You can see that your job is running.
Once the job is complete, you can view the Parquet-formatted files in the target path in your Amazon S3 bucket.
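The run state can also be checked outside the console. A hedged sketch using the Glue client's get_job_runs API (run states include RUNNING and SUCCEEDED); again, the client is injected so the logic runs without AWS:

```python
def latest_run_state(glue, job_name):
    """Return the state of the most recent run of `job_name`, or None.

    get_job_runs returns runs in descending start-time order, so the
    first entry is the latest run.
    """
    runs = glue.get_job_runs(JobName=job_name, MaxResults=1).get("JobRuns", [])
    return runs[0]["JobRunState"] if runs else None
```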
Conclusion
We have now converted CSV files to Parquet format without writing any code. The nodes of the job are configured using the visual job editor. Each node represents a specific action, such as reading data from the source location or transforming the data, and each node you add to the job has properties that describe the transform or the location of the data.
FAQs
1. When does billing for my AWS Glue jobs start and stop?
ANS: – Billing starts as soon as the job is scheduled to run and continues until the entire job completes. With AWS Glue, you pay only for the time your job runs, not for environment provisioning or shutdown time.
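As a quick illustration of the pricing model (the $0.44 per DPU-hour rate and the 1-minute minimum below are assumptions for the sketch; check the current AWS Glue pricing page for your Region):

```python
def glue_spark_job_cost(dpus, runtime_seconds,
                        rate_per_dpu_hour=0.44, minimum_seconds=60):
    """Estimate the cost of one Glue Spark job run.

    Billing is per second of job runtime with a minimum billed duration;
    provisioning and teardown time are not billed. The rate and minimum
    are assumptions; verify them against the AWS pricing page.
    """
    billed_seconds = max(runtime_seconds, minimum_seconds)
    return dpus * (billed_seconds / 3600.0) * rate_per_dpu_hour

# e.g. a 10-DPU job that runs for 15 minutes:
# glue_spark_job_cost(10, 15 * 60)  # 10 DPUs * 0.25 h * $0.44 = $1.10
```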
2. What programming language can I use to develop AWS Glue ETL code?
ANS: – You can use either Scala or Python.
3. Can I include custom libraries in my ETL script?
ANS: – Yes. You can import custom Python libraries and JAR files into your AWS Glue ETL job.
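The libraries are supplied through the job's special parameters --extra-py-files and --extra-jars, each of which takes a comma-separated list of Amazon S3 paths. A small sketch that builds such an argument map (the S3 paths in the usage note are hypothetical):

```python
def build_library_arguments(extra_py_files=(), extra_jars=()):
    """Build the Arguments map for a Glue job that needs custom libraries.

    --extra-py-files and --extra-jars each take a comma-separated
    list of S3 paths.
    """
    args = {}
    if extra_py_files:
        args["--extra-py-files"] = ",".join(extra_py_files)
    if extra_jars:
        args["--extra-jars"] = ",".join(extra_jars)
    return args

# Usage sketch (hypothetical paths), e.g. passed to start_job_run:
# build_library_arguments(
#     extra_py_files=["s3://my-bucket/libs/helpers.py"],
#     extra_jars=["s3://my-bucket/libs/custom.jar"],
# )
```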
WRITTEN BY Deepak Surendran