Simplifying ETL Resource Management with AWS Glue Auto Scaling

Overview

Modern data pipelines often deal with unpredictable workloads. Some days you might process gigabytes of data, while on others, you may need to handle terabytes. Traditionally, this meant over-provisioning resources “just in case,” which often led to wasted compute costs. To solve this challenge, AWS Glue Auto Scaling automatically adjusts compute resources based on workload demand, ensuring you pay only for what you need while keeping jobs performant.

In this blog, we will explore what AWS Glue Auto Scaling is, how it works, its benefits, and how you can enable and monitor it effectively.

AWS Glue Auto Scaling

AWS Glue Auto Scaling allows your ETL, streaming, and interactive jobs to scale out (add more workers) dynamically or scale in (remove idle workers) depending on workload needs. Instead of guessing how many workers a job will require, you define a maximum number of workers, and AWS Glue automatically provisions resources up to that limit.

This feature is available for Glue version 3.0 and later, and supports multiple job types, including:

ETL jobs (batch processing).
Interactive sessions (for development/debugging in notebooks).
Streaming jobs (real-time data processing).

Why Use Auto Scaling?

Here are some key advantages of enabling Auto Scaling in AWS Glue:

Cost Savings

Instead of paying for idle workers, you only pay for actively used workers. Auto Scaling minimizes resource wastage by dynamically adjusting executor counts.
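
To make the saving concrete, here is a rough back-of-the-envelope sketch in Python. The per-DPU-hour price, the one-DPU-per-worker mapping, and the worker timeline are illustrative assumptions, not figures from a real job, so check current AWS Glue pricing for your Region before relying on the numbers.

```python
# Illustrative only: the price and the timelines below are assumptions.
PRICE_PER_DPU_HOUR = 0.44   # assumed USD rate; varies by Region
DPU_PER_WORKER = 1          # assumed: one G.1X worker maps to 1 DPU


def job_cost(segments):
    """Cost of a run given (workers, minutes) segments."""
    dpu_hours = sum(w * DPU_PER_WORKER * m / 60 for w, m in segments)
    return round(dpu_hours * PRICE_PER_DPU_HOUR, 2)


# Fixed capacity: 20 workers held for the full 60 minutes.
fixed = job_cost([(20, 60)])

# Auto Scaling (hypothetical run): ramps up to 20 workers, then back down.
scaled = job_cost([(4, 15), (20, 20), (6, 25)])

print(f"fixed: ${fixed}, auto scaled: ${scaled}")
```

Both runs take the same hour and peak at the same 20 workers; the difference comes entirely from the minutes in which the auto-scaled run holds fewer workers.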

Better Performance

Large data bursts are handled more smoothly because AWS Glue allocates additional executors when needed. When the workload drops, unused workers are released.

Less Manual Effort

You don’t need to predict resource requirements for every job. Just set the maximum number of workers, and AWS Glue will manage scaling automatically.

Efficient Resource Utilization

Amazon CloudWatch metrics such as glue.driver.workerUtilization make this visible: jobs with Auto Scaling enabled typically show 75–100% worker utilization, compared with as low as 20–40% for jobs that run with a fixed number of workers.
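
As a simplified illustration of what a utilization figure like this expresses, the sketch below averages the share of allocated workers that were actually needed. The function and the sample data are hypothetical; this is not how AWS Glue computes its metric internally.

```python
def utilization(samples):
    """Average share of allocated workers that were actually needed.

    samples: list of (allocated_workers, needed_workers) pairs.
    """
    ratios = [
        min(needed, allocated) / allocated
        for allocated, needed in samples
        if allocated > 0
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Hypothetical samples: the same demand served two different ways.
fixed_run = [(20, 4), (20, 20), (20, 6)]    # always 20 workers allocated
scaled_run = [(5, 4), (20, 20), (7, 6)]     # allocation follows demand

print(f"fixed: {utilization(fixed_run):.0%}")
print(f"auto scaled: {utilization(scaled_run):.0%}")
```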

How Do You Enable Auto Scaling in AWS Glue?

You can enable Auto Scaling using the AWS Console, CLI/SDK, or interactive sessions.

1. Enabling Auto Scaling in Glue Studio (Console)

Open your AWS Glue job in Glue Studio.
Go to the Job details tab.
Select AWS Glue version 3.0 or later.
Check the option to scale the number of workers automatically.
Enter the Maximum number of workers allowed for that job.

2. Enabling Auto Scaling in Interactive Sessions

When using Jupyter notebooks with AWS Glue, you can enable Auto Scaling with:

%%configure
{
  "--enable-auto-scaling": "true",
  "--enable-continuous-cloudwatch-log": "true",
  "--number-of-workers": "20",
  "--worker-type": "G.2X"
}

This turns on Auto Scaling with an upper limit of 20 G.2X workers and enables Amazon CloudWatch logging for observability.

3. Auto Scaling for Streaming Jobs

For streaming workloads, AWS Glue evaluates multiple micro-batches before scaling. If you want finer control, you can add the following argument:

--auto-scale-within-microbatch true

This allows scaling within each micro-batch for faster responsiveness.
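
For the CLI/SDK route, a minimal Python sketch might look like the following. It assumes boto3 and its Glue create_job call; the job name, role ARN, and script location are hypothetical placeholders, and the default worker type and Glue version are just example choices. The builder is kept separate from the API call so the request can be inspected without AWS credentials.

```python
def build_job_args(name, role_arn, script_location, max_workers,
                   worker_type="G.1X", glue_version="4.0", streaming=False):
    """Build keyword arguments for a Glue create_job call with Auto Scaling on."""
    default_args = {"--enable-auto-scaling": "true"}
    if streaming:
        # Optional streaming argument discussed in the section above.
        default_args["--auto-scale-within-microbatch"] = "true"
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "gluestreaming" if streaming else "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": glue_version,
        "WorkerType": worker_type,
        # With Auto Scaling enabled, this acts as the upper limit.
        "NumberOfWorkers": max_workers,
        "DefaultArguments": default_args,
    }


def create_job(**kwargs):
    """Send the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3
    return boto3.client("glue").create_job(**build_job_args(**kwargs))


args = build_job_args(
    name="sales-etl",                                             # hypothetical
    role_arn="arn:aws:iam::123456789012:role/GlueRole",           # hypothetical
    script_location="s3://example-bucket/scripts/sales_etl.py",   # hypothetical
    max_workers=20,
)
print(args["DefaultArguments"])
```

Calling create_job(...) with the same keyword arguments would submit the request; only that step touches AWS.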

Monitoring Auto Scaling

Enabling Auto Scaling is just the start; you also need visibility into how resources are used. AWS Glue offers several ways to monitor scaling behavior:

Amazon CloudWatch Metrics

Track executor allocation with metrics like:

glue.driver.ExecutorAllocationManager.executors.numberAllExecutors
glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors
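
If you prefer to pull these two metrics programmatically, the sketch below builds a query for CloudWatch's get_metric_data call via boto3. It assumes job metrics are enabled for the job, that the metrics are published in the Glue namespace with a glue. prefix and the JobName, JobRunId, and Type dimensions, and the job name you pass in is up to you; treat these details as assumptions to verify against your own account.

```python
from datetime import datetime, timedelta, timezone

METRICS = [
    "glue.driver.ExecutorAllocationManager.executors.numberAllExecutors",
    "glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors",
]


def build_queries(job_name, job_run_id="ALL", period=60):
    """Build the MetricDataQueries list, one entry per executor metric."""
    return [
        {
            "Id": f"m{i}",
            "Label": name.rsplit(".", 1)[-1],
            "MetricStat": {
                "Metric": {
                    "Namespace": "Glue",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "JobName", "Value": job_name},
                        {"Name": "JobRunId", "Value": job_run_id},
                        {"Name": "Type", "Value": "gauge"},
                    ],
                },
                "Period": period,
                "Stat": "Maximum",
            },
        }
        for i, name in enumerate(METRICS)
    ]


def fetch_executor_metrics(job_name, hours=3):
    """Fetch recent data points; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_queries works without boto3
    end = datetime.now(timezone.utc)
    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=build_queries(job_name),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
```

Comparing the two series shows how closely the allocated executors follow what the job actually needed.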

AWS Glue Studio Monitoring Dashboard

Provides visual insights on DPU hours consumed, making it easier to compare jobs with and without Auto Scaling enabled.

Spark UI

Lets you see when executors are added or removed during a job run.

Amazon CloudWatch Logs

Especially useful in interactive sessions where logs tagged with “executor” show scaling events in real time.

Common Use Cases

Batch Jobs with Variable Data

Perfect for workloads that change daily (for example, different sales volumes on weekdays vs. weekends).

Multi-Stage Jobs

Some stages require more executors than others; Auto Scaling dynamically adjusts executor counts to match each stage’s demand.

Driver-Heavy Workloads

For jobs where certain tasks run mostly on the driver, executors are only scaled up when needed, saving costs.

IP-Constrained VPCs

Because Auto Scaling keeps only the workers a job currently needs, fewer workers are active at any one time, reducing pressure on VPC IP allocations.

Conclusion

AWS Glue Auto Scaling helps organizations reduce costs and improve efficiency by dynamically matching compute resources with workload needs.

Instead of manually tuning executors, you can let AWS Glue handle it while you focus on data transformation logic. With monitoring via Amazon CloudWatch and AWS Glue Studio, it’s easy to ensure that Auto Scaling is working as expected.

By enabling this feature, businesses gain a smarter, more cost-effective way to manage ETL and streaming pipelines.

Drop a query if you have any questions regarding AWS Glue Auto Scaling and we will get back to you quickly.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications winning global recognition for its training excellence including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. Do I pay for the maximum workers or only the workers used?

ANS: – You pay only for the workers actually used during a job run, billed as DPU-hours consumed, not for the maximum number of workers you set.

2. Can Auto Scaling be enabled programmatically?

ANS: – Yes. Auto Scaling can be enabled via the AWS CLI, SDKs, or interactive sessions using the --enable-auto-scaling argument.

3. How can I monitor Auto Scaling?

ANS: – You can monitor via Amazon CloudWatch metrics, AWS Glue Studio’s monitoring dashboard, Spark UI, and executor logs. These tools show when and how executors are added or removed.

WRITTEN BY Anusha

Anusha works as a Subject Matter Expert at CloudThat. She handles AWS-based data engineering tasks such as building data pipelines, automating workflows, and creating dashboards. She focuses on developing efficient and reliable cloud solutions.