Autoscaling in Kubernetes with Vertical Pod Autoscaler (VPA)

Consider a Kubernetes-based e-commerce application whose workload varies throughout the day. During peak hours, such as flash sales or other high-traffic periods, demand rises and performance suffers. To address this, you implement Vertical Pod Autoscaler (VPA), which dynamically adjusts resource allocations to keep the application responsive. By matching each pod's CPU and memory requests to its actual usage, VPA improves performance and user experience and contributes to cost-efficient resource utilization.

This blog covers the significance of Vertical Pod Autoscaler (VPA) in Kubernetes and its role in dynamically optimizing resource allocations for individual pods. It explores key features, benefits, and a practical use case: improving the performance of an e-commerce application under variable workloads.


Kubernetes stands as a leader in container orchestration, offering robust features to manage and scale containerized applications.

Among its autoscaling capabilities, the Vertical Pod Autoscaler (VPA) is the focus here. This blog explains what VPA is, why it matters, and how to implement it in a Kubernetes cluster, step by step.

Vertical Pod Autoscaler (VPA)

The Vertical Pod Autoscaler is a Kubernetes autoscaling component that dynamically adjusts the resource requests of individual pod containers based on their observed usage. Unlike its counterpart, the Horizontal Pod Autoscaler (HPA), which scales the number of pod replicas, VPA tunes the CPU and memory allocated to each pod so that requests track what the workload actually consumes.
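To make the idea concrete, here is a minimal sketch of what a VPA object targeting a deployment might look like, built as a Python dictionary and printed as JSON. The object name hamster-vpa, the target deployment name hamster, and the "Auto" update mode are assumptions chosen for illustration; check the API version and update modes supported by the VPA release installed in your cluster.

```python
import json


def build_vpa_manifest(name, target_deployment, update_mode="Auto"):
    """Return a VerticalPodAutoscaler object as a plain dict.

    The names passed in are illustrative; nothing here talks to a cluster.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose pods VPA should size.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            # How recommendations are applied to running pods.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Assumed names for illustration only.
manifest = build_vpa_manifest("hamster-vpa", "hamster")
print(json.dumps(manifest, indent=2))
```

Note that the object points at a workload through targetRef rather than at individual pods, which is what lets VPA size every pod the deployment creates.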

Key Features and Benefits

  • Efficient Resource Utilization
  • Improved Application Performance
  • Cost Optimization
  • Adaptability to Workload Changes

Steps to Implement Vertical Pod Autoscaler

Step 1: Install VPA in Your Cluster

Download the source code for the Vertical Pod Autoscaler from GitHub. For instance, use the following command:

Navigate to the directory for the Vertical Pod Autoscaler:

If you have previously deployed the Vertical Pod Autoscaler, remove it with the command:

Deploy the Vertical Pod Autoscaler using the command:

Confirm the successful creation of Vertical Pod Autoscaler pods by running the command:
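As a rough guide to the sequence above, the following dry-run sketch only prints the commands and does not execute anything. The repository URL, the hack/vpa-up.sh and hack/vpa-down.sh script names, and the kube-system namespace are assumptions based on the upstream kubernetes/autoscaler repository layout; verify them against the version you download.

```python
import shlex

# Assumed upstream repository; not stated on this page.
REPO_URL = "https://github.com/kubernetes/autoscaler.git"

# Each entry pairs a step description with the command's argument list.
INSTALL_STEPS = [
    ("Download the source", ["git", "clone", REPO_URL]),
    ("Enter the VPA directory", ["cd", "autoscaler/vertical-pod-autoscaler"]),
    ("Remove a previous deployment", ["./hack/vpa-down.sh"]),
    ("Deploy VPA", ["./hack/vpa-up.sh"]),
    ("Check the VPA pods", ["kubectl", "get", "pods", "-n", "kube-system"]),
]


def render(steps):
    """Format each step as a comment line followed by a shell command."""
    return [f"# {label}\n{shlex.join(argv)}" for label, argv in steps]


print("\n".join(render(INSTALL_STEPS)))
```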


Step 2: Launch the Sample Application

To create a deployment and a corresponding Vertical Pod Autoscaler, deploy the sample hamster application using the following command:


The hamster application consists of a deployment with two pods and a Vertical Pod Autoscaler that targets the deployment.

Confirm the successful creation of the hamster pods by executing the command:
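The sketch below prints a plausible pair of commands for deploying the sample and listing its pods. The manifest path examples/hamster.yaml and the pod label app=hamster are assumptions taken from the upstream VPA examples, not from this page.

```python
import shlex

# Assumed manifest path and pod label from the upstream VPA examples.
SAMPLE_MANIFEST = "examples/hamster.yaml"
POD_SELECTOR = "app=hamster"


def sample_app_commands(manifest=SAMPLE_MANIFEST, selector=POD_SELECTOR):
    """Return the deploy and verify commands as shell-ready strings."""
    return [
        shlex.join(["kubectl", "apply", "-f", manifest]),
        shlex.join(["kubectl", "get", "pods", "-l", selector]),
    ]


for command in sample_app_commands():
    print(command)
```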


Inspect the CPU and memory reservations using the kubectl describe pod command and one of the hamster pod names obtained in the preceding step. For instance:

In the “Requests” section of the output, observe the pod's current CPU and memory reservations.


The Recommender component of the Vertical Pod Autoscaler analyzes pod behavior to determine whether the CPU and memory reservations are adequate. The reservations you observe in your own cluster may differ from those in this example.

The current reservations are insufficient due to a deliberate under-resourcing of the sample hamster application. Each pod runs a single container with the following specifications:

  • CPU request is set at 100 millicores, but the container attempts to use more than 500 millicores.
  • Memory reservation is notably lower than the required amount for proper execution.
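The CPU gap described above can be worked through with a small helper that parses Kubernetes CPU quantities. This is a sketch: the 500m usage figure is the lower bound mentioned in the text, used here as a stand-in for the real measured value, and the helper handles only the plain and "m" (millicore) forms.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores.
    return round(float(quantity) * 1000)


request_m = parse_cpu_millicores("100m")  # request from the text
usage_m = parse_cpu_millicores("500m")    # assumed: "more than 500" in the text
shortfall_m = usage_m - request_m

print(f"request={request_m}m usage>={usage_m}m shortfall>={shortfall_m}m")
print(f"usage is at least {usage_m / request_m:.0f}x the request")
```

With these numbers the container wants at least five times the CPU it has reserved, which is why the Recommender is expected to raise the request.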

Step 3: Monitor the Scaling Operation

After the Recommender determines that the CPU and memory reservations of the original hamster pods are inadequate, the Updater component of the Vertical Pod Autoscaler relaunches the pods with the values the Recommender suggests. Note that the Vertical Pod Autoscaler does not modify the deployment template; it updates the actual requests of the pods.
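One way to picture the relaunch decision is the simplified sketch below. It assumes the Updater treats a pod as a relaunch candidate when its current request falls outside the Recommender's lower and upper bounds; the real Updater applies further conditions, and the bound values and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CpuRecommendation:
    """CPU recommendation in millicores (hypothetical values in this sketch)."""
    lower_bound: int
    target: int
    upper_bound: int


def needs_relaunch(current_request_m: int, rec: CpuRecommendation) -> bool:
    """Simplified rule: relaunch when the request is outside the bounds."""
    return not (rec.lower_bound <= current_request_m <= rec.upper_bound)


def request_after_relaunch(rec: CpuRecommendation) -> int:
    """The relaunched pod receives the recommended target as its request."""
    return rec.target


# Bounds are invented for illustration; only 100m comes from the text.
rec = CpuRecommendation(lower_bound=500, target=587, upper_bound=1000)
print(needs_relaunch(100, rec))
print(needs_relaunch(request_after_relaunch(rec), rec))
```

Under this rule the original 100-millicore pod is relaunched, while a pod already running with the target value is left alone.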

To track the pods in the sample hamster application and await the initiation of a new hamster pod with a different name by the Updater, use the following command:

Once a new hamster pod has started, inspect its CPU and memory reservations using the kubectl describe pod command along with the pod’s name, as shown in this example:

In the “Requests” section of the output, observe the CPU and memory reservations for the new pod:


In this example, the CPU reservation has increased to 587 millicores and the memory reservation to 262,144 kilobytes. The original pod was under-resourced, and the Vertical Pod Autoscaler has adjusted the reservations to more suitable values. The values you observe in your own cluster may differ.
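Memory quantities are easy to misread because Kubernetes accepts both decimal suffixes (k, M, G) and binary ones (Ki, Mi, Gi). The sketch below assumes the 262,144 kilobytes above is reported as 262144k, a decimal quantity, and shows how that differs from 262144Ki. The parser covers only the suffixes listed.

```python
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9}
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '262144k' or '256Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix means plain bytes


MIB = 2**20
decimal_bytes = parse_memory_bytes("262144k")   # assumed notation of the value above
binary_bytes = parse_memory_bytes("262144Ki")   # for comparison only

print(decimal_bytes / MIB)  # 250.0
print(binary_bytes / MIB)   # 256.0
```

Read as decimal kilobytes, the new reservation is 250 MiB; the same number with a Ki suffix would be 256 MiB.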

Step 4: Inspection

Inspect the recommendations provided by the Vertical Pod Autoscaler, specifically those made by the Recommender, using the following command:
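If you prefer to read the recommendation programmatically, it is also exposed in the VPA object's status. The sketch below parses a hand-written sample shaped like that status; the container name and the bound values are invented for illustration, and only the target mirrors the figures quoted in this blog. In a real cluster you might obtain the JSON with something like kubectl get vpa <name> -o json, which is an assumption about your setup.

```python
import json

# Hand-written sample; bounds and container name are illustrative only.
SAMPLE_VPA_JSON = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "hamster",
          "lowerBound": {"cpu": "500m", "memory": "262144k"},
          "target": {"cpu": "587m", "memory": "262144k"},
          "upperBound": {"cpu": "1", "memory": "500Mi"}
        }
      ]
    }
  }
}
"""


def target_requests(vpa_object: dict) -> dict:
    """Map each container name to its recommended target requests."""
    recommendations = (
        vpa_object.get("status", {})
        .get("recommendation", {})
        .get("containerRecommendations", [])
    )
    return {rec["containerName"]: rec["target"] for rec in recommendations}


targets = target_requests(json.loads(SAMPLE_VPA_JSON))
print(targets)
```

The function returns an empty dictionary when no recommendation has been published yet, which can happen shortly after the VPA object is created.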


Step 5: Cleanup

To uninstall the sample application, execute the following command:

In the directory for the Vertical Pod Autoscaler (vertical-pod-autoscaler), eliminate the Vertical Pod Autoscaler deployment with the command:
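A dry-run sketch of the cleanup, again only printing commands: the manifest path examples/hamster.yaml and the hack/vpa-down.sh script are assumptions based on the upstream repository layout.

```python
import shlex


def cleanup_commands(manifest="examples/hamster.yaml"):
    """Return the teardown commands in order: sample app first, then VPA."""
    return [
        shlex.join(["kubectl", "delete", "-f", manifest]),
        shlex.join(["./hack/vpa-down.sh"]),
    ]


for command in cleanup_commands():
    print(command)
```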


Vertical Pod Autoscaler is a valuable tool for resource optimization in Kubernetes. By dynamically adjusting the resource requests of individual pods based on their actual usage, VPA addresses efficient resource utilization, cost optimization, and adaptability to changing workloads. For organizations adopting autoscaling in Kubernetes, VPA is a practical way to improve the efficiency of containerized workloads and keep applications performing well.

Drop a query if you have any questions regarding Vertical Pod Autoscaler and we will get back to you quickly.


1. How can I scale my application pods dynamically based on resource utilization in Kubernetes?

ANS: – Employ the Horizontal Pod Autoscaler (HPA) in Kubernetes to adjust pod counts automatically in a deployment or replica set. HPA observes CPU or memory usage, scaling pods horizontally based on defined utilization thresholds. Configure resource metrics in the HPA to enable Kubernetes to scale your application dynamically.
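The scaling rule HPA applies can be sketched with the formula from the Kubernetes documentation: desired replicas equal the current replica count multiplied by the ratio of the current metric to the target metric, rounded up. The sketch ignores the tolerance band and stabilization behavior of the real controller, and the sample numbers are invented.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))


# Two replicas at 90% average CPU against a 50% target scale out.
print(desired_replicas(2, 90, 50))  # 4
# Four replicas at 25% against the same target scale in.
print(desired_replicas(4, 25, 50))  # 2
```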

2. What is the difference between resource requests and resource limits in Kubernetes?

ANS: – Resource requests specify the CPU and memory that Kubernetes reserves for a container, and the scheduler uses them to decide where a pod can run. Resource limits set the maximum a container may consume. Together, these values prevent containers from consuming excessive resources and keep performance and utilization in the cluster predictable.
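As a small illustration of the relationship, a container's resources stanza can be modeled as a dictionary and checked so that the request never exceeds the limit, which Kubernetes also requires. The values below are hypothetical and the helper handles CPU quantities only.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def request_within_limit(resources: dict) -> bool:
    """True when the CPU request does not exceed the CPU limit."""
    request = cpu_millicores(resources["requests"]["cpu"])
    limit = cpu_millicores(resources["limits"]["cpu"])
    return request <= limit


# Hypothetical values: reserve 100m for scheduling, cap usage at 200m.
resources = {"requests": {"cpu": "100m"}, "limits": {"cpu": "0.2"}}
print(request_within_limit(resources))
```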

3. How does the Vertical Pod Autoscaler (VPA) differ from the Horizontal Pod Autoscaler (HPA) in Kubernetes?

ANS: – While the Horizontal Pod Autoscaler scales the number of pod replicas based on observed metrics, the Vertical Pod Autoscaler focuses on adjusting the resource requests and limits of individual pods. VPA analyzes pod resource usage and recommends adjustments to optimize resource allocation. Unlike HPA, which scales horizontally by adding or removing pod replicas, VPA scales vertically by fine-tuning the resource allocations of existing pods.

WRITTEN BY Bhanu Prakash K

K Bhanu Prakash works as a Subject Matter Expert at CloudThat. He is proficient in managing and configuring AWS infrastructure, as well as Kubernetes and DevOps tools such as Terraform, Ansible, Jenkins, and Git. He is keen on learning new technologies and publishing blogs for the tech community.


