
Scaling Applications with Kubernetes: A Comprehensive Guide to Horizontal Pod Autoscaling


In today’s cloud-native environment, applications must be resilient and adaptable to varying loads. As user demand fluctuates, maintaining optimal performance while managing resource costs becomes a critical challenge. This is where Kubernetes shines, offering powerful tools for scaling applications based on real-time demand. One such feature is the Horizontal Pod Autoscaler (HPA), which allows you to dynamically adjust the number of pod replicas in your deployment based on CPU utilization or other select metrics. In this blog post, we will explore how HPA works and guide you through a hands-on lab to implement it with a simple web application.


What is Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replica set, or stateful set based on observed metrics such as CPU utilization or memory consumption. When the resource usage exceeds a defined threshold, HPA increases the number of replicas to maintain performance. Conversely, it reduces the number of replicas when the demand decreases, optimizing resource usage and costs.

Key Features of HPA

  • Dynamic Scaling: HPA adjusts pod replicas automatically based on real-time metrics.
  • Custom Metrics: Besides CPU and memory, HPA can scale on custom or external metrics exposed through a metrics adapter (such as the Prometheus Adapter). Note that the Kubernetes Metrics Server itself supplies only the built-in CPU and memory resource metrics.
  • Integration: HPA works seamlessly with other Kubernetes resources, such as Deployments and ReplicaSets.

Hands-On Lab: Implementing HPA with a Simple Web Application

In this section, we will set up a simple web application and configure the Horizontal Pod Autoscaler to automatically scale the number of replicas based on CPU utilization.

Prerequisites

  • A running Kubernetes cluster (kubeadm, minikube, GKE, EKS, etc.).
  • kubectl installed and configured to communicate with your cluster.
  • Basic knowledge of Kubernetes concepts.

Step 1: Deploy a Simple Web Application

First, we will create a simple web application that we can scale. For this lab, we will use a sample NGINX application.

  • Create a Deployment: Create a file named nginx-deployment.yaml with the following content:
  • Apply the Deployment: Run the following command to create the deployment:
  • Verify the Deployment: Check if the deployment is up and running:
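A minimal `nginx-deployment.yaml` for this lab might look like the following sketch. The deployment name and CPU request/limit values here are illustrative; the important detail is that a CPU request must be set, because HPA computes utilization as a percentage of the requested CPU:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
```

Apply and verify it with:

```bash
kubectl apply -f nginx-deployment.yaml
kubectl get deployment nginx-deployment
kubectl get pods -l app=nginx
```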

Step 2: Expose the Application

Next, we will expose our application using a Kubernetes Service so we can access it.

  • Create a Service: Create a file named nginx-service.yaml with the following content:
  • Apply the Service: Run the following command:

  • Get the Service Information: After a few moments, you can retrieve the service details:
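Because the load-test example later in this post targets port 30080, a NodePort service is a reasonable assumption for `nginx-service.yaml` (on a managed cloud cluster you could use `type: LoadBalancer` instead and wait for an external IP to be assigned):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080
```

```bash
kubectl apply -f nginx-service.yaml
kubectl get service nginx-service
```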

 

Step 3: Install the Metrics Server

HPA relies on the Metrics Server to gather metrics. If you have not installed it yet, you can do so with the following commands:

  • Install Metrics Server: Run the following command to install the Metrics Server:

  • Verify the Metrics Server: Check that the Metrics Server is running:
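The Metrics Server can be installed from its official release manifest, and its health checked as follows:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
```

On some local clusters (kubeadm, minikube) the kubelets use self-signed certificates, and you may need to add the `--kubelet-insecure-tls` flag to the Metrics Server container args before `kubectl top` returns data.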

 

Step 4: Create the Horizontal Pod Autoscaler

Now that we have our application running and the Metrics Server in place, we can create the HPA.

  • Create an HPA Configuration: Create a file named hpa.yaml with the following content:

 

  • Apply the HPA Configuration: Run the following command:
  • Verify HPA Creation: Check if the HPA is created successfully:
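A sketch of `hpa.yaml` using the `autoscaling/v2` API, targeting 50% average CPU utilization (the HPA name, replica bounds, and threshold here are assumptions chosen for this lab):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

```bash
kubectl apply -f hpa.yaml
kubectl get hpa nginx-hpa
```

For a simple CPU-based policy like this one, `kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10` is an equivalent one-liner.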

Step 5: Simulate Load to Test Autoscaling

To see HPA in action, we need to simulate some load on our application.

  • Install Apache Benchmark (or any other load testing tool):

For example, if you are using a local machine with apt-get:
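On Debian/Ubuntu systems, Apache Benchmark ships in the `apache2-utils` package:

```bash
sudo apt-get update
sudo apt-get install -y apache2-utils   # provides the `ab` binary
```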

  • Run Load Test: Replace <EXTERNAL_IP> with the actual IP address of your NGINX service obtained earlier:

For example: `ab -n 100000 -c 300 http://3.129.8.95:30080/`

 

Step 6: Monitor the Autoscaling

  • Check the HPA Status: After running the load test, monitor the HPA status:

You should see the number of replicas scaling up based on CPU usage.
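Assuming the HPA is named `nginx-hpa` as in the earlier sketch, you can watch the scale-up live:

```bash
kubectl get hpa nginx-hpa --watch
kubectl get pods -l app=nginx
kubectl describe hpa nginx-hpa   # shows current metrics and scaling events
```

Keep in mind that scaling is not instantaneous: HPA evaluates metrics periodically (every 15 seconds by default), and scale-down is deliberately slower than scale-up to avoid thrashing.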

Conclusion

In this blog post, we have explored how to leverage the Horizontal Pod Autoscaler in Kubernetes to dynamically scale applications based on demand. By deploying a simple NGINX application and configuring HPA, you can automatically adjust the number of replicas to maintain optimal performance under varying loads.

With HPA, Kubernetes offers a robust solution for managing application scalability, ensuring your applications remain responsive while optimizing resource usage and cost. As you delve deeper into Kubernetes, consider exploring more advanced features like custom metrics or external metrics providers for further scalability enhancements. Happy scaling!


About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications winning global recognition for its training excellence including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

WRITTEN BY Komal Singh
