
Scaling Applications with Kubernetes: A Comprehensive Guide to Horizontal Pod Autoscaling


In today’s cloud-native environment, applications must be resilient and adaptable to varying loads. As user demand fluctuates, maintaining optimal performance while managing resource costs becomes a critical challenge. This is where Kubernetes shines, offering powerful tools for scaling applications based on real-time demand. One such feature is the Horizontal Pod Autoscaler (HPA), which allows you to dynamically adjust the number of pod replicas in your deployment based on CPU utilization or other select metrics. In this blog post, we will explore how HPA works and guide you through a hands-on lab to implement it with a simple web application.


What is Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replica set, or stateful set based on observed metrics such as CPU utilization or memory consumption. When the resource usage exceeds a defined threshold, HPA increases the number of replicas to maintain performance. Conversely, it reduces the number of replicas when the demand decreases, optimizing resource usage and costs.

Key Features of HPA

  • Dynamic Scaling: HPA adjusts pod replicas automatically based on real-time metrics.
  • Custom Metrics: Besides CPU and memory, HPA can scale on custom or external metrics exposed through a metrics adapter (such as the Prometheus Adapter). Note that the Kubernetes Metrics Server itself supplies only the built-in CPU and memory resource metrics.
  • Integration: HPA works seamlessly with other Kubernetes resources, such as Deployments and ReplicaSets.

Hands-On Lab: Implementing HPA with a Simple Web Application

In this section, we will set up a simple web application and configure the Horizontal Pod Autoscaler to automatically scale the number of replicas based on CPU utilization.

Prerequisites

  • A running Kubernetes cluster (kubeadm, minikube, GKE, EKS, etc.).
  • kubectl installed and configured to communicate with your cluster.
  • Basic knowledge of Kubernetes concepts.

Step 1: Deploy a Simple Web Application

First, we will create a simple web application that we can scale. For this lab, we will use a sample NGINX application.

  • Create a Deployment: Create a file named nginx-deployment.yaml with the following content:
  • Apply the Deployment: Run the following command to create the deployment:
  • Verify the Deployment: Check if the deployment is up and running:
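A minimal `nginx-deployment.yaml` for this lab might look like the following sketch. The deployment name and CPU request/limit values here are illustrative; the important detail is that a CPU request must be set, because HPA computes utilization as a percentage of the requested CPU:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
```

Apply and verify it with:

```bash
kubectl apply -f nginx-deployment.yaml
kubectl get deployment nginx-deployment
kubectl get pods -l app=nginx
```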

Step 2: Expose the Application

Next, we will expose our application using a Kubernetes Service so we can access it.

  • Create a Service: Create a file named nginx-service.yaml with the following content:
  • Apply the Service: Run the following command:

  • Get the Service Information: After a few moments, you can retrieve the service details:
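Because the load-test example later in this post targets port 30080, a NodePort service is a reasonable assumption for `nginx-service.yaml` (on a managed cloud cluster you could use `type: LoadBalancer` instead and wait for an external IP to be assigned):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080
```

```bash
kubectl apply -f nginx-service.yaml
kubectl get service nginx-service
```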

 

Step 3: Install the Metrics Server

HPA relies on the Metrics Server to gather metrics. If you have not installed it yet, you can do so with the following commands:

  • Install Metrics Server: Run the following command to install the Metrics Server:

  • Verify the Metrics Server: Check that the Metrics Server is running:
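The Metrics Server can be installed from its official release manifest, and its health checked as follows:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
```

On some local clusters (kubeadm, minikube) the kubelets use self-signed certificates, and you may need to add the `--kubelet-insecure-tls` flag to the Metrics Server container args before `kubectl top` returns data.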

 

Step 4: Create the Horizontal Pod Autoscaler

Now that we have our application running and the Metrics Server in place, we can create the HPA.

  • Create an HPA Configuration: Create a file named hpa.yaml with the following content:

 

  • Apply the HPA Configuration: Run the following command:
  • Verify HPA Creation: Check if the HPA is created successfully:
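A sketch of `hpa.yaml` using the `autoscaling/v2` API, targeting 50% average CPU utilization (the HPA name, replica bounds, and threshold here are assumptions chosen for this lab):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

```bash
kubectl apply -f hpa.yaml
kubectl get hpa nginx-hpa
```

For a simple CPU-based policy like this one, `kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10` is an equivalent one-liner.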

Step 5: Simulate Load to Test Autoscaling

To see HPA in action, we need to simulate some load on our application.

  • Install Apache Benchmark (or any other load testing tool):

For example, if you are using a local machine with apt-get:
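On Debian/Ubuntu systems, Apache Benchmark ships in the `apache2-utils` package:

```bash
sudo apt-get update
sudo apt-get install -y apache2-utils   # provides the `ab` binary
```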

  • Run Load Test: Replace <EXTERNAL_IP> with the actual IP address of your NGINX service obtained earlier:

For example: `ab -n 100000 -c 300 http://3.129.8.95:30080/`

 

Step 6: Monitor the Autoscaling

  • Check the HPA Status: After running the load test, monitor the HPA status:

You should see the number of replicas scaling up based on CPU usage.
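Assuming the HPA is named `nginx-hpa` as in the earlier sketch, you can watch the scale-up live:

```bash
kubectl get hpa nginx-hpa --watch
kubectl get pods -l app=nginx
kubectl describe hpa nginx-hpa   # shows current metrics and scaling events
```

Keep in mind that scaling is not instantaneous: HPA evaluates metrics periodically (every 15 seconds by default), and scale-down is deliberately slower than scale-up to avoid thrashing.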

Conclusion

In this blog post, we have explored how to leverage the Horizontal Pod Autoscaler in Kubernetes to dynamically scale applications based on demand. By deploying a simple NGINX application and configuring HPA, you can automatically adjust the number of replicas to maintain optimal performance under varying loads.

With HPA, Kubernetes offers a robust solution for managing application scalability, ensuring your applications remain responsive while optimizing resource usage and cost. As you delve deeper into Kubernetes, consider exploring more advanced features like custom metrics or external metrics providers for further scalability enhancements. Happy scaling!


About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications winning global recognition for its training excellence including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

WRITTEN BY Komal Singh
