In today’s cloud-native environment, applications must be resilient and adaptable to varying loads. As user demand fluctuates, maintaining optimal performance while managing resource costs becomes a critical challenge. This is where Kubernetes shines, offering powerful tools for scaling applications based on real-time demand. One such feature is the Horizontal Pod Autoscaler (HPA), which allows you to dynamically adjust the number of pod replicas in your deployment based on CPU utilization or other select metrics. In this blog post, we will explore how HPA works and guide you through a hands-on lab to implement it with a simple web application.
What is Horizontal Pod Autoscaler (HPA)?
The Horizontal Pod Autoscaler automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics such as CPU utilization or memory consumption. When resource usage exceeds a defined threshold, HPA increases the number of replicas to maintain performance; conversely, it reduces the number of replicas when demand decreases, optimizing resource usage and cost.
Key Features of HPA
- Dynamic Scaling: HPA adjusts pod replicas automatically based on real-time metrics.
- Custom Metrics: Besides CPU and memory (served by the Kubernetes Metrics Server), HPA can scale on custom or external metrics exposed through a metrics adapter such as the Prometheus Adapter; see the autoscaling/v2 sketch after this list.
- Integration: HPA works seamlessly with other Kubernetes resources, such as Deployments and ReplicaSets.
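For reference, the autoscaling/v2 API lets a single HPA combine several metrics. The manifest below is an illustrative sketch only: it assumes a custom metrics adapter (for example, the Prometheus Adapter) is installed in the cluster, and the Deployment name webapp and metric name http_requests_per_second are hypothetical placeholders, not part of this lab.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp            # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource           # built-in resource metric from the Metrics Server
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods               # custom per-pod metric, requires a metrics adapter
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "100"

The lab below sticks to the simpler autoscaling/v1 API, which supports CPU utilization only.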
Hands-On Lab: Implementing HPA with a Simple Web Application
In this section, we will set up a simple web application and configure the Horizontal Pod Autoscaler to automatically scale the number of replicas based on CPU utilization.
Prerequisites
- A running Kubernetes cluster (kubeadm, minikube, GKE, EKS, etc.).
- kubectl installed and configured to communicate with your cluster.
- Basic knowledge of Kubernetes concepts.
Step 1: Deploy a Simple Web Application
First, we will create a simple web application that we can scale. For this lab, we will use a sample NGINX application.
- Create a Deployment: Create a file named nginx-deployment.yaml with the following content. The CPU request matters here: HPA computes CPU utilization as a percentage of each pod's requested CPU, so autoscaling on CPU will not work without it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
        ports:
        - containerPort: 80
- Apply the Deployment: Run the following command to create the deployment:
kubectl apply -f nginx-deployment.yaml
- Verify the Deployment: Check if the deployment is up and running:
kubectl get deployments
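You can also list the pods behind the Deployment; with the manifest above you should see two replicas (the exact pod names will differ on your cluster):

kubectl get pods -l app=nginx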
Step 2: Expose the Application
Next, we will expose our application using a Kubernetes Service so we can access it.
- Create a Service: Create a file named nginx-service.yaml with the following content:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: NodePort
- Apply the Service: Run the following command:
kubectl apply -f nginx-service.yaml
- Get the Service Information: After a few moments, you can retrieve the service details:
kubectl get services
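Since the Service is of type NodePort, Kubernetes assigns it a port in the 30000–32767 range on every node. One way to read that port directly (nginx-service is the name from the manifest above) is:

kubectl get service nginx-service -o jsonpath='{.spec.ports[0].nodePort}'

You will use a node's IP address together with this port for the load test in Step 5.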
Step 3: Install the Metrics Server
HPA relies on the Metrics Server to gather metrics. If you have not installed it yet, you can do so with the following commands:
- Install Metrics Server: Run the following command to install the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- Verify the Metrics Server: Check that the Metrics Server is running:
kubectl get pods -n kube-system
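A quick way to confirm that metrics are actually being collected is kubectl top; it can take a minute or two after installation before data appears. On some local clusters (for example kubeadm or minikube), the Metrics Server may additionally need the --kubelet-insecure-tls flag in its container arguments; treat that as a cluster-specific adjustment rather than a required step.

kubectl top nodes
kubectl top pods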
Step 4: Create the Horizontal Pod Autoscaler
Now that we have our application running and the Metrics Server in place, we can create the HPA.
- Create an HPA Configuration: Create a file named hpa.yaml with the following content:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
- Apply the HPA Configuration: Run the following command:
kubectl apply -f hpa.yaml
- Verify HPA Creation: Check if the HPA is created successfully:
kubectl get hpa
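If you prefer not to write a manifest, a roughly equivalent autoscaler can be created imperatively with kubectl autoscale; the following command mirrors the hpa.yaml above:

kubectl autoscale deployment nginx-deployment --cpu-percent=70 --min=2 --max=10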
Step 5: Simulate Load to Test Autoscaling
To see HPA in action, we need to simulate some load on our application.
- Install ApacheBench (or any other load-testing tool):
For example, if you are using a local machine with apt-get:
sudo apt-get install apache2-utils
- Run Load Test: Replace [hostname] and [port] with a node IP address of your cluster and the NodePort of your NGINX service obtained earlier:
ab -n [total_requests] -c [concurrent_requests] http://[hostname]:[port]/[path]
E.g. ab -n 100000 -c 300 http://3.129.8.95:30080/
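If you would rather not install anything locally, a rough in-cluster alternative is a temporary BusyBox pod that continuously requests the Service by its cluster-internal name (nginx-service on port 80, as defined earlier). This is a sketch and the request rate may need tuning to push CPU past the 70% threshold:

kubectl run load-generator --rm -i --tty --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx-service; done"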
Step 6: Monitor the Autoscaling
- Check the HPA Status: After running the load test, monitor the HPA status:
kubectl get hpa -w
You should see the number of replicas scaling up based on CPU usage.
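For more detail, you can describe the HPA to see its current metrics and scaling events, and watch new pods being created as the Deployment scales out:

kubectl describe hpa nginx-hpa
kubectl get pods -l app=nginx -w

Once the load test stops, expect the replica count to scale back down gradually; by default HPA waits several minutes before removing pods to avoid thrashing.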
Conclusion
In this blog post, we have explored how to leverage the Horizontal Pod Autoscaler in Kubernetes to dynamically scale applications based on demand. By deploying a simple NGINX application and configuring HPA, you can automatically adjust the number of replicas to maintain optimal performance under varying loads.
With HPA, Kubernetes offers a robust solution for managing application scalability, ensuring your applications remain responsive while optimizing resource usage and cost. As you delve deeper into Kubernetes, consider exploring more advanced features like custom metrics or external metrics providers for further scalability enhancements. Happy scaling!

WRITTEN BY Komal Singh