How to Use Spot Fleet on EKS Worker Nodes

A Spot Fleet is a collection, or fleet, of Spot Instances and, optionally, On-Demand Instances. The Spot Fleet attempts to launch the number of Spot Instances and On-Demand Instances needed to meet the target capacity that you specified in the Spot Fleet request. It also attempts to maintain that target capacity if your Spot Instances are interrupted.

Why Spot Instances?

Spot Instances are available at up to a 90% discount compared to On-Demand prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and test & development workloads. Spot Instances are typically used to supplement On-Demand Instances, where appropriate, and are not meant to handle 100% of your workload.

On-Demand Instances, Spot Instances, and Spot Instance Pools

With On-Demand Instances, you pay for compute capacity by the second with no long-term commitments. You have full control over each instance's lifecycle: you decide when to launch, stop, hibernate, start, reboot, or terminate it.

A Spot Instance is an unused EC2 instance that is available for less than the On-Demand price. Because Spot Instances enable you to request unused EC2 instances at steep discounts, you can lower your Amazon EC2 costs significantly. Your Spot Instance runs whenever capacity is available and the maximum price per hour for your request exceeds the Spot price. Spot Instances are a cost-effective choice if you can be flexible about when your applications run and if your applications can be interrupted.

A Spot Instance pool is a set of unused EC2 instances with the same instance type (for example, m5.large), operating system, Availability Zone, and network platform. When you make a Spot Fleet request, you can include multiple launch specifications that vary by instance type, AMI, Availability Zone, or subnet. The Spot Fleet selects the Spot Instance pools used to fulfill the request, based on the launch specifications and the configuration of the Spot Fleet request. The Spot Instances come from the selected pools.
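To make the idea of a pool concrete, here is a minimal sketch that enumerates the pools a request could draw from. It assumes every launch specification shares the same operating system and network platform, so a pool is identified by instance type and Availability Zone alone. The helper name and the sample values are illustrative, not part of any AWS tooling.

```shell
# Hypothetical helper: lists one pool per (instance type, Availability Zone) pair,
# assuming the operating system and network platform are the same for all of them.
list_spot_pools() {
  local types="$1" zones="$2" t z
  for t in $types; do
    for z in $zones; do
      echo "${t}/${z}"
    done
  done
}

# Example values are placeholders, not taken from a real request.
list_spot_pools "m5.large m5a.large" "us-east-1a us-east-1b"
```

Two instance types across two Availability Zones give four pools. The more pools a request spans, the more places the fleet can look for spare capacity.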

Creating a Spot Fleet

Label your On-Demand nodes:

Before migrating from On-Demand to Spot Instances, let's label our On-Demand nodes. This will help us distinguish between the two kinds of instance.

Get the node details:

kubectl get nodes

Label the nodes with the OnDemand label:

kubectl label nodes --all lifecycle=OnDemand

Verify that the nodes are labeled correctly:

kubectl get nodes --label-columns=lifecycle --selector=lifecycle=OnDemand

Create an Auto Scaling group for the Spot fleet:

Create a launch template from the existing On-Demand instance AMI ID.
1. When creating the launch template, use the same AMI ID that the On-Demand instances use.
2. Choose the primary instance type to use for the Spot Instances.
3. Use the same security group and IAM instance profile as the On-Demand worker nodes.
4. Use the same user data as the On-Demand EKS worker nodes.
5. Label the Spot Instances with lifecycle=Ec2Spot by adding an extra argument to the user data of the launch template:
  
  --kubelet-extra-args '--node-labels=lifecycle=Ec2Spot'
6. Create the launch template.
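
As a rough illustration of step 5, the sketch below prints what the user data might look like with the extra argument in place. It assumes the worker nodes use an EKS-optimized Amazon Linux 2 AMI, whose user data normally calls /etc/eks/bootstrap.sh. Your existing On-Demand user data may differ, so treat the helper name and the cluster name as placeholders.

```shell
# Hypothetical helper: prints user data for a Spot worker node.
# Assumes an EKS-optimized Amazon Linux 2 AMI, which ships /etc/eks/bootstrap.sh.
render_spot_user_data() {
  local cluster_name="$1"   # placeholder: your EKS cluster name
  cat <<EOF
#!/bin/bash
set -o errexit
/etc/eks/bootstrap.sh ${cluster_name} --kubelet-extra-args '--node-labels=lifecycle=Ec2Spot'
EOF
}

# "my-cluster" is an example value only.
render_spot_user_data my-cluster
```

The only difference from the On-Demand user data should be the node label, which is what lets us select Spot nodes later.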
Create an Auto Scaling group (ASG) from the newly created launch template.
1. When creating the ASG, make sure you select the launch template you just created.
2. Under instance distribution, choose the percentage of Spot Instances you want to use.
3. Choose the capacity-optimized allocation strategy, and add additional instance types with the same cores and memory as the primary instance type.
4. Choose the VPC and subnets.
5. Set the desired, minimum, and maximum counts as per your requirements.
6. Apply the same tags that are on the On-Demand Auto Scaling group to the new ASG. (Note: this is very important.)
7. Create the Auto Scaling group.
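
The console steps above can also be expressed as a mixed instances policy for the AWS CLI. The sketch below only prints the policy document. The helper name, the launch template name, the instance types, and the sizes in the commented command are assumptions for illustration, and the tags from step 6 still have to be added to match your On-Demand group.

```shell
# Hypothetical helper: prints a MixedInstancesPolicy document for the ASG.
# Arguments: launch template name, On-Demand percentage, then instance types.
render_mixed_instances_policy() {
  local template_name="$1" on_demand_pct="$2"
  shift 2
  local overrides="" t
  for t in "$@"; do
    # Append each instance type as an override, comma-separated.
    overrides="${overrides}${overrides:+, }{\"InstanceType\": \"${t}\"}"
  done
  cat <<EOF
{
  "LaunchTemplate": {
    "LaunchTemplateSpecification": {
      "LaunchTemplateName": "${template_name}",
      "Version": "\$Latest"
    },
    "Overrides": [${overrides}]
  },
  "InstancesDistribution": {
    "OnDemandPercentageAboveBaseCapacity": ${on_demand_pct},
    "SpotAllocationStrategy": "capacity-optimized"
  }
}
EOF
}

# Example values only: an all-Spot group across three similarly sized instance types.
render_mixed_instances_policy eks-spot-workers 0 m5.large m5a.large m4.large

# The output could be saved as policy.json and passed to the AWS CLI, for example:
# aws autoscaling create-auto-scaling-group \
#   --auto-scaling-group-name eks-spot-workers \
#   --mixed-instances-policy file://policy.json \
#   --min-size 1 --max-size 5 --desired-capacity 2 \
#   --vpc-zone-identifier "subnet-aaaa,subnet-bbbb"
```

An On-Demand percentage of 0 means all capacity above the base is Spot. Raise it if part of the group should stay On-Demand.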
After the ASG is created, you will see the new nodes join the existing cluster.
1. Get the node details:
  
  kubectl get nodes
2. Verify that the nodes are labeled correctly:
  
  kubectl get nodes --label-columns=lifecycle --selector=lifecycle=Ec2Spot

Creating the k8s Spot Termination Handler:

Demand for Spot Instances varies significantly, and their availability depends on how much unused EC2 capacity exists, so a Spot Instance can be interrupted. AWS issues an interruption notice two minutes before it reclaims the instance. The k8s spot termination handler detects this notice and drains the node, so the application is redeployed elsewhere in the cluster within that window. We will deploy the handler on each Spot Instance with the help of a Helm chart.
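
To show roughly what such a handler does on each node, here is a minimal sketch. This is not the chart's actual code. It assumes the instance metadata service accepts IMDSv1-style requests (with IMDSv2 only, a session token header would also be needed), and the function names are hypothetical. The instance-action endpoint returns 404 while no interruption is scheduled and 200 once a notice has been issued.

```shell
# Sketch of the check a termination handler performs on each Spot node.
METADATA_URL="http://169.254.169.254/latest/meta-data/spot/instance-action"

# Prints the HTTP status of the instance-action endpoint:
# 404 means no interruption is scheduled, 200 means a notice has been issued.
instance_action_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$METADATA_URL"
}

# Hypothetical handler step: drain the given node once a notice appears.
# Returns 0 if a drain was started, 1 if there was nothing to do.
handle_interruption() {
  local node_name="$1"
  if [ "$(instance_action_status)" = "200" ]; then
    kubectl drain "$node_name" --ignore-daemonsets --force
    return 0
  fi
  return 1
}

# A real handler would call handle_interruption in a loop every few seconds, e.g.:
# while ! handle_interruption "$NODE_NAME"; do sleep 5; done
```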

Search for the k8s spot termination handler chart in the repository:

helm search repo k8s-spot-termination-handler

Fetch the chart from the repository:

helm fetch --untar stable/k8s-spot-termination-handler

Install the Helm chart:

helm install k8s-spot-termination-handler k8s-spot-termination-handler/ \
  -n kube-system

(You can also customize the Helm chart as per your requirements, for example to send a message whenever an interruption happens on the cluster.)

Upgrade the k8s-spot-termination-handler pods so that they are deployed only on Spot Instances. Since we don't need this pod on On-Demand instances, we upgrade the Helm release with an extra argument, a node selector on the Spot node label, so that it deploys only on Spot Instances:

helm upgrade k8s-spot-termination-handler k8s-spot-termination-handler/ \
  --set nodeSelector.lifecycle=Ec2Spot -n kube-system

Check the DaemonSet:

kubectl --namespace=kube-system get daemonsets

Now set the maximum count of the On-Demand worker node Auto Scaling group to the same value as its desired count. Then cordon and drain the On-Demand instances, which migrates all the pods to the Spot Instances.
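
The cordon and drain step can be scripted against the lifecycle=OnDemand label applied earlier. The following is a minimal sketch with a hypothetical function name. Review the drain flags for your workloads before running it, since they control how DaemonSet pods and emptyDir data are treated.

```shell
# Hypothetical helper: cordon, then drain, every node labeled lifecycle=OnDemand.
drain_on_demand_nodes() {
  local node
  for node in $(kubectl get nodes --selector=lifecycle=OnDemand \
      -o jsonpath='{.items[*].metadata.name}'); do
    # Stop new pods from being scheduled on the node.
    kubectl cordon "$node"
    # Evict the running pods so they are rescheduled on the Spot nodes.
    # --delete-emptydir-data is named --delete-local-data on older kubectl releases.
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done
}

# Run it once the Spot nodes are Ready:
# drain_on_demand_nodes
```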

In this way, we can use Spot Fleet on EKS worker nodes.



Reference: https://docs.aws.amazon.com


WRITTEN BY CloudThat
