
Deploying GenAI Models on Amazon EKS with Ray Serve and Argo Workflows


Introduction

Generative AI models are artificial intelligence models that can create various forms of content, including text, images, video, and code. They learn patterns from existing data and use that knowledge to generate new outputs. In this blog, we fine-tune a pre-existing model and then deploy it to an Amazon EKS cluster with Ray Serve and Argo Workflows.


Solution Overview

We will fine-tune a Stable Diffusion text-to-image model from Hugging Face using a Jupyter Notebook, upload it to Amazon S3, and deploy it to Amazon EKS using Argo Workflows, exposing it via a FastAPI app behind Ray Serve.

Infrastructure Setup

Complete the steps below before proceeding with the workflow steps.

  • Amazon EKS cluster with worker nodes, including GPU nodes, as model training requires a GPU.
  • Install Argo Workflows.
  • Install the NVIDIA device plugin. This Kubernetes plugin is a DaemonSet that automatically exposes the number of GPUs on each node of the cluster.
  • Install the KubeRay operator, which is used to run the Ray Serve cluster on Amazon EKS. Ray Serve exposes the model in the backend via FastAPI. Deploy it via Helm or manifests; example install commands are sketched after this list.
  • Generate a Hugging Face token. This token is required to download the model and dataset from Hugging Face.
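
For reference, these components can be installed roughly as follows. This is a minimal sketch; the versions, namespaces, and release names shown here are illustrative and should be adjusted to your environment.

    # NVIDIA device plugin (DaemonSet that advertises GPUs to the scheduler)
    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

    # KubeRay operator via Helm
    helm repo add kuberay https://ray-project.github.io/kuberay-helm/
    helm repo update
    helm install kuberay-operator kuberay/kuberay-operator --namespace kuberay-operator --create-namespace

    # Argo Workflows
    kubectl create namespace argo
    kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.5/install.yaml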

Steps

  • Create a secret for the Hugging Face token with the name hf-token.
  • Create a secret with the name gitlab-creds for the Git repository credentials.
  • Create a secret of type docker-registry with the name regcred for pushing images to the container registry.
  • Create a service account (e.g., jupyter-sa) in the workflow namespace and add it to the AWS IAM role’s trust policy. The AWS IAM role must have Amazon S3 access for model upload/download and workflow tasks.
  • In the Argo Workflows UI, open Workflow Templates and create a new workflow template. The template I have used is described in the next section.
  • After the workflow is completed, all the tasks in the workflow graph show as successful in the Argo Workflows UI.


  • Run kubectl get pods -n dogbooth to check the status of the Ray service, which is created in the dogbooth namespace (example commands for these steps are sketched below).

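The secrets and the service account can be created with standard kubectl commands along these lines. The namespace, role ARN, and the key names inside the secrets are placeholders, not the exact values from the original setup.

    kubectl create secret generic hf-token --from-literal=HF_TOKEN=<hugging-face-token> -n <workflow-namespace>
    kubectl create secret generic gitlab-creds --from-literal=username=<user> --from-literal=token=<access-token> -n <workflow-namespace>
    kubectl create secret docker-registry regcred --docker-server=https://index.docker.io/v1/ \
      --docker-username=<user> --docker-password=<password> -n <workflow-namespace>

    # Service account annotated with the IAM role that has Amazon S3 access (IRSA);
    # the role's trust policy must allow this service account
    kubectl create serviceaccount jupyter-sa -n <workflow-namespace>
    kubectl annotate serviceaccount jupyter-sa -n <workflow-namespace> \
      eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/<s3-access-role>

    # Verify the Ray service pods after the workflow completes
    kubectl get pods -n dogbooth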

Template Description

  • I have used a DAG (Directed Acyclic Graph) as an alternative to steps in Argo Workflows.
  • In this workflow, I have used four tasks:

a. git-clone: Clones the Git repo, copies the necessary data into /workingdir, and makes this data available to subsequent tasks.

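A rough sketch of what this template entry could look like, as one item under spec.templates in the WorkflowTemplate. The image, repository URL, and paths are illustrative, and the credentials come from the gitlab-creds secret.

    - name: git-clone
      container:
        image: alpine/git:latest
        command: [sh, -c]
        args:
          - >-
            git clone https://${GIT_USER}:${GIT_TOKEN}@gitlab.com/<group>/<repo>.git /workingdir/repo
        env:
          - name: GIT_USER
            valueFrom:
              secretKeyRef: {name: gitlab-creds, key: username}
          - name: GIT_TOKEN
            valueFrom:
              secretKeyRef: {name: gitlab-creds, key: token}
        volumeMounts:
          - name: workingdir
            mountPath: /workingdir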

b. run-notebook: Fine-tunes the model via the Jupyter notebook. It uses a custom Jupyter image with Jupyter Notebook, Papermill, and the rest of the dependencies installed. The .ipynb handles training, dependency installation, and uploading the model to Amazon S3.

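This task can execute the notebook non-interactively with Papermill, roughly as below. The custom image name, notebook path, and the key name inside the hf-token secret are assumptions.

    - name: run-notebook
      container:
        image: <registry>/custom-jupyter:latest   # Jupyter, Papermill and training deps preinstalled
        command: [papermill]
        args: ["/workingdir/repo/dogbooth.ipynb", "/workingdir/dogbooth-output.ipynb"]
        env:
          - name: HF_TOKEN
            valueFrom:
              secretKeyRef: {name: hf-token, key: HF_TOKEN}
        resources:
          limits:
            nvidia.com/gpu: 1            # forces scheduling onto a GPU node
        volumeMounts:
          - name: workingdir
            mountPath: /workingdir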

c. docker-push: Builds and pushes the image to Docker Hub. This image is used in the Ray Serve manifest. Use --compressed-caching=false because Kaniko takes a snapshot after each step, which can drive pod memory so high that the pod gets terminated.

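A sketch of this task using the Kaniko executor, with the registry credentials mounted from the regcred secret. The destination repository and context path are placeholders.

    - name: docker-push
      volumes:
        - name: docker-config
          secret:
            secretName: regcred
            items:
              - key: .dockerconfigjson
                path: config.json
      container:
        image: gcr.io/kaniko-project/executor:latest
        args:
          - --dockerfile=/workingdir/repo/Dockerfile
          - --context=dir:///workingdir/repo
          - --destination=<dockerhub-user>/rayservedogboothv3:latest
          - --compressed-caching=false   # avoid holding full layer snapshots in memory
        volumeMounts:
          - name: workingdir
            mountPath: /workingdir
          - name: docker-config
            mountPath: /kaniko/.docker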

d. ray-serve-deploy: Deploys the built image to the Amazon EKS cluster using the ray-service.yaml manifest.

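This task can simply apply the RayService manifest with kubectl, for example:

    - name: ray-serve-deploy
      container:
        image: bitnami/kubectl:latest
        command: [sh, -c]
        args:
          - kubectl apply -n dogbooth -f /workingdir/repo/ray-service.yaml
        volumeMounts:
          - name: workingdir
            mountPath: /workingdir

Note that the workflow's service account also needs RBAC permissions to create RayService objects in the dogbooth namespace.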

  • The git-clone task runs first; run-notebook and docker-push run in parallel, as they depend only on git-clone. ray-serve-deploy depends on all three, so it runs last.
  • Taint the GPU node and add a matching toleration so that the pods are scheduled on the GPU node, as run-notebook requires a GPU for model training.
  • The volume claim template creates a PVC and PV and associates them with the workflow (a sketch of these spec-level pieces follows).
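
Putting it together, the DAG wiring, GPU toleration, and volume claim template sit at the spec level of the WorkflowTemplate, roughly like this. The taint key and storage size are illustrative.

    spec:
      entrypoint: main
      serviceAccountName: jupyter-sa
      tolerations:
        - key: nvidia.com/gpu          # matches the taint applied to the GPU nodes
          operator: Exists
          effect: NoSchedule
      volumeClaimTemplates:
        - metadata:
            name: workingdir
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 50Gi
      templates:
        - name: main
          dag:
            tasks:
              - name: git-clone
                template: git-clone
              - name: run-notebook
                template: run-notebook
                dependencies: [git-clone]
              - name: docker-push
                template: docker-push
                dependencies: [git-clone]
              - name: ray-serve-deploy
                template: ray-serve-deploy
                dependencies: [git-clone, run-notebook, docker-push]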

Dogbooth.ipynb

In the dogbooth.ipynb notebook, the Hugging Face token created earlier is used to log in to Hugging Face.

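Conceptually, the login and the model upload in the notebook look something like this. The bucket name, local output directory, and the elided training step are placeholders for what the notebook actually runs.

    import os

    import boto3
    from huggingface_hub import login

    # Log in to Hugging Face with the token injected from the hf-token secret
    login(token=os.environ["HF_TOKEN"])

    # ... dependency installation and the fine-tuning run happen here ...

    # Upload the fine-tuned model directory to Amazon S3
    s3 = boto3.client("s3")
    model_dir = "dogbooth-model"                          # local output directory (placeholder)
    for root, _, files in os.walk(model_dir):
        for name in files:
            path = os.path.join(root, name)
            s3.upload_file(path, "<model-bucket>", path)  # bucket name is a placeholder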

Ray-service.yaml

In ray-service.yaml, I have added an Ingress section to access the Ray dashboard via the load balancer DNS or the Ingress URL. Add the service account jupyter-sa to the pod spec so the model can be downloaded from Amazon S3 when the container starts.

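A trimmed-down sketch of the RayService and the Ingress described above. The apiVersion, image, head-service name, and ports depend on the KubeRay version and your setup, so treat the values below as assumptions.

    apiVersion: ray.io/v1
    kind: RayService
    metadata:
      name: dogbooth
      namespace: dogbooth
    spec:
      serveConfigV2: |
        applications:
          - name: dogbooth
            import_path: dogbooth:entrypoint
            route_prefix: /
      rayClusterConfig:
        headGroupSpec:
          rayStartParams:
            dashboard-host: "0.0.0.0"
          template:
            spec:
              serviceAccountName: jupyter-sa        # IRSA role with Amazon S3 access
              containers:
                - name: ray-head
                  image: <dockerhub-user>/rayservedogboothv3:latest
        workerGroupSpecs:
          - groupName: gpu-workers
            replicas: 1
            rayStartParams: {}
            template:
              spec:
                serviceAccountName: jupyter-sa
                containers:
                  - name: ray-worker
                    image: <dockerhub-user>/rayservedogboothv3:latest
                    resources:
                      limits:
                        nvidia.com/gpu: 1
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: dogbooth-dashboard
      namespace: dogbooth
    spec:
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: dogbooth-head-svc        # Ray dashboard is served on port 8265
                    port:
                      number: 8265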

Dockerfile

This Dockerfile copies dogbooth.py (the FastAPI app) and installs the model inference dependencies. It is used in the docker-push task to build the rayservedogboothv3 image referenced in the Ray Serve manifest.

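A minimal Dockerfile along those lines. The base image tag and exact dependency list are assumptions.

    # Base image with Ray and Ray Serve preinstalled (tag is illustrative)
    FROM rayproject/ray:2.9.0-py310-gpu

    # Inference dependencies for the fine-tuned Stable Diffusion model
    RUN pip install --no-cache-dir diffusers transformers accelerate torch boto3 fastapi

    # FastAPI app served by Ray Serve
    WORKDIR /home/ray
    COPY dogbooth.py /home/ray/dogbooth.py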

Dogbooth.py

In dogbooth.py, a FastAPI app is written to serve the model, and the model is downloaded from Amazon S3 when the container starts.

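In outline, dogbooth.py wires a FastAPI app into a Ray Serve deployment and pulls the fine-tuned weights from Amazon S3 when the replica starts. The bucket, prefix, endpoint shape, and local paths below are placeholders, not the original implementation.

    import os

    import boto3
    import torch
    from diffusers import StableDiffusionPipeline
    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()

    @serve.deployment(ray_actor_options={"num_gpus": 1})
    @serve.ingress(app)
    class Dogbooth:
        def __init__(self):
            # Download the fine-tuned model from Amazon S3 at startup (uses the jupyter-sa IRSA role)
            local_dir = "/tmp/dogbooth-model"
            bucket, prefix = "<model-bucket>", "dogbooth-model/"
            s3 = boto3.client("s3")
            for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
                dest = os.path.join(local_dir, os.path.relpath(obj["Key"], prefix))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                s3.download_file(bucket, obj["Key"], dest)
            self.pipe = StableDiffusionPipeline.from_pretrained(
                local_dir, torch_dtype=torch.float16
            ).to("cuda")

        @app.get("/generate")
        def generate(self, prompt: str):
            # Run inference and save the generated image locally
            image = self.pipe(prompt).images[0]
            path = "/tmp/output.png"
            image.save(path)
            return {"status": "generated", "path": path}

    # Referenced as import_path "dogbooth:entrypoint" in serveConfigV2
    entrypoint = Dogbooth.bind()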

Key Benefits

  • Argo Workflows: Provides complete CI/CD automation for the model, from training to deployment.
  • Ray Serve: Exposes the backend model via FastAPI, enabling scalable, low-latency inference. It handles autoscaling based on incoming request volume, reducing operational overhead.

Conclusion

Using Argo Workflows for ML training and deployment pipelines enables reproducibility, CI/CD-style automation, and efficient resource utilization, reducing manual overhead and accelerating model rollout in production environments.

Drop a query if you have any questions regarding Argo Workflows and we will get back to you quickly.


About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 850k+ professionals in 600+ cloud certifications and completed 500+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partner, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner, AWS Config, Amazon EMR and many more.

FAQs

1. Can we use Argo Workflows for automation other than ML model CI/CD?

ANS: – Yes, you can use Argo Workflows for any automation task.

2. What are the benefits of using Argo Workflows rather than GitHub Actions or GitLab CI/CD?

ANS: – Argo Workflows is designed to run directly on Kubernetes, with each step running as a pod. With GitHub Actions and GitLab CI/CD, we need to install custom runners on Kubernetes. Argo Workflows is well suited for long-running ML tasks, while the others are not ideal for them.

WRITTEN BY Suryansh Srivastava
