Deploying GenAI Models on Amazon EKS with Ray Serve and Argo Workflows


Introduction

Generative AI models are artificial intelligence models that can create various forms of data, including text, images, video, and code. They learn patterns from existing data and use that knowledge to generate new content. In this blog, we fine-tune a pre-trained model and then deploy it to an Amazon EKS cluster with Ray Serve and Argo Workflows.


Solution Overview

We will fine-tune a Stable Diffusion text-to-image model from Hugging Face using a Jupyter Notebook, upload it to Amazon S3, and deploy it to Amazon EKS using Argo Workflows, exposing it via FastAPI behind Ray Serve.

Infrastructure Setup

Complete the steps below before proceeding with the workflow steps.

  • An Amazon EKS cluster with worker nodes, including GPU nodes, as model training requires a GPU.
  • Install Argo Workflows.
  • Install the NVIDIA device plugin. This Kubernetes plugin is a DaemonSet that automatically exposes the number of GPUs on each node of the cluster.
  • Install the KubeRay operator, which is used to run the Ray Serve cluster on Amazon EKS. Ray Serve exposes the model in the backend via FastAPI. Deploy it via Helm or manifests (the install commands are sketched after this list).
  • Generate a Hugging Face token. This token is required to download the model and dataset from Hugging Face.
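The prerequisites above can be installed with kubectl and Helm. The commands below are a minimal sketch; the release version, chart release names, and namespaces are assumptions and may differ in your environment.

# Argo Workflows (replace <version> with the release you want to install)
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/<version>/install.yaml

# NVIDIA device plugin (exposes GPUs on each node)
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm upgrade -i nvdp nvdp/nvidia-device-plugin -n kube-system

# KubeRay operator (manages RayService/RayCluster resources)
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator -n kuberay --create-namespace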

Steps

  • Create a secret named hf-token for the Hugging Face token.
  • Create a secret named gitlab-creds for the Git repository credentials.
  • Create a docker-registry secret named regcred for pushing the image to Docker Hub.
  • Create a service account (e.g., jupyter-sa) in the workflow namespace and add it to the AWS IAM role’s trust policy. The AWS IAM role must have Amazon S3 access for model upload/download and workflow tasks. (A sketch of these commands follows the screenshots below.)
  • Open Workflow Templates and create a new workflow template. Attached is the template that I have used.
  • After the workflow is completed, it will look like this:

[Screenshot: completed workflow graph in the Argo Workflows UI]

  • Run the command kubectl get pods -n dogbooth to check the status of the Ray service, which is created in the dogbooth namespace.

[Screenshot: Ray Serve pods running in the dogbooth namespace]
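For reference, the secrets and service account from the steps above can be created with commands like the following. This is an illustrative sketch; the key names, namespace, credential values, and IAM role ARN are placeholders for your own values, not the exact configuration from this post.

kubectl create secret generic hf-token \
  --from-literal=HF_TOKEN=<your-hugging-face-token> -n <workflow-namespace>

kubectl create secret generic gitlab-creds \
  --from-literal=username=<gitlab-username> \
  --from-literal=password=<gitlab-access-token> -n <workflow-namespace>

kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<docker-username> \
  --docker-password=<docker-password> -n <workflow-namespace>

# Service account annotated with the IAM role (IRSA) that grants Amazon S3 access
kubectl create serviceaccount jupyter-sa -n <workflow-namespace>
kubectl annotate serviceaccount jupyter-sa -n <workflow-namespace> \
  eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/<s3-access-role>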

Template Description

  • I have used a DAG (Directed Acyclic Graph), an alternative to steps, in Argo Workflows.
  • In this workflow, I have used four tasks (a condensed sketch of the template follows the task descriptions).

a. git-clone: Clones the Git repo and copies the necessary data into /workingdir; this data is used by the subsequent tasks.

[Screenshot: git-clone task template]

b. run-notebook: Fine-tunes the model via the Jupyter notebook. It uses a custom Jupyter image with Jupyter Notebook, Papermill, and the rest of the dependencies installed. The dogbooth.ipynb notebook handles training, dependency installation, and uploading the model to Amazon S3.

[Screenshot: run-notebook task template]

c. docker-push: Builds the image and pushes it to Docker Hub. This image is used in the Ray Serve manifest. Use --compressed-caching=false, as Kaniko takes a snapshot after each step, which drives pod memory high and can result in pod termination.

[Screenshot: docker-push task template]

d. ray-serve-deploy: Deploys the built image to the Amazon EKS cluster using the ray-service.yaml manifest.

[Screenshot: ray-serve-deploy task template]
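A condensed sketch of the workflow template is shown below, assuming Kaniko for the image build and Papermill for notebook execution as described above. The repository URL, image names, and paths are placeholders, and the /workingdir volume mounts are omitted for brevity; this is not the exact template from the post.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: dogbooth-pipeline
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: git-clone
            template: git-clone
          - name: run-notebook
            template: run-notebook
            dependencies: [git-clone]
          - name: docker-push
            template: docker-push
            dependencies: [git-clone]
          - name: ray-serve-deploy
            template: ray-serve-deploy
            dependencies: [git-clone, run-notebook, docker-push]
    - name: git-clone
      container:
        image: alpine/git
        command: [sh, -c]
        args: ["git clone https://gitlab.com/<group>/<repo>.git /workingdir"]
    - name: run-notebook
      container:
        image: <registry>/custom-jupyter:latest    # Jupyter, Papermill, and training dependencies
        command: [papermill]
        args: ["/workingdir/dogbooth.ipynb", "/workingdir/output.ipynb"]
    - name: docker-push
      container:
        image: gcr.io/kaniko-project/executor:latest
        args:
          - --context=/workingdir
          - --dockerfile=/workingdir/Dockerfile
          - --destination=<dockerhub-user>/rayservedogboothv3
          - --compressed-caching=false     # avoids high pod memory from per-step snapshots
    - name: ray-serve-deploy
      container:
        image: bitnami/kubectl:latest
        command: [kubectl]
        args: ["apply", "-f", "/workingdir/ray-service.yaml"]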

  • The git-clone task runs first; run-notebook and docker-push run in parallel, as they both depend on git-clone. ray-serve-deploy depends on all three, so it runs last.
  • Taint the GPU node and add a matching toleration so that the pods are scheduled on it, as run-notebook requires a GPU for model training.
  • The volume claim template creates a PVC (and its backing PV) and associates it with the workflow, as shown in the snippet below.
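For reference, a minimal sketch of the toleration and volume claim template in the workflow spec; the taint key and storage size are assumptions.

spec:
  tolerations:
    - key: nvidia.com/gpu        # matches the taint applied to the GPU node
      operator: Exists
      effect: NoSchedule
  volumeClaimTemplates:
    - metadata:
        name: workingdir         # mounted at /workingdir by each task
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi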

Dogbooth.ipynb

In the dogbooth.ipynb notebook, the Hugging Face token created earlier is used to log in to Hugging Face. A sketch of the relevant cells follows the screenshots below.

[Screenshots: dogbooth.ipynb notebook cells]
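A minimal sketch of the notebook cells described above, assuming the hf-token secret is exposed to the pod as the HF_TOKEN environment variable; the bucket name and paths are illustrative, not the exact code from the post.

import os

import boto3
from huggingface_hub import login

# Log in to Hugging Face with the token created earlier (mounted from the hf-token secret)
login(token=os.environ["HF_TOKEN"])

# ... fine-tuning cells run here and write the model to /workingdir/dogbooth-model ...

# Upload the fine-tuned model artifacts to Amazon S3 using the jupyter-sa IAM role
s3 = boto3.client("s3")
model_dir = "/workingdir/dogbooth-model"
for root, _, files in os.walk(model_dir):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.join("dogbooth", os.path.relpath(local_path, model_dir))
        s3.upload_file(local_path, "<your-model-bucket>", key)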

Ray-service.yaml

In ray-service.yaml, I have added an ingress section to access the Ray dashboard via the load balancer DNS or ingress URL. Add the jupyter-sa service account to the pod spec so that the model can be downloaded from Amazon S3 when the container starts. A condensed sketch of the manifest follows the screenshots below.

[Screenshots: ray-service.yaml manifest]
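A condensed sketch of a RayService manifest along these lines, with an ingress for the dashboard. The image name, import path, resource sizes, and ingress controller settings are assumptions; ingress class and annotations depend on your controller and are omitted.

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: dogbooth
  namespace: dogbooth
spec:
  serveConfigV2: |
    applications:
      - name: dogbooth
        import_path: dogbooth:entrypoint    # module:bound deployment in dogbooth.py
        route_prefix: /
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          serviceAccountName: jupyter-sa    # IRSA role with Amazon S3 access
          containers:
            - name: ray-head
              image: <dockerhub-user>/rayservedogboothv3
    workerGroupSpecs:
      - groupName: gpu-group
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            serviceAccountName: jupyter-sa
            tolerations:
              - key: nvidia.com/gpu
                operator: Exists
                effect: NoSchedule
            containers:
              - name: ray-worker
                image: <dockerhub-user>/rayservedogboothv3
                resources:
                  limits:
                    nvidia.com/gpu: 1
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dogbooth-dashboard
  namespace: dogbooth
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dogbooth-head-svc   # head service created by KubeRay
                port:
                  number: 8265            # Ray dashboard port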

Dockerfile

This Dockerfile copies dogbooth.py (the FastAPI app) and installs the model inference dependencies. It is used in the docker-push task to build the rayservedogboothv3 image referenced in the Ray Serve manifest. A minimal sketch follows the screenshot below.

[Screenshot: Dockerfile]
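A minimal Dockerfile sketch along these lines; the base image tag and dependency list are assumptions, not the exact Dockerfile from the post.

# Ray base image so the container can join the Ray cluster (tag is an assumption)
FROM rayproject/ray:2.9.0-gpu

WORKDIR /serve_app

# Model inference dependencies (illustrative list)
RUN pip install --no-cache-dir torch diffusers transformers accelerate fastapi boto3

# FastAPI app served behind Ray Serve
COPY dogbooth.py /serve_app/dogbooth.py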

Dogbooth.py

In dogbooth.py, a FastAPI app is written to serve the model, and the model is downloaded from Amazon S3 when the container starts. A sketch of the structure follows the screenshots below.

[Screenshots: dogbooth.py]
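A minimal sketch of how dogbooth.py could be structured with Ray Serve and FastAPI, assuming a Stable Diffusion pipeline. The bucket, prefix, route, and deployment names are illustrative assumptions, not the exact code from the post.

import io
import os

import boto3
import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from fastapi.responses import Response
from ray import serve

app = FastAPI()


@serve.deployment(ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class Dogbooth:
    def __init__(self):
        # Download the fine-tuned model from Amazon S3 when the replica starts
        # (the jupyter-sa IRSA role provides the S3 permissions).
        local_dir = "/tmp/dogbooth-model"
        self._download_model("<your-model-bucket>", "dogbooth/", local_dir)
        self.pipe = StableDiffusionPipeline.from_pretrained(
            local_dir, torch_dtype=torch.float16
        ).to("cuda")

    def _download_model(self, bucket, prefix, local_dir):
        s3 = boto3.client("s3")
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                dest = os.path.join(local_dir, os.path.relpath(obj["Key"], prefix))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                s3.download_file(bucket, obj["Key"], dest)

    @app.get("/imagine")
    async def imagine(self, prompt: str) -> Response:
        # Generate an image from the prompt and return it as a PNG
        image = self.pipe(prompt).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return Response(content=buf.getvalue(), media_type="image/png")


# Bound deployment referenced by the import_path (e.g., dogbooth:entrypoint) in serveConfigV2
entrypoint = Dogbooth.bind()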

Key Benefits

  • Argo Workflows: Enables complete CI/CD automation of the model lifecycle, from training to deployment.
  • Ray Serve: Exposes the backend model via FastAPI, enabling scalable, low-latency inference. It handles autoscaling automatically based on incoming request volume, reducing operational overhead (an example configuration follows this list).
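For illustration, Ray Serve's request-based autoscaling can be enabled per deployment in serveConfigV2; the deployment name and replica bounds below are assumptions matching the earlier sketches.

serveConfigV2: |
  applications:
    - name: dogbooth
      import_path: dogbooth:entrypoint
      route_prefix: /
      deployments:
        - name: Dogbooth
          autoscaling_config:
            min_replicas: 1
            max_replicas: 4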

Conclusion

Using Argo Workflows for ML training and deployment pipelines enables reproducibility, CI/CD-style automation, and efficient resource utilization, reducing manual overhead and accelerating model rollout in production environments.

Drop a query if you have any questions regarding Argo Workflows, and we will get back to you quickly.


About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. Can we use Argo Workflows for general automation rather than only ML model CI/CD?

ANS: – Yes, you can use Argo Workflows for any automation.

2. What are the benefits of using Argo Workflows rather than GitHub Actions or GitLab CI/CD?

ANS: – Argo Workflows is designed to run natively on Kubernetes, with each step running as a pod. With GitHub Actions and GitLab CI/CD, we need to install custom runners on Kubernetes. Argo Workflows is well suited for long-running ML tasks, for which the others are not ideal.

WRITTEN BY Suryansh Srivastava
