
Conducting Load Testing of Amazon SageMaker Endpoints with Locust



Amazon SageMaker makes it easy to build and train machine learning models without worrying about managing infrastructure. After training, these models can serve real-time predictions based on user input. However, load testing is needed to ensure they perform well and meet latency requirements. This article covers how to load test Amazon SageMaker endpoints using a tool called Locust.

Locust is a handy open-source tool used for load testing. Essentially, it lets you send a specified number of requests to your endpoints and provides performance feedback for your application.

To use Locust, you create a Python file, typically named locustfile.py. This file offers various configuration options for your load tests. For example, you can define the delay between requests, specify which endpoints to call, set up functions to run at startup, and more.
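A minimal locustfile sketch is shown below. The "/predict" route, payload shape, and wait times are assumptions for illustration; adapt them to your own API.

```python
# locustfile.py -- a minimal sketch. The "/predict" route, payload shape,
# and wait times are placeholders; adapt them to your own API.
import json

from locust import HttpUser, task, between


class EndpointUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    def on_start(self):
        # Runs once per simulated user before its tasks start.
        self.headers = {"Content-Type": "application/json"}

    @task
    def invoke(self):
        # POST a sample payload to the route fronting the endpoint.
        payload = {"instances": [[0.5, 1.2, 3.4]]}
        self.client.post("/predict", data=json.dumps(payload),
                         headers=self.headers)
```

The `@task` decorator marks the request each simulated user repeats, and `between(1, 3)` is the per-user delay between those requests.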

By conducting load testing, you can ensure that your Amazon SageMaker endpoint runs smoothly and meets your needs for speed and responsiveness.

Our Solution Architecture


Our Amazon SageMaker endpoint integrates with AWS Lambda through Amazon API Gateway, enabling efficient request handling between the frontend client and Amazon SageMaker. This setup provides a seamless user experience.


Load Testing

Locust is a free load-testing tool to assess system performance under anticipated loads. It targets websites/endpoints, simulating a swarm of virtual users to measure the system’s capacity. Each virtual user’s behavior is customizable, and the testing process is monitored in real time via a web interface.

While Locust lets you conduct load testing at scale yourself, Amazon SageMaker Inference Recommender offers a more efficient alternative for right-sizing the instance behind your endpoint. Unlike third-party load testing tools, which require you to manually deploy the endpoint on each candidate instance, Inference Recommender simplifies the process: you simply provide an array of instance types to test against, and Amazon SageMaker runs a job for each one.

We utilize Locust, an open-source load testing tool implemented in Python. Locust offers several advantages:

  1. Ease of Setup: Locust can be easily tailored to your endpoint and payload requirements with a simple Python script, as demonstrated in this post.
  2. Distributed and Scalable: Locust is event-based, making it ideal for testing highly concurrent workloads and simulating thousands of concurrent users. It supports a high rate of Transactions Per Second (TPS) with a single process and can scale to multiple processes and client machines for distributed load generation.
  3. Metrics and UI: Locust captures end-to-end latency as a metric, supplementing CloudWatch metrics for comprehensive testing insights. Its user-friendly UI allows you to track concurrent users, workers, and other performance metrics.
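Distributed load generation can be sketched as one master process plus any number of workers. This assumes Locust is installed (`pip install locust`) and a locustfile.py is in the current directory; MASTER_HOST is a placeholder for the master's address.

```shell
# Start the master process; it hosts the web UI and aggregates stats.
locust -f locustfile.py --master

# On the same or other machines, start worker processes that connect
# to the master (replace MASTER_HOST with the master's address).
locust -f locustfile.py --worker --master-host MASTER_HOST
```

The master itself generates no load; adding workers is how you scale the simulated user count beyond what one process can drive.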

Implementation Steps

Python’s Locust framework is a powerful tool in your arsenal. In this blog post, we’ll explore how to use Locust for parallel execution, enabling you to perform concurrent load testing of a real-time Amazon SageMaker endpoint.

The Scenario

Let’s say you have a Python script that you want to run multiple times simultaneously. Running the copies concurrently saves time and makes the most of your machine’s resources.

Using Locust for Concurrent Execution

We will achieve this using Python’s Locust framework. Here is a breakdown of the steps involved:

Let’s look at simplified snapshots that demonstrate how the load test is set up and run. In this example, Locust spawns multiple simulated users that run concurrently.

The locustfile.py Script


To start Locust, have a Python file named locustfile.py in your directory. Open your terminal in that directory and type “locust -f locustfile.py”. By default, Locust looks for a file named locustfile.py, so running “locust” alone also works. Once it runs successfully, open your browser and go to http://localhost:8089 to access the Locust UI.


Here, you specify the number of users, the ramp-up period, the endpoint name, and runtime. After filling in these details, you can start the swarm. Once the load testing is completed, you will receive the endpoint load testing report.
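The same settings can also be supplied on the command line and run without the UI. This is a sketch; the user count, rates, and host URL below are placeholders for your own values and API Gateway stage URL.

```shell
# Headless run: 20 simulated users, ramped up at 5 users/second,
# for 2 minutes. Replace the host with your API Gateway stage URL.
locust -f locustfile.py --headless -u 20 -r 5 -t 2m \
  --host https://abc123.execute-api.us-east-1.amazonaws.com/prod
```

Headless mode prints the same statistics to the terminal and exits when the run time elapses, which is convenient for CI pipelines.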

Screenshots of the Locust Load Testing Report are:



Screenshots of Amazon CloudWatch Metrics created by load testing performed using Locust are:



Python’s Locust framework offers a simple and efficient way to generate load in parallel, significantly reducing the time needed to exercise an endpoint. With Locust, you can fully utilize your computing resources, whether you are testing a single endpoint or simulating thousands of concurrent users.

By leveraging Locust for parallel load generation, you can streamline your testing workflows and gain clear insight into endpoint performance.

Locust provides flexibility for writing tests tailored to your use case, from simple to advanced scenarios. It offers essential statistics and a user-friendly web UI for monitoring load tests. While we have covered fundamental aspects, such as command-line options and configuration, the official documentation provides further details on advanced features like custom clients. Explore the documentation for more insights into maximizing Locust’s capabilities.

Drop a query if you have any questions regarding Locust and we will get back to you quickly.


About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partner, Amazon EC2 Service Delivery Partner, and many more.

To get started, go through our Consultancy page and Managed Services Package to learn more about CloudThat’s offerings.


FAQs

1. What is Locust?

ANS: – Locust is a free load-testing tool to evaluate how a system performs under expected loads, particularly targeting websites/endpoints.

2. How does Locust simulate loads?

ANS: – Locust simulates loads by deploying a swarm of virtual users, known as locusts, to interact with the target, such as a website, replicating real-world usage scenarios.

3. What can I monitor during a Locust test?

ANS: – During a Locust test, you can monitor various metrics, including response times, error rates, throughput, and resource utilization, in real time through the web interface.

WRITTEN BY Aditya Kumar

Aditya Kumar works as a Research Associate at CloudThat. His expertise lies in Data Analytics, and he is gaining practical experience in AWS. Aditya is passionate about continuously expanding his skill set and is keen to learn new technologies.



