Experiment with Real-World Application Failures Using AWS Fault Injection Simulator

TABLE OF CONTENTS

1. FAQs
2. Introduction to AWS Fault Injection Simulator
3. Objective and Prerequisites
4. Create an Experiment Template
5. Start the Experiment
6. Track the Experiment’s Progress
7. Verify the Experiment Results
8. Final Thoughts
9. About CloudThat

1. FAQs:

  1. What is AWS Fault Injection Simulator?

AWS Fault Injection Simulator (AWS FIS) is a fully managed service for running fault injection experiments on AWS. It makes it easier to continuously improve an application’s performance, observability, and resiliency.

  2. What is Chaos Engineering?

Chaos engineering is the process of stressing an application in testing or production environments by creating disruptive events, such as server outages or API throttling, observing how the system responds, and implementing improvements.

  3. Can FIS be integrated as part of a CI/CD pipeline?

Yes. By running FIS experiments as part of your pipeline, you can continuously test the impact of fault actions on your software deployment process.

  4. How can I monitor the impact of an FIS experiment?

With Amazon CloudWatch monitoring and dashboards, you can monitor the impact of your FIS experiment on your AWS resources. In addition, AWS CloudTrail logs give you visibility into, and an audit trail of, the actions taken in your account.

2. Introduction to AWS Fault Injection Simulator

Amazon introduced AWS Fault Injection Simulator to simplify setting up and running controlled fault injection experiments across a range of AWS services, so that teams can build confidence in their application behavior. With Fault Injection Simulator, teams can quickly set up experiments using pre-built templates that generate the desired disruptions. In addition, Fault Injection Simulator provides the controls and guardrails that teams need to run experiments in production, such as automatically rolling back or stopping the experiment if specific conditions are met. With a few clicks in the console, teams can run complex scenarios with common distributed system failures happening in parallel or building sequentially over time, enabling them to create the real-world conditions necessary to find hidden weaknesses.
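As a rough illustration of the guardrails described above, an FIS experiment template accepts a list of stop conditions that can reference Amazon CloudWatch alarms. The sketch below only builds that list; the helper name `build_stop_conditions` and the alarm ARN are hypothetical placeholders, not values from this walkthrough.

```python
from typing import Dict, List, Optional


def build_stop_conditions(alarm_arns: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Return the stopConditions list for an FIS experiment template.

    With no alarms, FIS expects a single entry whose source is "none".
    Otherwise each CloudWatch alarm becomes one stop condition.
    """
    if not alarm_arns:
        return [{"source": "none"}]
    return [{"source": "aws:cloudwatch:alarm", "value": arn} for arn in alarm_arns]


# Hypothetical alarm ARN, for illustration only.
example = build_stop_conditions(
    ["arn:aws:cloudwatch:us-east-1:111122223333:alarm:HighErrorRate"]
)
print(example)
```

If the referenced alarm goes into the ALARM state while the experiment is running, FIS stops the experiment.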

3. Objective and Prerequisites

In this walkthrough, we use AWS FIS to test how an application handles an EC2 instance stop and start.

The experiment shows how long the application takes to recover after its instance is stopped and started again, which can inform your architecture design. Many organizations run this kind of experiment during their Game Days.

Before creating the experiment, complete two prerequisites. First, create an IAM role that allows AWS FIS to run the experiment. Then, create two EC2 instances to use as test targets.
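For readers who prefer to script the IAM prerequisite, here is a minimal sketch using boto3. It assumes the role only needs to stop and start EC2 instances; the role name `fis-demo-role`, the policy name, and the helper functions are hypothetical, and your account may need a broader or narrower policy.

```python
import json


def fis_trust_policy() -> dict:
    """Trust policy that lets the FIS service assume the experiment role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def ec2_stop_start_policy() -> dict:
    """Permissions for the stop-instances action (an assumed minimal set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:StopInstances", "ec2:StartInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
            }
        ],
    }


def create_fis_role(role_name: str = "fis-demo-role") -> str:
    """Create the role and return its ARN (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the policy builders work without it

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(fis_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="fis-ec2-stop-start",  # hypothetical policy name
        PolicyDocument=json.dumps(ec2_stop_start_policy()),
    )
    return role["Role"]["Arn"]


print(json.dumps(fis_trust_policy(), indent=2))
```

Calling `create_fis_role()` returns the role ARN, which is the value the experiment template refers to as the IAM role.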

4. Create an Experiment Template

  1. Go to the AWS Management Console and open AWS FIS
  2. Choose Create experiment template
  3. Enter a description and a name, and choose the IAM role that you created earlier
  4. Click on Actions
    a. Choose Add action
    b. Enter a name for the action
    c. For Action type, choose aws:ec2:stop-instances.
    d. For startInstancesAfterDuration, specify 3 minutes (PT3M)
    e. Choose Save
  5. For Targets, do the following:
    a. Choose Edit for the target that AWS FIS automatically created for you in the previous step
    b. For Target method, choose Resource IDs, and then choose the IDs of the two test instances
    c. For Selection mode, choose COUNT. For Number of resources, enter 1
    d. Choose Save
  6. Choose Add target and do the following
    a. Enter a name for the target
    b. For Resource type, choose aws:ec2:instance
    c. For Target method, choose Resource IDs, and then choose the IDs of the two test instances
    d. For Selection mode, choose All
    e. Choose Save
  7. From the Actions section, choose Add action. Do the following:
    a. For Name, enter a name for the action
    b. For Action type, choose aws:ec2:stop-instances
    c. For Start after, choose the first action that you added
    d. For Target, choose the second target that you added
    e. For startInstancesAfterDuration, specify 3 minutes (PT3M).
    f. Choose Save
  8. Choose Create experiment template
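
The console steps above can also be expressed as a single request to the FIS API. The following is a sketch under stated assumptions, not the only way to do it: the target names (`oneInstance`, `allInstances`), the action names (`stopOne`, `stopAll`), the description, the `Name` tag value, and the ARNs are all hypothetical placeholders, and no stop condition is configured.

```python
from typing import Dict, List


def build_template(role_arn: str, instance_arns: List[str]) -> Dict:
    """Build a CreateExperimentTemplate request that mirrors the console steps."""
    return {
        "description": "Stop one instance, then stop both",
        "roleArn": role_arn,
        # Assumption: the console's Name field corresponds to a Name tag.
        "tags": {"Name": "stop-start-demo"},
        "stopConditions": [{"source": "none"}],
        "targets": {
            # First target: pick one of the two instances.
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceArns": instance_arns,
                "selectionMode": "COUNT(1)",
            },
            # Second target: both instances.
            "allInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceArns": instance_arns,
                "selectionMode": "ALL",
            },
        },
        "actions": {
            "stopOne": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT3M"},
                "targets": {"Instances": "oneInstance"},
            },
            # Runs only after the first action has finished.
            "stopAll": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT3M"},
                "targets": {"Instances": "allInstances"},
                "startAfter": ["stopOne"],
            },
        },
    }


def create_template(request: Dict) -> str:
    """Send the request with boto3 and return the template ID (needs credentials)."""
    import boto3  # imported lazily so build_template works without it

    fis = boto3.client("fis")
    response = fis.create_experiment_template(**request)
    return response["experimentTemplate"]["id"]


# Placeholder ARNs, for illustration only.
template = build_template(
    "arn:aws:iam::111122223333:role/fis-demo-role",
    [
        "arn:aws:ec2:us-east-1:111122223333:instance/i-0aaaaaaaaaaaaaaaa",
        "arn:aws:ec2:us-east-1:111122223333:instance/i-0bbbbbbbbbbbbbbbb",
    ],
)
print(sorted(template["actions"]))
```

Keeping the request as plain data makes it easy to review or store in version control before anything is created in the account.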
    After creating the experiment template, we need to start the experiment.

5. Start the Experiment

  1. You should be on the details page for the experiment template that you just created. Otherwise, choose Experiment templates and then select the ID of the experiment template to open the details page.
  2. Choose Actions, Start.
  3. Choose Start experiment
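
The same start can be triggered programmatically. This is a small sketch that assumes you already have a boto3 FIS client and the template ID; the wrapper name `start_experiment` is hypothetical and simply returns the fields you would otherwise read in the console.

```python
from typing import Any, Dict


def start_experiment(fis_client: Any, template_id: str) -> Dict[str, str]:
    """Start an experiment from a template; return its ID and initial status."""
    response = fis_client.start_experiment(experimentTemplateId=template_id)
    experiment = response["experiment"]
    return {"id": experiment["id"], "status": experiment["state"]["status"]}


# Usage (needs boto3 and AWS credentials; the template ID is a placeholder):
#   import boto3
#   print(start_experiment(boto3.client("fis"), "<your-template-id>"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.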

6. Track the Experiment’s Progress

  1. You should be on the details page for the experiment that you just started. Otherwise, choose Experiments and then select the ID of the experiment to open the details page.
  2. To view the state of the experiment, check state in the Details pane.
  3. When the state of the experiment is Running, move on to the next section to verify the results.
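
Instead of refreshing the Details pane, progress can be polled through the API. The sketch below assumes a boto3 FIS client; the function name, the 15-second interval, and the 15-minute timeout are hypothetical choices, and the set of terminal states is an assumption you should check against the statuses your SDK version reports.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the experiment no longer changes (assumed list).
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def wait_for_experiment(
    fis_client: Any,
    experiment_id: str,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the experiment until it reaches a terminal state, then return it."""
    waited = 0.0
    while True:
        experiment = fis_client.get_experiment(id=experiment_id)["experiment"]
        status = experiment["state"]["status"]
        # Per-action status, e.g. pending -> running -> completed.
        actions = {
            name: action.get("state", {}).get("status")
            for name, action in experiment.get("actions", {}).items()
        }
        print(f"experiment={status} actions={actions}")
        if status in TERMINAL_STATES:
            return experiment
        if waited >= timeout_seconds:
            raise TimeoutError(f"experiment {experiment_id} still {status}")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (needs boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   wait_for_experiment(boto3.client("fis"), "<your-experiment-id>")
```

The `sleep` parameter exists only so the loop can be tested without real delays.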

7. Verify the Experiment Results

  1. Go to the Amazon EC2 console.
  2. Compare the action states in the AWS FIS console with the instance states in the Amazon EC2 console:
    a. When the state of the first action changes from Pending to Running, the state of one of the target instances changes from Running to Stopped.
    b. After three minutes, the state of the first action changes to Completed, the state of the second action changes to Running, and the state of the other target instance changes to Stopped.
    c. After three more minutes, the state of the second action changes to Completed, the state of the target instances changes to Running, and the state of the experiment changes to Completed.

We have successfully completed our experiment. You can use this experiment to test how your applications handle an instance stop and start.
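
The instance states checked in the Amazon EC2 console can also be read with boto3. This is a minimal sketch: the helper name `instance_states` is hypothetical, and it only reshapes the response of the EC2 `describe_instances` call into a mapping of instance ID to state name.

```python
from typing import Any, Dict


def instance_states(describe_response: Dict[str, Any]) -> Dict[str, str]:
    """Map each instance ID in a describe_instances response to its state name."""
    states: Dict[str, str] = {}
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            states[instance["InstanceId"]] = instance["State"]["Name"]
    return states


# Usage (needs boto3 and AWS credentials; the instance IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   response = ec2.describe_instances(InstanceIds=["<instance-id-1>", "<instance-id-2>"])
#   print(instance_states(response))
```

Running this while each action is in progress should show the same running and stopped transitions described in the steps above.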

8. Final Thoughts

AWS FIS is built on chaos engineering, a disciplined approach to identifying failures before they become outages. By proactively testing how a system responds under stress, you can identify and fix weaknesses. Many organizations, such as Netflix, LinkedIn, Facebook, Google, Microsoft, and Amazon, have chaos engineering teams that predict and identify potential shortcomings by breaking things on purpose.

9. About CloudThat

CloudThat is on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere to advance in their businesses.

As a pioneer in the cloud consulting realm, CloudThat is an AWS (Amazon Web Services) Advanced Consulting Partner, an AWS authorized Training Partner, a Microsoft Gold Partner, and the winner of the Microsoft Asia Superstar Campaign for India: 2021.

To get started, go through our Expert Advisory page and Managed Services Package, both part of CloudThat’s offerings. Then, get in touch with our team of experts to carry out your migration needs.

WRITTEN BY Deepak Surendran
