Best Practices for AWS Monitoring Solutions

In this blog, we will discuss some measures you can take to improve the monitoring and security of your AWS Cloud infrastructure. Monitoring is embedded throughout the AWS Well-Architected Framework, which is organized around five pillars: operational excellence, security, reliability, performance efficiency, and cost optimization.

Monitoring your AWS environment is a recurring best-practice theme across these pillars, because only by auditing and understanding your resources can you use them to their full potential. Security is an equally strong reason: data and security breaches regularly make the headlines as attackers use techniques such as brute-force attacks, so you need tools and practices in place that tell you exactly what is running in your environments.

We use tools such as Prometheus and Grafana to monitor our infrastructure and keep it secure, but implementing these tools alone is not enough. Your infrastructure may contain hundreds or thousands of instances, and many of them may be launched and terminated daily or weekly, which makes them hard to manage with these tools alone. If you have worked with Prometheus, you know that to collect system metrics, you install Node Exporter on each target instance. In a large infrastructure, you cannot install Node Exporter manually on every newly launched instance. That is why, in this blog, we will discuss some standard practices you can follow to tackle such problems.

  1. Create a standard custom AMI for your organization
    Imagine you are setting up Prometheus (a monitoring and alerting tool) for your infrastructure. You have installed Node Exporter on all the servers and enabled EC2 service discovery for your AWS infrastructure. All the targets are up, you are happy that monitoring now covers your whole infrastructure, and you submit a report saying you are done. The next day, you find that ten targets are down because someone launched ten new instances without Node Exporter. You now realize how tedious it is to install Node Exporter manually on every new instance. The solution is to launch machines of each OS flavor you use, such as Red Hat and Ubuntu, install all the exporters you plan to use, enable their services, and create an Amazon Machine Image (AMI) from each machine. Make these AMIs the standard for launching instances in your infrastructure. This makes your life easier and lets you monitor your infrastructure better.
  2. Create a default security group with the required inbound rules
    When you install exporters on your servers so that a monitoring tool such as Prometheus can collect their metrics, you must open specific ports on the target machines so that the tool can scrape them. It is impractical to tell everyone who launches an instance which ports to open and for which IP addresses or security groups. If these ports are not open, a custom AMI with all the exporters installed is of no use: even though the exporters are running, the firewall prevents Prometheus from scraping metrics from these servers. To overcome this, create a default security group containing all the required inbound rules. You can attach this security group to your instances with a simple AWS CLI command such as aws ec2 modify-instance-attribute, or write a Lambda function that attaches it as soon as an instance is created. This automates a manual task and makes your infrastructure easier to monitor.
  3. Add owner tags to instances
    If you work in a company where many people are authorized to create instances, you can write a Lambda function that adds an owner tag to every new instance, with owner as the key and the IAM user name of the instance's creator as the value. These tags make it easy to find the responsible person when an incident occurs. In a monitoring and alerting setup, we usually send alerts to notification channels such as Slack and Microsoft Teams. You can use these channels for other purposes as well, for example to send a notification when someone does not follow a predefined standard practice while creating a resource in your infrastructure. These notifications can include details such as the resource name, resource ID, launch time, Name tag, owner tag, and any other information you need.
  4. Write Lambda functions that send notifications about your infrastructure
    In a monitoring setup, configuring alerting rules in your monitoring tools is very important, but it is not enough. You should also monitor how instances are being created, which practices are being followed to ensure security and reliability, and whether everyone with permission to create resources is following the standards. Create a separate notification channel and push all such notifications to it. This gives the operations team better insight into the whole infrastructure.
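The standard-AMI practice can be scripted once a base instance has its exporters installed and enabled. The following is a minimal Python sketch, assuming boto3 and an EC2 client passed in by the caller; the `golden-<os>-<timestamp>` naming scheme and the `standard-ami` and `os` tag keys are hypothetical conventions, not anything AWS requires. It calls the EC2 `create_image` API and tags the new image so it is easy to find later.

```python
from datetime import datetime, timezone


def golden_ami_name(os_flavor, now=None):
    """Build a unique AMI name such as 'golden-ubuntu-20200707-101500'.

    The naming scheme is hypothetical; AMI names must be unique per
    account and Region and may not contain colons, hence the compact timestamp.
    """
    now = now or datetime.now(timezone.utc)
    return f"golden-{os_flavor.lower()}-{now.strftime('%Y%m%d-%H%M%S')}"


def create_golden_ami(ec2, instance_id, os_flavor, exporters=("node_exporter",)):
    """Create an AMI from a prepared instance and tag it as a standard image.

    `ec2` is a boto3 EC2 client (boto3.client("ec2")); `instance_id` must point
    at an instance on which the exporters are already installed and enabled.
    """
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=golden_ami_name(os_flavor),
        Description=f"Standard {os_flavor} image with {', '.join(exporters)}",
        TagSpecifications=[{
            "ResourceType": "image",
            # Hypothetical tag keys used to mark approved images.
            "Tags": [
                {"Key": "standard-ami", "Value": "true"},
                {"Key": "os", "Value": os_flavor},
            ],
        }],
    )
    return response["ImageId"]
```

Usage would look like `create_golden_ami(boto3.client("ec2"), "i-0123456789abcdef0", "ubuntu")` with a hypothetical instance ID. By default, `create_image` reboots the source instance to capture a consistent file system; passing `NoReboot=True` avoids the reboot at the cost of that consistency guarantee.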
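The default-security-group practice can be automated with a Lambda function triggered by an Amazon EventBridge rule on `EC2 Instance State-change Notification` events. This is a minimal sketch, assuming boto3 and a hypothetical `DEFAULT_SG_ID` environment variable holding the monitoring group's ID. It adds the default group to whatever groups the instance already has, because the `Groups` parameter of `modify_instance_attribute` replaces the instance's full list rather than appending to it.

```python
import os

# Hypothetical: ID of the organization-wide monitoring security group.
DEFAULT_SG_ID = os.environ.get("DEFAULT_SG_ID", "sg-0123456789abcdef0")


def attach_default_sg(ec2, instance_id, default_sg=DEFAULT_SG_ID):
    """Add default_sg to the instance's security groups; return True if changed."""
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    current = [group["GroupId"] for group in instance.get("SecurityGroups", [])]
    if default_sg in current:
        return False  # already attached, nothing to do
    # Groups= replaces the whole list, so keep the existing groups and append ours.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=current + [default_sg])
    return True


def lambda_handler(event, context, ec2=None):
    """Entry point for an EventBridge 'EC2 Instance State-change Notification' rule."""
    detail = event["detail"]
    instance_id = detail["instance-id"]
    if detail.get("state") != "running":
        return {"instance": instance_id, "attached": False}
    if ec2 is None:
        import boto3  # imported lazily so the logic can be tested without AWS
        ec2 = boto3.client("ec2")
    return {"instance": instance_id, "attached": attach_default_sg(ec2, instance_id)}
```

The Lambda execution role would need `ec2:DescribeInstances` and `ec2:ModifyInstanceAttribute`. The security group must belong to the same VPC as the instance, and instances with more than one network interface may need `modify_network_interface_attribute` per interface instead.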
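The owner-tag practice can be driven by CloudTrail: an EventBridge rule matching `AWS API Call via CloudTrail` events whose `eventName` is `RunInstances` passes the caller's identity and the new instance IDs to a Lambda function. The sketch below assumes that rule exists and that CloudTrail is logging management events in the Region. It uses the lowercase `owner` key described above; for assumed-role callers it falls back to the role session name, which is an assumption about how your users sign in.

```python
def owner_from_identity(identity):
    """Derive an owner value from a CloudTrail userIdentity block."""
    if identity.get("type") == "IAMUser" and identity.get("userName"):
        return identity["userName"]
    arn = identity.get("arn", "")
    if identity.get("type") == "AssumedRole" and arn:
        # arn:aws:sts::<account>:assumed-role/<role>/<session>; with SSO or
        # federation the session name is often the person's user name.
        return arn.rsplit("/", 1)[-1]
    return arn or identity.get("type", "unknown")


def instance_ids_from_event(detail):
    """Collect instance IDs from a RunInstances CloudTrail record."""
    response = detail.get("responseElements") or {}  # null when the call failed
    items = response.get("instancesSet", {}).get("items", [])
    return [item["instanceId"] for item in items]


def lambda_handler(event, context, ec2=None):
    """Tag every instance created by a RunInstances call with owner=<creator>."""
    detail = event["detail"]
    instance_ids = instance_ids_from_event(detail)
    if not instance_ids:
        return {"tagged": [], "owner": None}
    owner = owner_from_identity(detail.get("userIdentity", {}))
    if ec2 is None:
        import boto3  # lazy import keeps the parsing logic testable offline
        ec2 = boto3.client("ec2")
    ec2.create_tags(Resources=instance_ids, Tags=[{"Key": "owner", "Value": owner}])
    return {"tagged": instance_ids, "owner": owner}
```

Because one `RunInstances` call can launch several instances, the function tags all of them with a single `create_tags` request; the Lambda execution role needs `ec2:CreateTags`.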

What notifications can you push? Anything related to your infrastructure that you can fetch through API requests. Here are a few examples:

In this way, you can set up as many automated notifications as you need to monitor your AWS infrastructure and stay informed about its health and security.
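As one example of such a notification, the sketch below lists pending and running instances that are missing a `Name` or `owner` tag and posts a summary to a chat channel. It assumes a scheduled EventBridge rule invokes it, boto3 for the EC2 calls, a hypothetical `WEBHOOK_URL` environment variable, and the `{"text": ...}` JSON payload that Slack incoming webhooks accept; another channel such as Microsoft Teams may need a different payload.

```python
import json
import os
import urllib.request

REQUIRED_TAGS = ("Name", "owner")  # tags the standard practice requires


def find_untagged_instances(ec2, required=REQUIRED_TAGS):
    """Return one record per pending/running instance that lacks a required tag."""
    findings = []
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["pending", "running"]}]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                missing = [key for key in required if key not in tags]
                if missing:
                    findings.append({
                        "id": instance["InstanceId"],
                        "launch_time": str(instance.get("LaunchTime", "")),
                        "name": tags.get("Name", "-"),
                        "owner": tags.get("owner", "-"),
                        "missing": missing,
                    })
    return findings


def format_message(findings):
    """Render findings as plain text for a chat channel."""
    lines = [f"{len(findings)} instance(s) missing required tags:"]
    for item in findings:
        lines.append(
            f"- {item['id']} (name={item['name']}, owner={item['owner']}, "
            f"launched {item['launch_time']}): missing {', '.join(item['missing'])}"
        )
    return "\n".join(lines)


def post_to_webhook(url, text):
    """POST a {"text": ...} JSON payload, the format Slack incoming webhooks accept."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def lambda_handler(event, context, ec2=None):
    """Scheduled entry point: report untagged instances, stay quiet if all comply."""
    if ec2 is None:
        import boto3  # lazy import so the logic can be tested without AWS
        ec2 = boto3.client("ec2")
    findings = find_untagged_instances(ec2)
    if findings:
        # WEBHOOK_URL is a hypothetical environment variable set on the function.
        post_to_webhook(os.environ["WEBHOOK_URL"], format_message(findings))
    return {"violations": len(findings)}
```

The Lambda execution role needs `ec2:DescribeInstances`. Since a webhook URL acts as a credential, keeping it in an environment variable or AWS Secrets Manager rather than in the code is the safer choice.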

If you want to learn more about AWS infrastructure, check out our AWS Courses.

Please comment if you have any questions.

WRITTEN BY Saurabh Jain

Saurabh Kumar Jain is a Subject Matter Expert at CloudThat. He has experience with AWS, Microsoft Azure, and DevOps technologies and specializes in cloud security and architecture design. He has also worked on various cloud and DevOps projects based on the cloud maturity model. He holds the AWS Certified Security - Specialty, AWS Certified Solutions Architect - Associate, Microsoft Certified Azure Administrator, and Terraform Associate certifications.


