“We cannot stop a disaster, but we can arm ourselves with knowledge” – Petra Nemcova
Introduction to Warm Standby:
Any IT disaster, such as a technical failure, a network disruption, or a human misconfiguration, can severely impact a business. The Disaster Recovery (DR) strategy you choose depends on the nature of the disaster. For example, a localized physical disaster such as the flooding of a data center can be mitigated with a multi-AZ strategy, whereas an attack on production data calls for a data backup strategy that fails over to a backup in another AWS Region.
In my previous blog on Data Backup and Pilot Light, we looked at DR strategies that protect against data loss or corruption in a single Availability Zone and that replicate data to a passive Region provisioned on demand. In those scenarios, only part of the application is active in the failover Region, and it can be quickly provisioned to a full-scale production environment when a disaster occurs.
AWS describes four DR options: Backup and Restore, Pilot Light, Warm Standby, and Multi-Site Active/Active. The choice among them is usually driven by the required RTO (Recovery Time Objective, the maximum acceptable downtime) and RPO (Recovery Point Objective, the maximum acceptable window of data loss).
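To make the two metrics concrete, here is a minimal Python sketch that computes the data-loss window and the downtime for a single incident and checks them against RPO and RTO targets. The function name, the timestamps, and the target values are all hypothetical and chosen only for illustration; none of them come from an AWS API.

```python
from datetime import datetime, timedelta


def meets_objectives(last_replicated, failure_time, service_restored, rpo, rto):
    """Compare one incident against RPO/RTO targets (illustrative helper)."""
    # Data written after the last successful replication is lost on failover.
    data_loss_window = failure_time - last_replicated
    # Downtime runs from the failure until the DR Region serves traffic.
    downtime = service_restored - failure_time
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: replication lagged 2 minutes, recovery took 12 minutes.
result = meets_objectives(
    last_replicated=datetime(2024, 1, 1, 9, 58),
    failure_time=datetime(2024, 1, 1, 10, 0),
    service_restored=datetime(2024, 1, 1, 10, 12),
    rpo=timedelta(minutes=5),
    rto=timedelta(minutes=10),
)
print(result["rpo_met"], result["rto_met"])
```

In this made-up incident the RPO target is met, but the RTO target is missed, which would suggest moving to a strategy with faster recovery.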
Warm Standby DR Strategy:
Warm Standby is an extension of Pilot Light in which a fully functional, scaled-down copy of the production environment runs in standby mode in another Region. The two strategies can be hard to tell apart, because both keep an environment in the DR Region with copies of the primary Region's assets. The distinction is that Pilot Light cannot serve requests until additional resources are deployed, while Warm Standby can handle traffic immediately, at reduced capacity. Your RTO and RPO requirements help you choose between them.
This approach also allows us to easily perform testing or implement continuous testing to increase confidence in our ability to recover from an unforeseen disaster.
Warm Standby Failover Mechanism:
Warm Standby is the older brother of Pilot Light, as it keeps the entire functionality of the system running in another Region. In Pilot Light, only the core services are active and ready for recovery; in Warm Standby, everything is running, just at minimal capacity. The load balancer, databases, gateways, and subnets are ready to go at a moment’s notice. The RTO and RPO for Warm Standby are typically measured in minutes, so recovery is fast, though not instantaneous. If the production system fails, the standby infrastructure is scaled up to match the production environment, and the DNS records are updated to route all traffic to the DR environment.
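The DNS step of the failover can be sketched in Python as follows. This is a minimal illustration, assuming the DNS zone is hosted in Amazon Route 53 and that a boto3 Route 53 client is passed in; the function name, hosted zone ID, record name, and endpoint are hypothetical. Real setups often use alias records or Route 53 failover routing with health checks instead; a plain CNAME is used here only for brevity.

```python
def point_dns_to_standby(route53_client, hosted_zone_id, record_name,
                         standby_dns_name, ttl=60):
    """Upsert a CNAME so traffic for record_name goes to the standby endpoint."""
    change_batch = {
        "Comment": "Fail over to the warm standby Region",
        "Changes": [
            {
                # UPSERT creates the record if missing, or overwrites it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": standby_dns_name}],
                },
            }
        ],
    }
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


# Example call (placeholder values, requires boto3 and AWS credentials):
#   import boto3
#   point_dns_to_standby(boto3.client("route53"), "Z0000000EXAMPLE",
#                        "app.example.com", "standby-lb.example.com")
```

A short TTL on the record helps clients pick up the new target quickly once the change has propagated.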
Auto Scaling is used to scale out the DR Region to full production capacity. The settings can be adjusted manually through the AWS Management Console, or the change can be automated through the AWS SDK or by redeploying a CloudFormation template with the new desired capacity value.
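As a rough sketch of the SDK route, the Python function below raises the desired capacity of an Auto Scaling group through a boto3 Auto Scaling client that is passed in. The function name, group name, Region, and capacity numbers are hypothetical; the production-level values would come from your own environment.

```python
def scale_out_standby(autoscaling_client, group_name, desired_capacity, max_size):
    """Raise an Auto Scaling group in the DR Region to production capacity."""
    # The desired capacity cannot exceed the group's maximum size,
    # so both values are validated and updated together.
    if desired_capacity > max_size:
        raise ValueError("desired_capacity must not exceed max_size")
    return autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired_capacity,
        MaxSize=max_size,
    )


# Example call (placeholder values, requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("autoscaling", region_name="us-west-2")
#   scale_out_standby(client, "app-standby-asg", desired_capacity=10, max_size=12)
```

Passing the client in as a parameter keeps the function easy to test against a stub before it is ever run against a real DR Region.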
AWS Services Used:
Amazon EC2 Auto Scaling
AWS Management Console
Amazon Machine Images (AMIs)
Amazon EBS snapshots
Amazon DynamoDB Backup
Redshift, Neptune, RDS, and Aurora DB snapshots
Telling the Pilot Light and Warm Standby strategies apart can be difficult. To choose the right Disaster Recovery strategy, you need to consider your RTO and RPO requirements, as they differ from one scenario to another. Warm Standby is often chosen when recovery must be tested and monitored continuously. Based on your disaster recovery requirements, a well-designed plan will keep business impact and data loss to a minimum.
Prarthit Mehta is the Business Unit Head-Cloud Consulting at CloudThat. He is an AWS ambassador and has experience delivering solutions for customers from various industry domains. He also holds working experience in AWS and Big data platforms. He is an AWS Certified Architect - Professional and a certified Microsoft Azure Solutions Architect.