We manage AWS infrastructure for one of the 50 most visited sites in the US. The scale of operations and the volume of user traffic make this a challenging problem: the website must stay up and running and perform consistently regardless of spikes in traffic. The architecture builds in redundancy at multiple layers to improve availability.
Here are some of the design factors:
- DNS for IP resolution / balancing between multiple regions
- Multiple load balancers with health check for redundancy
- Auto-Scaled group of front facing servers in multiple availability zones for high availability
- Internal load balancers to distribute traffic to app servers
- Caching cluster for information that is accessed more often
- Configuration servers for storing shared information
- Auto-Scaled app servers with firewall configured for required security
- Combination of SQL / NoSQL database with analytics cluster on Cloud
- Simple Storage Service (S3) for storing static content with high data durability
- Notification service for push notification to mobile devices
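As a rough illustration of the "load balancers with health check" idea in the list above, the sketch below rotates requests across only those servers whose last health check passed. It is not the production setup: the server names and the `status` mapping are hypothetical, and a managed load balancer would normally do this work.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def healthy_backends(status: Dict[str, bool]) -> List[str]:
    """Return the backends whose most recent health check passed, sorted by name."""
    return sorted(name for name, ok in status.items() if ok)


def round_robin(status: Dict[str, bool]) -> Iterator[str]:
    """Return an endless iterator over the healthy backends, in rotation."""
    pool = healthy_backends(status)
    if not pool:
        raise RuntimeError("no healthy backends available")
    return cycle(pool)


# Hypothetical health-check results: web-b failed its last check.
status = {"web-a": True, "web-b": False, "web-c": True}
picker = round_robin(status)
first_four = [next(picker) for _ in range(4)]
```

Here `first_four` alternates between `web-a` and `web-c`; the unhealthy `web-b` never receives traffic until a later health check marks it healthy again and the pool is rebuilt.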
Continuous Integration / Deployment on AWS
Continuous integration and deployment are among the most sought-after elements of most projects. Our experienced team has implemented configuration automation and continuous delivery for our clients using tools such as Jenkins, ThoughtWorks Go, Chef, and Puppet, giving them the agility to shorten the application lifecycle.
We have implemented automation for a few fairly large groups of cloud servers for a major online education company. Custom cookbooks and recipes sped up the entire process of configuring the servers once they boot. In addition, every time a stable version of the application is checked in to the source code management system, our CD server (ThoughtWorks Go) triggers deployment to a large fleet of nodes, making it possible to deploy on demand without any human intervention.
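One way to picture an unattended deployment to a large fleet is a rolling rollout that updates a few nodes at a time and stops if a batch fails. The sketch below is only an illustration under that assumption: the text does not say the deployments were batched, and `deploy_one`, the node names, and the batch size are all hypothetical stand-ins for whatever the CD server actually invokes.

```python
from typing import Callable, Iterator, List


def batches(nodes: List[str], size: int) -> Iterator[List[str]]:
    """Split the node list into consecutive groups of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


def rolling_deploy(
    nodes: List[str],
    version: str,
    deploy_one: Callable[[str, str], bool],
    batch_size: int = 2,
) -> List[str]:
    """Deploy batch by batch and stop after the first batch with a failure.

    Returns the nodes that were updated successfully.
    """
    updated: List[str] = []
    for group in batches(nodes, batch_size):
        results = [(node, deploy_one(node, version)) for node in group]
        updated.extend(node for node, ok in results if ok)
        if not all(ok for _, ok in results):
            break  # leave the remaining nodes on the old version
    return updated


def fake_deploy(node: str, version: str) -> bool:
    """Hypothetical deploy step: pretend node n3 fails to update."""
    return node != "n3"


result = rolling_deploy(["n1", "n2", "n3", "n4", "n5"], "1.4.2", fake_deploy)
```

With the stub above, the second batch contains the failing node, so `n5` is never touched and keeps serving the previous version.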
Large Scale SQL Database Deployment
Percona MySQL Cluster on AWS
We manage a dual-master Percona MySQL setup that stores billing and payment information. Strong consistency is a given with relational databases, so the challenges were achieving higher availability and fault tolerance and designing for disaster recovery. Leveraging the active-passive failover routing available in Route 53, we achieved a highly available, fault-tolerant relational database. The cluster is designed for better disaster recovery: it uses multiple AWS availability zones and keeps copies of the data in two AWS regions.
The design factors are as follows:
- Multiple active nodes, with load distributed by the Percona host
- 2 masters with 2 slaves each, for a total of 6 m2.4xlarge machines
- 1 slave in a different AZ, another slave in a different region
- Requests routed via Route 53 to address AZ / geo failover scenarios
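The active-passive routing mentioned above can be sketched as a pair of Route 53 failover record sets: a PRIMARY record tied to a health check and a SECONDARY record that answers when the primary is unhealthy. The snippet only builds the change-batch document. The record name, target hostnames, and health check ID are placeholders, not values from the actual deployment.

```python
from typing import Dict


def failover_record(name: str, role: str, target: str, health_check_id: str = "") -> Dict:
    """Build one failover record set in the shape used by Route 53 change batches."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name: str, primary: str, secondary: str, health_check_id: str) -> Dict:
    """Build a change batch that upserts the primary and secondary records."""
    return {
        "Comment": "Active-passive failover for the database endpoint",
        "Changes": [
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, "PRIMARY", primary, health_check_id)},
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, "SECONDARY", secondary)},
        ],
    }


# Placeholder names: none of these come from the real environment.
batch = failover_change_batch(
    "db.example.internal.",
    "master-a.example.internal",
    "master-b.example.internal",
    "hc-placeholder-id",
)
```

A document of this shape is what boto3's `change_resource_record_sets` accepts as its `ChangeBatch` argument, alongside a `HostedZoneId`. Applying it requires credentials and a real hosted zone, so that call is left out here.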
NoSQL Database Cluster Deployment
MongoDB Cluster on AWS
NoSQL databases have always been a wise choice for social interaction data. We set up and manage a 12-node MongoDB database cluster that hosts blogging data. MongoDB already takes care of quite a number of optimizations out of the box, which helped us achieve a highly available, fault-tolerant database with only minor tuning. The fine-tuned cluster has been up and running for many months with minimal maintenance.
Below are the details of the setup:
- MongoDB cluster configured on 15 EC2 instances of type m2.4xlarge
- Write-intensive application workload
- 3 replica sets with 3 shard deployments with journaled writes
- Replica sets spread across geographic regions and availability zones
- Security managed through VPC & Security Groups
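To make the "replica sets across availability zones" point more concrete, the sketch below builds one MongoDB replica set configuration document per shard, tagging each member with the zone it runs in. The shard names, hostnames, zone labels, and the choice of three members per set are assumptions for illustration. The real topology is not spelled out above.

```python
from typing import Dict, List


def replica_set_config(set_name: str, hosts_by_zone: Dict[str, str]) -> Dict:
    """Build a replica set configuration document with one member per zone."""
    members = []
    for index, (zone, host) in enumerate(sorted(hosts_by_zone.items())):
        members.append({
            "_id": index,            # member ids must be unique within the set
            "host": host,            # "hostname:port" of the mongod process
            "tags": {"zone": zone},  # tag used to tell zones apart
        })
    return {"_id": set_name, "members": members}


def shard_configs(shard_count: int, zones: List[str]) -> List[Dict]:
    """Build one config per shard, using hypothetical hostnames."""
    configs = []
    for shard in range(shard_count):
        hosts = {
            zone: f"shard{shard}-{zone}.example.internal:27017" for zone in zones
        }
        configs.append(replica_set_config(f"shard{shard}", hosts))
    return configs


# Assumed layout: 3 shards, each a replica set with one member in each of 3 zones.
configs = shard_configs(3, ["zone-a", "zone-b", "zone-c"])
```

Each resulting document has the `_id` and `members` fields that MongoDB's `replSetInitiate` command expects. Running that command against live `mongod` processes, and adding the shards to the cluster, is outside this sketch.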