Popular interview question candidates get wrong: "How did you do Disaster Recovery (DR) for your AWS application?"

Common but average answer: "I will replicate it to another region."

What the interviewer is looking for is how DR strategies are chosen and what the different strategies are. As an SA, you will be responsible for talking to the app team and coming up with an appropriate DR strategy.

A great answer is: there are different DR options to choose from, depending on RTO (Recovery Time Objective) and RPO (Recovery Point Objective). The available DR strategies, ordered from highest to lowest RTO/RPO (and lowest to highest cost), are:
- Backup and Restore
- Pilot Light
- Warm Standby
- Multi-site Active/Active

Then explain one of the DR strategies in detail, preferably Multi-site Active/Active, because it's used in the most critical production applications. Architecture attached.

- The most critical part of DR is the database. In this case, we are using DynamoDB Global Tables for active-active mode. If you are using a SQL database like Aurora, keep in mind that Aurora Global Database is active-passive, but the new Aurora DSQL is active-active.
- The application stack runs on EC2 with an Auto Scaling Group. Run a minimum of two EC2 instances in each region to keep it highly available.
- Load balancers are a regional service, so we use one load balancer in each region to distribute traffic within that region.
- Route 53 sends traffic to one of the two load balancers using a geolocation or latency routing policy.
- RPO and RTO are lowest in this architecture because data is constantly being replicated and EC2 instances are already up and running, with a minimum of two in each region. In some cases, teams set the desired count higher to keep more EC2 instances running in the second region for an even lower RTO.

If you get this question in your interview, make sure to knock it out of the park!

---
Download this and other cloud interview questions and answers (FREE): https://lnkd.in/egg_rVWH
#systemdesign #aws
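The selection rule in the post (pick the cheapest strategy whose RTO/RPO still meets the requirement) can be sketched in a few lines of Python. This is a minimal illustration, not an AWS tool: the `DRStrategy` class, the `choose_strategy` function, and especially the per-strategy RTO/RPO figures are hypothetical placeholders. Real numbers have to come from the app team's requirements and from testing your own workload.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DRStrategy:
    name: str
    typical_rto: timedelta  # how long recovery usually takes
    typical_rpo: timedelta  # how much recent data may be lost


# Ordered from lowest to highest cost (and highest to lowest RTO/RPO), as in the post.
# The figures are HYPOTHETICAL placeholders for illustration, not AWS guarantees.
STRATEGIES = (
    DRStrategy("Backup and Restore", timedelta(hours=8), timedelta(hours=24)),
    DRStrategy("Pilot Light", timedelta(minutes=30), timedelta(minutes=10)),
    DRStrategy("Warm Standby", timedelta(minutes=5), timedelta(minutes=1)),
    DRStrategy("Multi-site Active/Active", timedelta(seconds=0), timedelta(seconds=1)),
)


def choose_strategy(required_rto: timedelta, required_rpo: timedelta) -> DRStrategy:
    """Return the cheapest strategy whose typical RTO and RPO both meet the targets."""
    for strategy in STRATEGIES:  # cheapest first
        if strategy.typical_rto <= required_rto and strategy.typical_rpo <= required_rpo:
            return strategy
    raise ValueError("No strategy in the table meets these RTO/RPO targets")


if __name__ == "__main__":
    # A batch app that can be down for a day and lose a day of data.
    print(choose_strategy(timedelta(hours=24), timedelta(hours=24)).name)  # Backup and Restore
    # A customer-facing app that must recover in 15 minutes, losing at most 5 minutes of data.
    print(choose_strategy(timedelta(minutes=15), timedelta(minutes=5)).name)  # Warm Standby
```

Walking the list cheapest-first mirrors the trade-off in the post: you only pay for a more expensive strategy when every cheaper one misses the RTO or RPO target.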
Great post!! It's all about trade-offs! There is another, even more expensive option: cross-cloud DR :-) if you really have the money to ramp up on both AWS and Azure. Then the question is whom you trust more, Amazon or Microsoft.
Anchoring DR strategy in RTO and RPO requirements rather than defaulting to "replicate to another region" is exactly the right framework — great breakdown of the four tiers and when to apply each.
Rajdeep Saha, really clear explanation. I like how you highlighted the database part; it's always tricky.
DR is such a broad and interesting topic. It can also be done at the CloudFront level by setting failover criteria on an origin group.
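In CloudFront, an origin group pairs a primary and a secondary origin with the HTTP status codes that should trigger failover; CloudFront also fails over when it cannot connect to the primary or the connection times out. The sketch below only simulates that decision locally. It does not call CloudFront, and the `fetch_with_origin_failover` function, the origin callables, and the chosen status-code set are hypothetical. In the architecture from the post, the two origins could be the load balancers in each region.

```python
from typing import Callable, FrozenSet, Optional, Tuple

# Hypothetical origin model: takes a request path and returns an HTTP status code,
# or None to represent a connection failure or timeout.
Origin = Callable[[str], Optional[int]]

# Example failover criteria (an assumption); in CloudFront you pick which status codes trigger failover.
DEFAULT_FAILOVER_CODES: FrozenSet[int] = frozenset({500, 502, 503, 504})


def fetch_with_origin_failover(
    path: str,
    primary: Origin,
    secondary: Origin,
    failover_codes: FrozenSet[int] = DEFAULT_FAILOVER_CODES,
) -> Tuple[str, Optional[int]]:
    """Send the request to the primary origin; retry once on the secondary if it fails."""
    status = primary(path)
    if status is not None and status not in failover_codes:
        return "primary", status  # primary answered with a status that is not a failover trigger
    # Connection failure or a failover status code: try the secondary origin once.
    return "secondary", secondary(path)


if __name__ == "__main__":
    def healthy(path: str) -> Optional[int]:
        return 200

    def unavailable(path: str) -> Optional[int]:
        return 503

    print(fetch_with_origin_failover("/index.html", healthy, healthy))  # ('primary', 200)
    print(fetch_with_origin_failover("/index.html", unavailable, healthy))  # ('secondary', 200)
```

The real service only fails over for GET, HEAD, and OPTIONS requests; this sketch ignores HTTP methods to keep the failover decision itself visible.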
DR in AWS is about choosing the right strategy based on RTO/RPO, not just replicating across regions.
Most candidates jump straight to "replicate to another region" without ever mentioning RTO/RPO trade-offs, and that gap alone tells the interviewer everything about your SA depth!
This is very helpful 💡 thank you 🙏 for sharing such a clear and structured explanation. It highlights well that DR is about selecting the right strategy based on RTO and RPO, not just replication.
think SQL = active-passive and don't know that's changed
Great way to capture and break down the services SAs need to design and build a resilient architecture.
Tata Consultancy Services
Spot on, Rajdeep Saha. I would like to share my experience. One of my previous customers requested a multi-region DR setup. However, when I analysed the RTO/RPO requirements along with the criticality of each application, I realised that multi-region DR was not necessary for all applications, and we ended up setting up multi-AZ DR instead for the less critical apps. This saved a huge amount in data transfer costs. So, as you precisely pointed out, it is paramount to consider RTO/RPO requirements before jumping to an actual DR strategy.