The AWS Outage That Took Down Half the Internet
On October 20, 2025, Amazon Web Services (AWS) experienced a major outage in its US-East-1 (Northern Virginia) region, one of its oldest and most critical regions.
It didn’t just affect Amazon. It disrupted hundreds of platforms, including Microsoft 365, Apple Music, Alexa, McDonald’s systems, PlayStation Network, Venmo, and Fortnite. For a few hours, large parts of the internet simply stopped working.
I decided to look into what actually happened and what we, as computer science students and engineers, can learn from it.
What Really Happened
Around 12:11 AM PDT, AWS began reporting major connectivity issues. By 1:26 AM, engineers had traced the problem to DNS resolution for DynamoDB, the lookup mechanism that lets AWS services find and talk to each other. A fix was deployed at 2:22 AM, but by then the impact had already spread to platforms like Ring, Snapchat, and Venmo. AWS did not declare the issue fully resolved until 3:00 PM.
A small glitch in DNS, the “address book” of the internet, created a domino effect because so many services depend on that single region.
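To make the "address book" idea concrete, here is a minimal sketch (in Python, with hypothetical hostnames, not real AWS endpoints) of a client that falls back to a secondary endpoint when DNS resolution for the primary one fails:

```python
import socket

def resolve_with_fallback(primary: str, fallback: str, port: int = 443):
    """Resolve the primary hostname, falling back to a secondary endpoint.

    Both hostnames are illustrative placeholders, not real service names.
    """
    for host in (primary, fallback):
        try:
            # getaddrinfo performs the DNS lookup, i.e. the "address book" query
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            return host, infos[0][4][0]  # the host that resolved, and one of its IPs
        except socket.gaierror:
            continue  # resolution failed; try the next endpoint
    raise RuntimeError("DNS resolution failed for all endpoints")
```

In a real deployment the fallback would point at a different region behind health checks, but even this toy version shows the principle: don't let a single name lookup become a single point of failure.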
What Caused It
The main cause was a DNS failure combined with a stuck internal subsystem. This broke communication across AWS’s internal network in US-East-1. Since many global applications are routed through that region, the outage spread rapidly. It’s a clear reminder that even the largest providers can face system-wide issues when too much depends on one point of failure.
The Impact
The impact was felt at three levels:
- Technical: EC2, S3, and DynamoDB services became unstable.
- Business: Companies like McDonald's, Microsoft, and Apple faced interruptions.
- Users: Millions couldn't use everyday apps, make online payments, or access digital services for several hours.
Even games like Fortnite and Pokémon GO went offline, showing how deeply connected modern systems have become.
What We Can Learn
This incident reinforced one key idea for me: failure is inevitable, but unpreparedness is optional. Some lessons that stand out:
- Always design for failure and assume outages will happen.
- Use multi-region redundancy to keep systems running.
- Set up DNS backups and secondary failover systems.
- Perform regular chaos testing to see how systems behave under stress.
- Communicate clearly and transparently with users during incidents.
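The chaos-testing lesson above can be sketched in a few lines: wrap a dependency so it fails at random, then verify that your retry logic still produces a correct answer. This is a toy, seeded version of what real chaos-engineering tools do at the infrastructure level:

```python
import random

def chaos_wrap(fn, failure_rate=0.3, rng=None):
    """Return a version of fn that randomly raises ConnectionError,
    simulating an unreliable downstream service."""
    rng = rng or random.Random()
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")  # simulated outage
        return fn(*args, **kwargs)
    return wrapped

def call_with_retry(fn, attempts=5):
    """The resilience pattern under test: retry on transient failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Seeding the random generator makes a chaos run reproducible, so a failure you find once can be replayed and debugged instead of disappearing on the next run.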
My Takeaway
Studying this outage gave me a more realistic view of cloud reliability. Even world-class infrastructures are not immune to failure, but great engineers design systems that recover quickly. For anyone learning cloud, backend, or DevOps, this event is worth studying — not for its failure, but for what it teaches about resilience and preparation.
You can find the full case study in my portfolio: https://jayu2236j.github.io/Portfolio-NetFlix/
#AWS #CloudComputing #Reliability #DevOps #SystemDesign #OutageAnalysis #Engineering #CaseStudy