Azure wasn’t down on December 30.
Our assumptions were.
No alerts.
No outage banner.
Just silence.
portal.azure.com → NXDOMAIN
intune.microsoft.com → gone
Same links worked instantly on 8.8.8.8 and 1.1.1.1.
That’s when I just knew…
…this wasn’t a cloud failure.
It was a dependency failure.
We’d hard-wired Cisco Umbrella/OpenDNS…
…and trusted it like gravity.
Always there. Always safe.
Until it wasn’t.
Suddenly Microsoft Azure…
…looked “globally down”
but only to teams who never mapped…
…DNS as a failure domain.
The myth that died that day:
“If Azure is up, we’re fine.”
Comedy of errors, live in prod:
• One resolver silently fails
• Control planes vanish
• CI/CD stalls
• Intune can’t phone home
• VPNs can’t resolve gateways
• Everyone blames the cloud
I’ve made this mistake myself.
DNS lived in the appendix.
Not the risk register.
That one’s on me.
Hard truth:
Most “cloud outages”
are really middleware outages.
What changed after December:
1️⃣ Resolver diversity — two independent paths, health-checked, failover-tested
2️⃣ DNS in DR — explicit runbooks, not tribal memory
3️⃣ Vendor-aware ops — DNS/SWG vendors = Tier-1 dependencies
4️⃣ Drift detection — compare answers, TTLs, CNAMEs before users do
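For the drift-detection point, here is a minimal sketch of what "compare answers before users do" could look like. Everything named here is an assumption for illustration: the `Answer` record, the `detect_drift` helper, the resolver labels, the CNAME target, and the 4x TTL threshold. It assumes you already collect one answer per resolver with whatever DNS client you use, so it only does the comparison. Addresses are deliberately not compared, because CDNs legitimately hand different IPs to different resolvers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """One resolver's view of a name (hypothetical record for this sketch)."""
    rcode: str                 # e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"
    cname_chain: tuple = ()    # CNAME targets in resolution order
    ttl: int = 0               # remaining TTL in seconds, 0 if no answer


def detect_drift(name, answers, ttl_ratio=4.0):
    """Compare per-resolver answers for one name and return a list of findings.

    `answers` maps a resolver label to an Answer. `ttl_ratio` is a crude,
    assumed threshold: cached TTLs count down, so small gaps are normal.
    """
    findings = []

    # 1. Response codes: one resolver saying NXDOMAIN while others say
    #    NOERROR is the failure pattern described in this post.
    rcodes = {label: a.rcode for label, a in sorted(answers.items())}
    if len(set(rcodes.values())) > 1:
        findings.append(f"{name}: rcode mismatch {rcodes}")

    # 2. CNAME chains, compared only across resolvers that answered.
    ok = [a for a in answers.values() if a.rcode == "NOERROR"]
    if len({a.cname_chain for a in ok}) > 1:
        findings.append(f"{name}: CNAME chain differs between resolvers")

    # 3. TTLs: flag only a large spread, not ordinary cache countdown.
    ttls = [a.ttl for a in ok if a.ttl > 0]
    if ttls and max(ttls) > ttl_ratio * min(ttls):
        findings.append(f"{name}: TTL spread {min(ttls)}s to {max(ttls)}s")

    return findings


# Illustrative data only, not captured from the incident.
chain = ("example-edge.invalid.",)  # hypothetical CNAME target
december = {
    "umbrella": Answer("NXDOMAIN"),
    "google": Answer("NOERROR", chain, 300),
    "cloudflare": Answer("NOERROR", chain, 280),
}

for finding in detect_drift("portal.azure.com", december):
    print(finding)
```

Run something like this on a schedule against your control-plane hostnames and alert on any finding. The point is not the thresholds, it is that a disagreement between resolvers becomes a signal you see first.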
The line that stuck with me:
If you don’t map it, you don’t control it.
Azure didn’t disappear.
It was selectively unreachable.
If DNS isn’t in your 2026 risk register,
December already told you what’s coming.
Save this.
Or comment if that outage changed how you think about “stable infrastructure.”