DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Reverse Proxy in Docker with Nginx and Automatic SSL

7 min read
How to reduce on-call friction using AI Voice Agent

4 min read
The Hidden Currency of Tech Leadership: The Resilience Loop

1 min read
Building an Air-gapped Hardened Kubernetes Cluster with Kubespray

3 min read
A Secret Weapon for Working with Hundreds of Servers

1 min read
The Samurai Server: Why "Heroic" Systems Always Die

4 min read
End-to-End DevSecOps Project (Movies Finder)

2 min read
AWS Multi-Account Guardrails: A Complete Blueprint for Secure, Automated Cloud Governance

9 min read
What Engineers Can Learn From the Cloudflare Outage (November 2025)

4 min read
Vendor Tools & Reliability — Lessons from the 2025 Cloud Outages

3 min read
How to Cut AWS Costs and Maintain Reliability Without a FinOps Team

3 min read
The Hidden Failure Pattern Behind the AWS, Azure and Cloudflare Outages of 2025

3 min read
Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

4 min read
Map a Kubernetes cluster with one command

1 min read
After the Google SRE Interview: Deconstructing the 'Hire' vs. 'No Hire' Debrief

3 min read
Building AI SRE: Our journey

4 min read
The Hidden Cost of Adding Just One More Feature

5 min read
StatusGator Alternative in 2025: Why IT Managers Pick IsDown

14 min read
Celery + SQS: Stop Broken Workers from Monopolizing Your Queue with Circuit Breakers

2 min read
From Signals to Reliability: SLOs, Runbooks and Post-Mortems

13 min read
The DynamoDB DNS Race Condition That Broke The Internet (And Why Your Self-Healing Systems Might Be Suicide-Bots)

2 min read
The Role Confusion: SRE vs Cloud vs Platform Engineer (And Why "DevOps Engineer" Misses the Point)

5 min read
🏗️ Building the Platform That Empowers Reliability by Design

3 min read
Modern CTO Podcast: The AI SRE Hype and How to Get it Right

1 min read
The Complete 2026 and beyond Google SRE Interview Preparation Guide — Frameworks, Scenarios, and Roadmap

4 min read