From the course: DevOps Foundations: Site Reliability Engineering
Your job as a DevOp
- So what is Site Reliability Engineering, and how does it fit into DevOps?
- Well, like DevOps, it depends who you ask. Ben Treynor, who started the Google Site Reliability team from which the term SRE comes, says that "Fundamentally, it's what happens when you ask a software engineer to design an operations function."
- Of course, he goes on to explain that their SRE teams are half software engineers and half systems engineers. The real key is the engineering. It's an automation-first approach that has little tolerance for repeated manual work.
- In the DevOps Foundations course, here in the library, James and I define Site Reliability Engineering as one of the three fundamental practice areas of DevOps, along with continuous delivery and infrastructure automation.
- While continuous delivery is a DevOps approach to build engineering, and infrastructure automation is a DevOps approach to systems infrastructure, SRE is a DevOps approach to the production support part of operations.
- You know, supporting a production service: on call and paging, monitoring, SLAs, performance and capacity planning. What we usually just called operations back in the day.
- But, in this case, taking the approach of developing automated solutions and consciously applying continuous improvement processes to it.
- From a lean perspective, many shops have become case studies in the downsides of local optimization by pushing work and responsibility for their product to other teams. The SRE approach is effectively a way to enforce an agile team approach and promote systems thinking.
- While the S in SRE stands for site, indicating its origin with web software companies, it's probably better to think of it as service reliability engineering, as it's applicable to a wide array of application types.
- In fact, IT old-timers might be reminded, at this point, of IT service management. There's a good reason for that. It covers a lot of the same problem domains: incident management, change management, and so on.
- Ernest and I talk about the relationship of ITIL and ITSM to DevOps in our DevOps Foundations course. Now, is SRE just the kids nowadays rediscovering things that the grown-up practitioners already knew?
- No. Well, okay, somewhat. But where ITSM tried to solve these problems with paperwork, SRE tries to solve them with engineering.
- And throwing shade.
- You don't get me, old man.
- The first SREcon was in 2014, and the concept has grown wildly in popularity since then. Many large tech companies, from Amazon to Netflix, have SRE teams or positions.
- As with everything, however, usage varies. The DevOps movement reacted strongly against calling people DevOps Engineers or creating DevOps teams because it implied that DevOps was some third thing between development and operations instead of a conceptual bridge between those two disciplines.
- So Site Reliability Engineer is a lot more appropriate for a job title. That means someone who applies a real engineering mindset to the traditional areas of system administration.
- Of course, when some companies say SRE, they just mean this is a job posting for an old-school sysadmin, but with some cloud or something. But just because some people misuse terms doesn't mean the terms are invalid.
- Yeah, in the end, both DevOps and SRE techniques are real needs in organizations, and with them comes the desire for a job title that you can hire for.
- So rather than worry too much about the term, let's delve into the very real practices that SRE teams use to efficiently support modern production applications.