From the course: DevOps Foundations: Site Reliability Engineering
Release engineering
- In our previous course here in the library, DevOps Foundations: Continuous Integration and Continuous Delivery, we discussed the nuts and bolts of a build-to-release pipeline.
- Where that pipeline hits production, the actual point of release, there are other concerns that need to be addressed.
- In his book, "The Visible Ops Handbook", Gene Kim's research showed that 80% of production issues were caused by deliberate changes.
- Therefore, planning, controlling, and communicating changes are the biggest things you can do to improve your service reliability.
- To accomplish this, you should be doing a release, rather than making a manual change, whenever possible. Manual changes to systems, whether typing at the command line or clicking in a UI, are all inherently error-prone.
- Yeah, instead aim for delivering tested, reviewed, released code.
- On my team, we use CloudFormation or Terraform code to create our infrastructure and make changes to it. We use Rundeck jobs to execute operational runbooks on systems. We even deploy our Datadog monitors from code, rather than set them up manually.
- This lets you peer review, test, package, version, and control releases of the operational code in much the same way that you do with application code.
- And then the work you put into release engineering pays off twice.
- Why is that? Well, because change approvals can be painful. They're filled with paperwork and often a lot of overhead. This is due to organizations focusing on compliance.
- But if you have one streamlined release process, review can be as simple as code review and pull requests, and a product manager pulling a ticket into a ready-for-release state.
- Dinah McNutt from Google Release Engineering presented her 10 Commandments of release engineering at the RELENG conference in 2014.
- There's a link to this in the course handout. Much of this is CI/CD build advice we discussed in depth in our previous course.
- Right, you know, use source control. Choose your tools wisely. Write portable, easily maintained build files. Use repeatable builds, package artifacts, define your upgrade process, provide a log.
- For a release, you always want a clear payload, as small as possible, with ways to try it out via a canary and to roll it back via either a deploy or a feature flag.
- That's mostly build engineering. The piece that's most in the SRE domain is the actual release, the communication and control of these changes to production.
- Whether you're using a traditional timed release or a continuous delivery system, it's of critical importance to understand what code has gone live for users and when.
- There are always many stakeholders in a release process.
- That's right, I've designed release processes in many organizations, and the technical work usually isn't the hardest part. It's designing a flow for approval of the release that meets your organizational and compliance standards, and then communicating the changes from that release effectively to both internal and external stakeholders.
- While change control is important, to stay agile, you should try to delegate that out as much as possible. Having someone who doesn't know anything about the change approve it is pretty counterproductive.
- I prefer to have individual changes vetted by other engineers, usually via pull request, and then an engineering manager or product manager signs off on a release payload. Depending on the maturity of your continuous deployment pipeline, this can be a single change or a bundled-up weekly release.
- This approach puts review in the right places, and it meets compliance requirements too.
- In my company, we do that, and we only have a higher level of approval for fundamental network and security changes. And that combination has enabled us to achieve PCI and ISO 27001 compliance.
- And then for communication: automate that as much as possible so that it's timely. If internal teams like training and documentation teams can get automatic reports on what's changing out of the ticketing system, then they can stay on top of the changes better.
- And if your user release notes can be automatically generated from text in the tickets that the product manager has already vetted, then your release notes are much more accurate.
- Remember, all your users and stakeholders rely on your application. You can't just release whatever you want, whenever you want, without impacting them.
- That's right, continuous delivery shouldn't mean continuous disruption. People need to know what's changing and when.
- One great tool to do this is feature flags. Basically, as you make changes, you wrap them in toggles that let you turn them on and off in isolation from actual code releases. This is often referred to as dark launching.
- This way you can control launch timing, do A/B testing, and run canary deployments. Martin Fowler has an article on feature toggles that explains the pattern in depth.
- Feature flags used to be do-it-yourself, but there are a bunch of tools and services you can use now. Petri, Togglz, and Flip are all available on GitHub. LaunchDarkly and Split.io are two services that help you implement feature flags as well.
- Release engineering is mainly about coming up with the best, lowest-friction process you can, one that works for you and your business, to verify and approve changes and to communicate their release to production.
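
To make the feature flag idea above concrete, here is a minimal do-it-yourself sketch in Python. It is not how any of the tools named in the video work internally; the `FeatureFlags` class, the `new_checkout` flag name, and the `checkout` function are all hypothetical names invented for illustration. The sketch assumes flags live in memory and that each user is assigned to a stable bucket from 0 to 99 by hashing the flag name and user ID, so a percentage rollout (a simple canary) always shows the same users the same behavior.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    enabled: bool = False     # master switch; off keeps the change dark for everyone
    rollout_percent: int = 0  # share of users (0-100) who get the change when enabled


class FeatureFlags:
    """Hypothetical in-memory flag store; a real service would persist and audit this."""

    def __init__(self):
        self._flags = {}

    def set(self, name, enabled, rollout_percent=100):
        self._flags[name] = Flag(enabled, rollout_percent)

    def is_on(self, name, user_id):
        flag = self._flags.get(name)
        # Unknown or disabled flags are treated as off, so the old path is the default.
        if flag is None or not flag.enabled:
            return False
        # Hash flag name plus user ID into a stable bucket in the range 0-99.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < flag.rollout_percent


def checkout(flags, user_id):
    # The new code ships in the release but only runs when the toggle allows it.
    if flags.is_on("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"


if __name__ == "__main__":
    flags = FeatureFlags()
    flags.set("new_checkout", enabled=True, rollout_percent=10)  # canary: about 10% of users
    for user in ("alice", "bob", "carol"):
        print(user, "->", checkout(flags, user))
```

Rolling back in this sketch is a matter of calling `flags.set("new_checkout", enabled=False)`, with no redeploy, which is the isolation from code releases described above. Because a user's bucket never changes, raising `rollout_percent` only adds users to the new path and never removes anyone already on it.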