From the course: Reliability Engineering in the Cloud by Pearson
Establishing key performance indicators
Thanks, Maria. Now let's talk about establishing key performance indicators. As engineering organizations and engineering leaders, we must understand that recovering quickly is critical to reliability engineering practices and, more importantly, to offering a customer experience that is business as usual. In other words, customers are not experiencing any issues. For that, it is crucial that we define and measure the health of our operations and the activity of our applications. There are a few sample metrics that are especially critical. Number one, time to detect: measured from when the incident starts, how long does it take our monitoring and observability tools to tell us that we detected it, within x number of minutes or seconds? The next one is time to engage. We've detected that something happened, but now it is critical that our application teams, who own these applications and know the ins and outs, the architecture, and the designs of the applications, they…
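As a rough illustration of the two metrics named so far, here is a minimal Python sketch. It assumes each incident record carries three timestamps. The `Incident` class and its field names are hypothetical, and because the transcript cuts off before time to engage is fully defined, the sketch assumes it runs from detection until the owning team joins the response.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # All field names are hypothetical, chosen only for illustration.
    started_at: datetime   # when the incident actually began
    detected_at: datetime  # when monitoring/observability tooling flagged it
    engaged_at: datetime   # when the owning team joined (assumed definition)


def time_to_detect(incident: Incident) -> timedelta:
    """Elapsed time from incident start to detection."""
    return incident.detected_at - incident.started_at


def time_to_engage(incident: Incident) -> timedelta:
    """Elapsed time from detection to team engagement (assumed definition)."""
    return incident.engaged_at - incident.detected_at


# Made-up sample incident, not real data.
incident = Incident(
    started_at=datetime(2024, 1, 1, 10, 0, 0),
    detected_at=datetime(2024, 1, 1, 10, 4, 30),
    engaged_at=datetime(2024, 1, 1, 10, 12, 0),
)

print(time_to_detect(incident))  # 0:04:30
print(time_to_engage(incident))  # 0:07:30
```

Tracking each interval separately, rather than one end-to-end number, makes it easier to see whether a slow recovery came from late detection or from slow engagement.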
Contents
- Learning objectives (2m 17s)
- Defining operational excellence in CRE (4m 39s)
- Identifying processes, people, and tools for operational excellence (13m 39s)
- Establishing key performance indicators (5m 14s)
- Understanding root cause analysis (RCA) and correction of error (CoE) form (8m 26s)
- Identifying tools for operational excellence assessments (5m 40s)
- Lesson 7 review and an exercise (4m 3s)