From the course: Spring Boot Observability: Deep Dive into Logging, Metrics, and Tracing

What is observability?

- [Instructor] Spring makes it easy to build and run your application in production. However, once deployed, it's crucial to have operational insight into the behavior of the system. As system architectures increase in complexity and scale, we face pressure to track conditions and respond to issues across different cloud environments. As a result, IT teams are looking for greater observability, as it is crucial for investigating and preventing situations where a system starts to deviate from its intended state.
But what's observability? Observability might mean different things to different people. Some might say it's all about monitoring application uptime, response time, errors, or latency. To others, it might be tracking system resources, such as available free memory and CPU utilization. The term observability was coined by Hungarian American electrical engineer, mathematician, and inventor, Rudolf E. Kalman, in 1960. Observability is defined as a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. Observability is basically a way of staying ahead of issues at every level: the low level, such as code within the application, the OS and system level, and finally, the network level, too.
But how does this compare to monitoring? According to the "Site Reliability Engineering" book by Google, your monitoring system needs to answer two simple questions: what's broken and why. Monitoring allows you to watch and understand your system state using a predefined set of metrics and logs, which lets you detect a known set of failures. However, the problem with monitoring complex distributed applications is that production failures are not linear and, therefore, are difficult to predict. Whereas monitoring tells you whether your system is working or something went wrong, observability enables you to understand why. An observable system allows you to more easily navigate from the effects to the cause.
It helps you find answers to questions like, what services did the request go through, and where were the performance bottlenecks? How was the execution of the request different from the expected system behavior? Why did the request fail? How did each microservice process the request? A system with observability is what you want. Imagine an operations person comes to you and says, "Hey, our CPU is at 100% in one of the clusters of a service." Well, that doesn't immediately tell you anything. It doesn't tell you how to fix it. Imagine if instead you get a message, "Hey, your latest deployment with this release number XY increased latency for this particular endpoint by 50%, and it's impacted 80% of customers." As a developer, you immediately know where the problem is, how important it is, how to reproduce it, and where to find and fix it in your code. Keeping the idea of observability in mind gives developers a way to push a change and quickly know the impact of that change. You have to accept that failures are always going to happen in production. The goal is not to eliminate failure; the goal is to focus on recovering as quickly as possible with early and fast detection.
