What’s Driving the Rising Cost of Observability?

Applying traditional tools to complex, fast-scaling systems is part of the problem, said Christine Yen of Honeycomb.io in this episode of The New Stack Makers.

Jan 30th, 2025 6:00am by Heather Joslyn

Why is observability so expensive for organizations? There are a lot of culprits, but a core problem is that traditional observability tools were never meant to be used to track systems with the complexity and scale of modern cloud native architecture, according to Christine Yen, co-founder and CEO of Honeycomb.io.

Logging, monitoring and application performance monitoring (APM) tools each “had or has a fatal flaw that makes them a challenge to use in today’s cloud native, much more flexible, much larger scale world,” Yen told me in this episode of The New Stack Makers, the start of a planned TNS series on the state of observability.

Logging tools, Yen said, have traditionally been very flexible. “Many engineers, myself included, grew up on being able to put whatever you want in a log line,” she said. However, traditional logging tools were “really optimized for humans reading out the data with our eyeballs, and the scale of software that we’re operating today, especially in the cloud, simply is no longer human scale.”

As a result, “the experience using logging tools to answer questions about your software can be very slow or require a lot of overhead and maintaining indices to try and eke some additional performance out of your logging data.”

As for traditional monitoring tools, with their plethora of dashboards and metrics, “the fatal flaw there is optimizing for speed over flexibility,” Yen said.

That makes less sense in a constantly changing system of containerized microservices. “Instead of one monolithic app on five app servers, you might have 50 microservices on top of 500 containers and then Kubernetes pods that are cycling through those containers and restarting things,” Yen said.

“You have this explosion of complexity in your system that traditional monitoring tools struggle to be flexible enough to support, at least in a cost-effective way,” she added.

APM tools’ “fatal flaw,” she said, “was trying to be too magical.”

When applications were largely built consistently on Rails, Yen said, “vendors were able to offer magical experiences.

“You drop their agent in, they’ll just slurp up all the right telemetry that you might need to diagnose the 50 most common problems that a Rails app might run into. You log into the tool, and it has all these magical, out-of-the-box experiences that tell you you’ve got these problems, we detected them for you. And that made a lot of sense when there was this consistency in how lots of software is written.”

With monitoring, “we’re not in that world anymore,” she said. “So many engineering teams these days have polyglot storage and microservices where different components are written in different languages and different frameworks. It’s increasingly harder for a magical approach to cover the breadth that we see in modern infrastructure.”

The Influence of SREs on Observability

Beyond the use of traditional observability tools in environments they were never meant for, Yen said, costs are pushed up by new demands on observability itself, driven in part by DevOps, platform engineering and especially site reliability engineering (SRE).

“I think of all of these trends as shaping where engineering teams focus, how they operate and how they think about delivering a great service,” she said. “Specifically, I’ve loved the discussion around SRE and especially the interest in SLOs — service-level objectives.

“Because there’s such a clear emphasis, with the rise of SRE, on the experience of the end user, it’s not enough for our infrastructure to be healthy. What matters is whether we are upholding our standards of delivering a great service to those end users.”

She added, “I talk to a lot of customers who are trying to adopt SRE practices and SLOs and are suffering because their observability tooling is punishing them from trying to track the things that matter to their business, trying to help them understand the customer impact or business impact of a given engineering change, and as a result, are pulling back on the very data that they need to understand impact.”

Check out the full episode to hear what Yen thinks about the opportunities AI offers to observability, and which innovations she’s most looking forward to from the OpenTelemetry project.

Heather Joslyn is editor in chief of The New Stack, with a special interest in management and careers issues that are relevant to software developers and engineers. She previously worked as editor in chief of Container Solutions, a Cloud Native consulting...