Traceloop Launches an Observability Platform for LLM-Based Apps

Traceloop today launched its observability tool for LLM-based applications into general availability and announced a $6.1 million seed funding round led by Sorenson Capital and Ibex Investors.
One of the problems of building on top of large language models (LLMs) is that they are probabilistic. Without the right tooling, you can’t know whether something that worked today will still work tomorrow.
That’s the problem Traceloop co-founders Nir Gazit (CEO) and Gal Kleinman (CTO) faced when they ran their first LLM experiments at Fiverr, where Kleinman led the machine learning (ML) group and Gazit was chief architect (after previously leading a team of ML engineers at Google working on BERT).
While at Fiverr, Gazit had started using OpenTelemetry, the popular open source tool for collecting metrics, logs and traces, for another project, and started wondering if they could extend it to provide better observability for LLM applications, too.
The result of this was OpenLLMetry.
“OpenTelemetry is mainly around tracing, and when you look at agent execution, it is similar to a trace,” Gazit said. “It does one step and then the next step. So it’s the same concept. And we realized: OK, we can just take OpenTelemetry and extend it to support generative AI. Why not? It was so trivial for us, but nobody had done it.
“And so we started working on these wrappers for OpenAI, connecting them to OpenTelemetry, and figuring out how to make this all work.”
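The wrapper idea Gazit describes can be sketched in plain Python. This is a hypothetical, stdlib-only illustration of the concept, not Traceloop’s actual code: the `Span`, `Tracer`, and `fake_llm_call` stand-ins replace the real OpenTelemetry SDK and a real provider client, while the `gen_ai.*` attribute keys follow the style of OpenTelemetry’s GenAI semantic conventions.

```python
import time
from dataclasses import dataclass, field

# Hypothetical stand-ins for an OpenTelemetry tracer and an LLM client,
# used only to illustrate the wrapper concept.

@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0

class Tracer:
    def __init__(self):
        self.spans = []  # spans that would normally be exported to a backend

    def record(self, span):
        self.spans.append(span)

def fake_llm_call(model, prompt):
    # Placeholder for a real provider call (e.g. OpenAI's API).
    return f"response to: {prompt}"

def traced_completion(tracer, model, prompt):
    """Wrap an LLM call in a span, attaching GenAI-specific attributes."""
    start = time.perf_counter()
    response = fake_llm_call(model, prompt)
    tracer.record(Span(
        name="llm.completion",
        attributes={
            "gen_ai.request.model": model,  # semantic-convention-style keys
            "gen_ai.prompt": prompt,
            "gen_ai.completion": response,
        },
        duration_ms=(time.perf_counter() - start) * 1000,
    ))
    return response

tracer = Tracer()
traced_completion(tracer, "gpt-4o", "Summarize this ticket")
```

The calling code stays unchanged apart from going through the wrapper; every LLM request then shows up as one step in a trace, which is exactly the agent-execution-as-trace analogy Gazit draws above.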
Gazit told me that when he and Kleinman came upon this idea in September 2023, he thought it was such a basic concept that he pushed his co-founder to get something out as soon as possible, before somebody else did the same. They released the first version of OpenLLMetry a month later, with the obligatory Hacker News announcement.
At its core, OpenLLMetry is a set of extensions on top of OpenTelemetry that focus on LLM applications. This means instrumentation for LLM services like OpenAI, Anthropic, Cohere, Ollama, Mistral, Hugging Face, AWS Bedrock and Google Gemini, as well as vector databases like Pinecone, Weaviate, Milvus and LanceDB, and frameworks like LangChain, CrewAI, LiteLLM and Haystack.
It now also supports protocols like Anthropic’s Model Context Protocol (MCP), Google’s Agent2Agent protocol and, coming soon, the Cisco-backed Agntcy framework.
In the early days, one of the hardest challenges for the team was to keep OpenLLMetry up to date with support for the latest models. Gazit noted that Amazon, Google, IBM and others quickly started supporting the project.
But working with LLMs is also sufficiently different from working with other cloud resources that the engineering work went well beyond “just” extending OpenTelemetry.
“OpenTelemetry is a cloud observability protocol,” Gazit said. “For example, we wanted to support vision models. How do you support vision models in a telemetry protocol that is intended for sending latencies for your database or whatever?
“So it’s a huge challenge. Images can be 10 megabytes. It doesn’t fit the protocol. So you needed to make a lot of tweaks.”
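One common way around the limit Gazit describes is to keep large payloads out of the span entirely: store the blob externally and record only a reference on the span. The sketch below is a hypothetical, stdlib-only illustration of that pattern, not Traceloop’s actual approach; the `BlobStore` class and the 64 KB threshold are assumptions for the example.

```python
import hashlib

MAX_ATTRIBUTE_BYTES = 64 * 1024  # assumed size limit for inline attributes

class BlobStore:
    """Stand-in for external storage (e.g. an object store)."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key, so identical payloads dedupe naturally.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

def attach_payload(attributes: dict, key: str, payload: bytes, store: BlobStore):
    """Inline small payloads; offload large ones and keep only a reference."""
    if len(payload) <= MAX_ATTRIBUTE_BYTES:
        attributes[key] = payload.decode("utf-8", errors="replace")
    else:
        ref = store.put(payload)
        attributes[f"{key}.ref"] = f"blob://{ref}"
        attributes[f"{key}.size_bytes"] = len(payload)

store = BlobStore()
attrs = {}
# A 10 MB "image" becomes a reference, not an inline attribute.
attach_payload(attrs, "gen_ai.prompt.image", b"\x89PNG" + b"\x00" * (10 * 1024 * 1024), store)
```

The trace stays small enough for the protocol, while the full image remains retrievable for anyone debugging that span.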
As with all monitoring tools, collecting vast amounts of data is one thing; deriving actionable value from it is another. Teams need to know, for example, when a model is hallucinating or when an agent has taken an incorrect action.
The Traceloop team quickly recognized this when working with its earliest customers and decided to build what it calls an “insights layer” on top of OpenLLMetry.
Unsurprisingly, that’s also where Traceloop draws the line between the open source project and its commercial offering. Its enterprise platform is built on top of the open source project but includes this insights layer among other enterprise features.
“If you want to understand where your model is working well and where it doesn’t, or if there’s a drift, you need something else [beyond the open source tool], which is our paid platform to analyze the data for you and give you this insights layer,” Gazit said.
Another interesting feature of the Traceloop platform is that it makes it easier for teams to experiment with models. Maybe a company wants to switch from OpenAI’s o3 to Anthropic’s Claude Sonnet 4. On the technical side, that’s just a different API call (and most companies now architect their systems to do just that), but all of the prompts have to be tweaked and the results will look different. With its new platform, Traceloop provides these users with metrics to evaluate how their systems are running with the new models.
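The “just a different API call” architecture mentioned above is usually a thin routing layer. The sketch below is a hypothetical, stdlib-only illustration of that pattern plus a naive comparison harness; the backend functions and the length-based “metric” are stand-ins, not real provider clients or Traceloop’s evaluation metrics.

```python
# Fake backends standing in for real provider SDK calls.
def call_openai(prompt):
    return f"[openai] {prompt}"

def call_anthropic(prompt):
    return f"[anthropic] {prompt}"

# Model name -> backend, so swapping models is a config change, not a rewrite.
BACKENDS = {
    "o3": call_openai,
    "claude-sonnet-4": call_anthropic,
}

def complete(model, prompt):
    """The one call site the application uses, regardless of provider."""
    return BACKENDS[model](prompt)

def compare(models, prompts, metric):
    """Run the same prompts against each model and score the outputs."""
    return {m: [metric(complete(m, p)) for p in prompts] for m in models}

scores = compare(["o3", "claude-sonnet-4"], ["Summarize this ticket"], metric=len)
```

In practice the metric would be something like an answer-quality or hallucination score computed over traced production traffic, which is where the observability data feeds back into the evaluation.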
The target audience for Traceloop is enterprises building generative AI products, for the most part customer-facing applications that generate a lot of data.
“If you’re a small startup building a GenAI app, you care about quality, but not really. You care about product-market fit,” Gazit said. “But if you’re big, there’s a huge cost in building something that is shitty, because then your customers will not use you and will not use the product.”
So far, most of Traceloop’s users have come from the open source project. Given that OpenLLMetry is built on OpenTelemetry, a CNCF project, it’s perhaps no surprise that Gazit noted he would like the two projects to converge, and maybe merge, in the near future.
For now, the company is looking ahead to bringing more enterprise customers onto its platform, which is also one of the reasons it raised this $6.1 million seed round. Besides the lead investors, Y Combinator, Samsung NEXT and Grand Ventures also participated in the round.