
Palantir Technologies


AIP Evolve: our new product for making agents more efficient and cost-effective. See how Chad and Colton used it to autonomously swap models, tune prompts, validate outputs, and find structured ontology data that eliminated 2 LLM calls, cutting compute costs while improving accuracy and reliability in production.


Transcript

And I can also choose what level of divergence I'm OK with. Do I really need an exact match on the outputs? Am I looking for just semantic equivalence, where things can vary a little in phrasing? Or do I want to let Evolve decide and take a best-effort approach? So that's the validation strategy.

The next step, thinking another step forward, is: what changes do I want to allow Evolve to make as I run this optimization? For example, I'm pretty confident we're going to get some leverage out of a model swap, so I can keep that selected here. And I'm also pretty sure that when I do swap models, I'm inevitably going to need to tune the prompt a bit, because every model has its own quirks. Evolve can do that. But I can also consider broader changes, and we can get into those in a little bit: architectural changes like extracting deterministic logic out of LLM usage, or leaning more on the ontology and my enterprise's domain model, or rethinking how I've designed my agent by tuning and restructuring how we've set up tools and function calls.

On top of that, we can add constraints on the budget we have for this optimization. In this example, I'm going to say 5 iterations is fine. Then I'm prompted to review all my choices, and I can click this big purple Evolve button at the bottom, which kicks off the optimization.

Very cool.

End results: we see it cut compute cost by 97%, and that was just by swapping from GPT 5.1 to 5.4 nano. Makes sense, right? With model swaps like this, if we're swapping to a really, really small model, we'd expect a pretty huge difference in compute cost. It improved on latency. Makes sense: smaller model. Interestingly, it says it improved on quality by 7 percentage points, and we can get into exactly what's behind that. So we can see, yeah, we made that model swap. Here's how it selected test cases: it divided all of these different rod scenarios into different categories.
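The choices walked through in the demo (a validation strategy, the kinds of changes the optimizer may make, and an iteration budget) can be pictured as a small configuration object. The Python sketch below is purely illustrative and is not the AIP Evolve API: every class, field, and enum name is hypothetical, and the choice of semantic equivalence as the strategy is an assumption, since the transcript does not say which option was picked.

```python
from dataclasses import dataclass
from enum import Enum


class ValidationStrategy(Enum):
    """How closely new outputs must match the baseline (hypothetical names)."""
    EXACT_MATCH = "exact match"
    SEMANTIC_EQUIVALENCE = "semantic equivalence"
    BEST_EFFORT = "best effort"


class AllowedChange(Enum):
    """Kinds of change the optimizer may try (hypothetical names)."""
    MODEL_SWAP = "model swap"
    PROMPT_TUNING = "prompt tuning"
    EXTRACT_DETERMINISTIC_LOGIC = "extract deterministic logic"
    LEAN_ON_ONTOLOGY = "lean on ontology"
    RESTRUCTURE_TOOLS = "restructure tools"


@dataclass(frozen=True)
class OptimizationPlan:
    """Illustrative container for the three choices described in the demo."""
    validation: ValidationStrategy
    allowed_changes: frozenset
    max_iterations: int

    def __post_init__(self) -> None:
        # Reject plans that could never run or never change anything.
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        if not self.allowed_changes:
            raise ValueError("at least one kind of change must be allowed")

    def review(self) -> str:
        """Summarize the choices, mirroring the review step before launch."""
        changes = ", ".join(sorted(c.value for c in self.allowed_changes))
        return (f"validation: {self.validation.value}; "
                f"changes: {changes}; "
                f"budget: {self.max_iterations} iterations")


# Model swap plus prompt tuning with a 5-iteration budget, as in the demo.
# The validation strategy here is an assumption.
plan = OptimizationPlan(
    validation=ValidationStrategy.SEMANTIC_EQUIVALENCE,
    allowed_changes=frozenset({AllowedChange.MODEL_SWAP,
                               AllowedChange.PROMPT_TUNING}),
    max_iterations=5,
)
print(plan.review())
```

Keeping the allowed changes as an explicit set makes the scope of an optimization run reviewable before it starts, which is the point of the review step described in the demo.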

Palantir Technologies 2d

Watch the full demo here: https://youtu.be/p0pjtkg1ny4


Christof Schumann 15h

Chad and Colton’s technical breakthrough here addresses the exact silent killer currently stalling AI scaling in production: variable compute risk and unpredictable token consumption. While the broader market is still treating LLM deployment as an experimental playground, the real bottleneck for CFOs and risk officers is cost predictability. When an agent can autonomously optimize its own infrastructure—swapping models and cleaning up the data pipeline to strip out redundant LLM calls—it moves AI from an unpredictable R&D line-item to a deterministic, commercially viable operational asset. Stripping out technical inefficiency is elite engineering. But the true business byproduct here is operational margin protection and algorithmic governance. Phenomenal execution, Palantir team

WorkflowWiz.ca 10h

This is where agentic AI gets real: cost efficiency, reliability, and enterprise-grade integration. The ability to connect agents to governed data and workflows is what will separate demos from durable operating capability.

Michael Ellerbeck 2d

I love the ever-iterative nature of Palantir.

Shawn Bullock 1d

What’s interesting isn’t that they improved the agents. It’s that they removed work the agents no longer needed to do. Most people assume progress comes from adding more intelligence, more models, more prompts, and more computation. But mature systems often move in the opposite direction. As understanding improves, complexity can collapse. Fewer calls. Fewer translations. Fewer opportunities for drift. The real cost in many AI systems isn’t computation. It’s repeated interpretation. The organizations that win may not be the ones with the most AI. They may be the ones that need the least AI to reach the same outcome because their signal, ontology, and coordination layers are already aligned. That’s not just a compute problem. It’s a continuity problem.

Mustafa ÖZTÜRK 2d

At scale, the challenge is not model selection. It is maintaining execution consistency while models, prompts, and workflows continuously evolve.


Leonardo Sampaio, MSc 1d

We’re entering the next phase of enterprise AI. Building agents is becoming easier; operating them reliably, efficiently and at scale is becoming the real differentiator. Evaluation, governance and cost optimization will be critical capabilities for every AI-native organization.

Ruben Ayala 1d

The interesting shift is not autonomous model selection or prompt tuning. It is the emergence of systems that can redesign parts of their own decision architecture in pursuit of operational objectives. At that point, the central question becomes how change is governed, validated, and constrained over time—not just whether performance improves.

Doni Cahyono 20h

Bravo....need to test it

