Claude Opus 4.8 is out, and we've been testing it on some of our real-world trading problems, which we introduced here: https://bit.ly/4uINgjw

This chart shows how Opus 4.8 scores on the trading internship exam we use at Optiver, across different reasoning-effort settings and relative to previous generations of Claude models. It's exciting to see the continued progress on this exam, particularly at lower effort settings. Congrats to the Anthropic team on the release.

5 Comments

Ben Muller 8h

Have you tested it with gpt5.5?

Claudio Raimondi 1d

this is actually more positive than it might seem. token efficiency is what ultimately matters

Víctor Cayetano Hernández Sánchez 1d

Nice chart, seems like they're focusing on cost efficiency - no wonder why! Would be interesting to see compute cost of each of the models at each level (i.e. is 'low-reasoning' opus 4.8 comparable in compute use to 'high-reasoning' opus 4.7?)

Peter Koller 2d

Not much of an improvement from medium complexity onwards. What will that curve look like two versions from now? Do we expect significant improvements for the harder problems too? What are current models lacking?

Abhigyan Tiwari 2d

Great to see that improvement in models, Moss. Very helpful for developers as well as the firms themselves. Could you please share those trading problems? I'd really like to solve them.

Moss E.’s Post