Moss E.’s Post

Claude Opus 4.8 is out, and we've been testing it on some of our real-world trading problems (introduced here: https://bit.ly/4uINgjw). This chart shows how Opus 4.8 scores on the trading internship exam we use at Optiver, across different reasoning-effort settings and relative to previous generations of Claude models. It's exciting to see the continued progress on this exam, particularly at lower effort settings. Congrats to the Anthropic team on the release.

[Image: line chart]

Have you tested it with gpt5.5?

This is actually more positive than it might seem. Token efficiency is what ultimately matters.

Nice chart. It seems like they're focusing on cost efficiency, and no wonder! It would be interesting to see the compute cost of each model at each effort level (e.g., is 'low-reasoning' Opus 4.8 comparable in compute use to 'high-reasoning' Opus 4.7?).

Not much of an improvement from medium complexity onwards. What will that curve look like two versions from now? Do we expect significant improvements for the harder problems as well? What are current models lacking?

Great to see that improvement in the models, Moss. Very helpful for developers as well as the firms themselves. Could you please share those trading problems? I'd really like to try solving them.
