Agentic Coding Weeklyさんが再投稿しました
Anthropic has released Claude Opus 4.8, an incremental upgrade over Opus 4.7 at the same price. On the benchmarks: - SWE-Bench Pro: 69.2%, up from Opus 4.7's 64.3%, ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%). - Terminal-Bench 2.1: 74.6%, up from 4.7's 66.1% and above Gemini 3.1 Pro's 70.3%, but behind GPT-5.5 at 78.2% on the same Terminus-2 harness (and 83.4% on its own Codex CLI harness). For coding work the biggest change is honesty. Anthropic reports Opus 4.8 is about four times less likely than its predecessor to let flaws in its own code pass without comment, and more likely to flag uncertainty than claim progress it can't back up. Opus 4.8 defaults to "high" effort, which Anthropic says uses a similar token count to Opus 4.7's default while performing better. The "extra" level (xhigh in Claude Code) and "max" level trade more tokens for better results on hard or long-running async tasks. Pricing is unchanged from Opus 4.7 at $5 per million input tokens and $25 per million output, with fast mode at $10/$50. The model is available via the Claude API as claude-opus-4-8. Anthropic calls this a "modest but tangible" improvement.