MiniMax has released M3, an open-weight model that pairs a 1M-token context window with native multimodal input and computer use. The headline architectural change is MSA (MiniMax Sparse Attention), which the lab says cuts per-token compute at 1M context to 1/20 of the previous generation, yielding 9× faster prefill and 15× faster decode. M3 supports a toggleable thinking mode, on for agentic and long-horizon work, off for latency-sensitive completion, at the same price. Weights and a technical report will be released within 10 days.

On coding benchmarks:
- M3 scores 59.0% on SWE-Bench Pro, ahead of GPT-5.5 (58.6), Gemini 3.1 Pro (54.2), and the open-weight DeepSeek V4 Pro (55.4), GLM 5.1 Thinking (58.4), and Kimi K2.6 Thinking (58.6), but well behind Claude Opus 4.8 (69.2).
- On Terminal-Bench 2.1 it reaches 66.0, behind GPT-5.5 (78.2), Opus 4.8 (74.6), and Gemini 3.1 Pro (70.3).

M3 leads the open-weight field, but GPT-5.5 and Opus 4.8 stay ahead on software-engineering tasks.

MiniMax leans on long-horizon autonomy as the differentiator. In internal tests, M3 ran roughly 24 hours optimizing an FP8 GEMM CUDA kernel across 147 benchmark submissions and 1,959 tool calls, lifting Hopper peak utilization from 7.6% to 71.3% (9.4× speedup). Unlike most models, which stalled within 30 submissions, its best result came on submission 145. It also ran ~12 hours to reproduce an ICLR paper.

API pricing is tiered by input length. At ≤512K input tokens it runs $0.60/M input, $2.40/M output, and $0.12/M cached reads (currently discounted 50% for 7 days to $0.30/$1.20/$0.06); above 512K it doubles to $1.20/$4.80/$0.24.
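To make the tiered pricing concrete, here is a minimal Python sketch of a per-request cost estimate. The rates are the ones quoted above; everything else is an assumption for illustration: that "512K" means 512,000 tokens, that the tier is chosen by total input (fresh plus cached) and applies to the whole request, and that the 50% promotion covers only the lower tier. The names `Rates`, `pick_rates`, and `estimate_cost` are hypothetical, not part of any MiniMax SDK.

```python
from dataclasses import dataclass

# Assumption: "512K" is read as 512,000 tokens.
TIER_THRESHOLD_TOKENS = 512_000


@dataclass(frozen=True)
class Rates:
    """USD per million tokens."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


STANDARD = Rates(0.60, 2.40, 0.12)      # <=512K input, list price
PROMO = Rates(0.30, 1.20, 0.06)         # <=512K input, 50% launch discount
LONG_CONTEXT = Rates(1.20, 4.80, 0.24)  # >512K input


def pick_rates(total_input_tokens: int, promo: bool = False) -> Rates:
    # Assumption: the promo applies only to the lower tier, because the
    # post quotes discounted numbers for that tier alone.
    if total_input_tokens > TIER_THRESHOLD_TOKENS:
        return LONG_CONTEXT
    return PROMO if promo else STANDARD


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, promo: bool = False) -> float:
    """Estimated USD cost of one request under the assumptions above."""
    rates = pick_rates(input_tokens + cached_tokens, promo)
    return (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m
            + cached_tokens * rates.cached_per_m) / 1_000_000
```

For example, `estimate_cost(100_000, 10_000)` prices a 100K-token prompt with a 10K-token reply at list rates, and passing `promo=True` halves it. How the real API bills requests that straddle the threshold is not stated in the post, so check the official pricing page before relying on these numbers.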
Agentic Coding Weekly
Software Development
Subscribe to get a free weekly email covering agentic coding tool and model updates. Join 500+ agentic engineers.
About
Get focused updates on agentic coding tools, models, and workflows. No hype, no gossip. Just what helps you ship. Written by Prashant Anand, an experienced ML Engineer (ex-Mercari) who spent the last 6 years building and deploying production ML systems.

What to expect:
- Weekly issues every Monday covering what's new in tools and models, one workflow to try, and the best community posts from that week.
- Occasional deep dives into specific tools, patterns, and best practices.

Who this is for:
- Engineers who use Cursor, Claude Code, Copilot, or similar tools to write code professionally and want high-signal updates and tactics.
- Engineering managers and technical leaders who want to understand the current landscape well enough to guide their teams, make informed decisions, and evaluate team performance.
- Website: https://www.agenticcodingweekly.com/
- Industry: Software Development
- Company size: 1 employee
- Headquarters: Tokyo
- Type: Privately held
Locations
- Primary: Tokyo, JP
Updates
-
Agentic Coding Weekly reposted this
Anthropic has released Claude Opus 4.8, an incremental upgrade over Opus 4.7 at the same price.

On the benchmarks:
- SWE-Bench Pro: 69.2%, up from Opus 4.7's 64.3%, ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).
- Terminal-Bench 2.1: 74.6%, up from 4.7's 66.1% and above Gemini 3.1 Pro's 70.3%, but behind GPT-5.5 at 78.2% on the same Terminus-2 harness (and 83.4% on its own Codex CLI harness).

For coding work the biggest change is honesty. Anthropic reports Opus 4.8 is about four times less likely than its predecessor to let flaws in its own code pass without comment, and more likely to flag uncertainty than claim progress it can't back up.

Opus 4.8 defaults to "high" effort, which Anthropic says uses a similar token count to Opus 4.7's default while performing better. The "extra" level (xhigh in Claude Code) and "max" level trade more tokens for better results on hard or long-running async tasks.

Pricing is unchanged from Opus 4.7 at $5 per million input tokens and $25 per million output, with fast mode at $10/$50. The model is available via the Claude API as claude-opus-4-8. Anthropic calls this a "modest but tangible" improvement.
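As a rough illustration of what the fast-mode premium means in practice, the sketch below compares the two quoted price points for the same workload. The rates come from the post; the workload size and the names `OPUS_4_8_RATES` and `opus_cost` are made up for the example, and it ignores caching, batching, or any other discounts the API may offer.

```python
# USD per million tokens as (input, output), taken from the post above.
OPUS_4_8_RATES = {
    "standard": (5.0, 25.0),
    "fast": (10.0, 50.0),
}


def opus_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimated USD cost for a given token volume in the chosen mode."""
    input_rate, output_rate = OPUS_4_8_RATES[mode]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical workload: 2M input tokens and 200K output tokens.
standard = opus_cost(2_000_000, 200_000, "standard")
fast = opus_cost(2_000_000, 200_000, "fast")
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")
```

Because both rates double in fast mode, the total doubles regardless of the input/output mix, so the choice comes down to whether the latency gain is worth twice the spend.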
-
Issue 22 is out, covering agentic coding updates from last week: Gemini 3.5 Flash, Antigravity 2.0/CLI, Composer 2.5, Qwen 3.7 Max, DeepSeek V4 Pro pricing, and coding agent aliases. Read: https://lnkd.in/gmhaz_fY
-
Agentic Coding Weekly reposted this
Agentic Coding updates from Google I/O

## Antigravity 2.0 and CLI
- 2.0 is a parallel agent manager, similar to Claude and Codex desktop
- No IDE inside 2.0
- Antigravity CLI (closed source) will replace Gemini CLI
- Gemini CLI will stop working from June 18, 2026

## Gemini 3.5 Flash
- Beats Gemini 3.1 Pro on almost all benchmarks
- $1.5 / $9 per million input / output tokens
- 3x more expensive than 3 Flash
- Pretty close to 3.1 Pro pricing of $2 / $12
-
Issue 21 is out, covering agentic coding updates from last week: Claude Code /goal, Agent View, Codex on mobile, Claude plan changes, Grok Build CLI, and Qwen 3.5 on M4 Mac. Read: https://lnkd.in/gq3FTspu
-
Agentic Coding Weekly reposted this
Codex, Claude, and OpenCode all have slightly different ways to resume, update, and run in yolo mode. This annoys me to no end. But these 12 aliases help me stay sane:

```
# codex
alias co='codex'
alias coy='codex --yolo'
alias cor='codex --yolo resume'
alias cou='codex update'

# claude
alias cl='claude'
alias cly='claude --dangerously-skip-permissions'
alias clr='claude --dangerously-skip-permissions --resume'
alias clu='claude update'

# opencode
alias oc='opencode'
alias ocy='opencode --agent yolo'
alias ocr='opencode --prompt "/sessions"'
alias ocu='opencode upgrade'
```
-
Issue 20 is out, covering agentic coding updates from last week: Claude Code limits doubled, Codex CLI adds goal loop, DeepSeek V4 Pro at 75% off, and local inference on Apple Silicon. Read: https://lnkd.in/gf9-FUin
-