The software engineering abilities of LLMs are advancing at a phenomenal pace. Anthropic recently released Opus 4.6 and, to demonstrate its capabilities, today published an article and video on a Claude Code agent team building a C compiler from scratch that can target multiple architectures (x86-64, ARM, RISC-V), with the end result passing the GCC Torture Test Suite. It cost about $20,000 in API token calls across roughly 2,000 Claude Code sessions. As the author of the article concludes, nobody expected LLMs to be capable of anything close to this in 2026 and, quote, "we’re entering a new world which will require new strategies to navigate safely". I agree. https://lnkd.in/erBmxD6m
Anthropic's Opus 4.6 Demonstrates Rapid Advancements in LLM Software Engineering
More Relevant Posts
-
#Anthropic researcher Nicholas Carlini conducted an experiment using 16 parallel #Claude Opus 4.6 agents that coordinated through a shared Git repository. The outcome was a 100,000-line Rust-based C compiler capable of compiling a bootable Linux kernel across x86, ARM, and RISC-V architectures. Over two weeks and with a budget of $20,000, the experiment achieved an impressive 99% pass rate on the GCC torture test suite. However, it encountered a practical limit around 100,000 lines, where new fixes often disrupted existing functionality, indicating current constraints for autonomous agentic coding. Carlini highlighted that substantial human scaffolding was necessary, which included custom test harnesses, context-aware output filtering, and time-boxing. This raises important concerns regarding the deployment of AI-generated software by developers who have not personally verified the code and suggests that coding still needs to be checked, refined and guided by the human element for now. #AgenticAI https://lnkd.in/gJch8q8c
-
Building a C compiler with a team of parallel Claudes We (Anthropic) tasked Opus 4.6 using agent teams to build a C Compiler, and then (mostly) walked away. Here's what it taught us about the future of autonomous software development. https://lnkd.in/edEBjxdP
-
In a recent engineering experiment, Anthropic used 16 Claude Opus 4.6 AI agents working in parallel on a shared codebase to autonomously build a Rust-based C compiler from scratch — producing ~100,000 lines of code over ~2,000 sessions at a cost of about $20,000. The result can compile major software like the Linux 6.9 kernel on x86, ARM, and RISC-V, passes most standard compiler test suites, and even runs Doom. This experiment is a milestone in agent-based software development, demonstrating how multiple AI instances can collaboratively tackle complex engineering tasks with minimal human intervention — while also highlighting the challenges and design lessons of scaling autonomous AI workflows. #Anthropic #AI #AgentTeams #Claude #MachineLearning #SoftwareEngineering #AutonomousAI #AIinDev #Innovation #FutureOfWork #CompilerTechnology https://lnkd.in/gXh8rnVC
-
"AI can't do real engineering work." Yeah, I used to think that too. Then Anthropic let 16 AI agents loose on a codebase for two weeks with basically zero supervision. They came back with a 100,000-line C compiler that builds the Linux kernel. A real compiler. x86, ARM, RISC-V. Here's what blew my mind: The agents coordinated with each other. Picked up tasks. Resolved merge conflicts. Wrote their own documentation so the next agent knew what was going on. ~2,000 coding sessions. $20K in API costs. And the humans mostly just... watched. The wild part? The bottleneck wasn't the AI. It was setting up good enough tests so the agents actually knew when they were done. Once that was in place, they just kept grinding. I'm not saying developers are obsolete. That's not the point. But if you can spin up 16 tireless agents and point them at a hard problem for two weeks straight? That changes things. What would you build with that kind of setup? Read more: https://lnkd.in/eMtQSKme
-
Chris Lattner, creator of Clang, LLVM, Swift, and Mojo, shares his take on the lessons from Claude's C compiler. A great read. https://lnkd.in/ejKPrfFT
-
Stunning: $20,000 and 2 weeks vs $2,000,000 - $3,000,000 and multiple years. Anyone who thinks AI is hype... you do so at your own peril.
Anthropic just published a genuinely eye-opening benchmark: 16 Claude Opus 4.6 agents were set loose (with a harness + tests) to build a Rust-based C compiler from scratch, and in ~two weeks it produced ~100,000 lines of code that can build a bootable Linux 6.9 (x86/ARM/RISC-V) and hits ~99% on major compiler test suites (including the GCC torture tests).
What makes this a landmark isn’t just “AI is fast.” It’s how the work is done: months/years of sequential human effort get replaced by a burst of parallel compute, and the bottleneck shifts to what I’d call the burden of coordination: harness design, task decomposition, test quality, and verification.
A quick, back-of-the-napkin comparison (order-of-magnitude)
Human path (systems/compilers are a rare skill set):
👉 Team: ~5 senior systems/compiler engineers
👉 Timeline: ~18–24 months (and lots of coordination + review cycles)
👉 Cost: even using U.S. salary benchmarks for senior compiler engineering roles, you’re quickly in the multi-million range once you include overhead.
Agent-team path (what Anthropic reported):
👉 “Team”: 16 parallel agents
👉 Timeline: ~2 weeks
👉 Spend: ~$20,000 in API costs, plus ~2 weeks of expert human time to build/maintain the harness and keep the agents oriented.
If you do the crude math on the artifact itself (~100k LoC), the economics look absurdly different. But the most important part is the fine print.
The fine print (this is where reality lives)
It “cheats” in one crucial place. The compiler doesn’t implement the 16-bit x86 real-mode codegen needed to boot Linux, so it calls out to GCC for that phase, because the generated output couldn’t meet the tight 32k constraint.
Performance + maintainability are the next mountain. Anthropic notes the compiler’s generated code is less efficient than GCC even with GCC optimizations off, and the Rust code quality is “reasonable” but not what an expert Rust engineer would produce.
💡 Translation: it’s an impressive prototype, but turning it into a long-lived production asset likely requires serious human refactoring and ongoing verification.
🎯 My takeaway: We’re watching a shift from “engineering = writing code” to “engineering = designing constraints + tests + review systems that make autonomous code-writing safe and correct.” The scarce skill becomes coordination and verification, not keystrokes.
https://lnkd.in/gVuTeMkw Rich Stuppy Brad Frazer Aaron Brinton Paul Wilch Noah Riley Kevin Rank, MBA Michael Magalsky Jeff Stucker
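For readers who want to see the "crude math" spelled out, here is a minimal sketch in Go. It uses only the figures quoted in the post (~100k LoC, ~$20,000 in API spend, and the $2,000,000 to $3,000,000 human estimate); the helper name costPerLine is hypothetical, and the agent figure leaves out the ~2 weeks of expert human time the post also mentions.

```go
package main

import "fmt"

// costPerLine is a hypothetical helper: total spend divided by lines of code.
func costPerLine(totalUSD, lines float64) float64 {
	return totalUSD / lines
}

func main() {
	const lines = 100_000.0 // ~100k LoC, as quoted in the post

	agent := costPerLine(20_000, lines)        // ~$20,000 in API costs only
	humanLow := costPerLine(2_000_000, lines)  // low end of the post's human estimate
	humanHigh := costPerLine(3_000_000, lines) // high end of the post's human estimate

	fmt.Printf("agent team: $%.2f per line\n", agent)
	fmt.Printf("human team: $%.2f to $%.2f per line\n", humanLow, humanHigh)
	fmt.Printf("ratio: %.0fx to %.0fx\n", humanLow/agent, humanHigh/agent)
}
```

On those inputs it works out to about $0.20 per line for the agent team versus $20 to $30 per line for the human team, roughly a 100x to 150x gap. Lines of code is a rough proxy, so treat this as order-of-magnitude only.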
The $20,000 "SaaSpocalypse" - Claude Opus 4.6 Builds a C Compiler
-
Very interesting read - Building a C compiler with a team of parallel Claudes. Written by Nicholas Carlini, a researcher. https://lnkd.in/ga3trswG
-
⚡ Go Internals Daily #10 – Compiler Optimizations
The Go compiler does more than just translate code; it applies a series of optimizations to make programs faster and more efficient.
🔑 Key points:
Inlining: Small functions are expanded directly into the caller to reduce call overhead.
Escape Analysis: Determines stack vs heap allocation (covered in Day 3).
Bounds Check Elimination: Removes unnecessary slice/array bounds checks when proven safe.
Dead Code Elimination: Removes code paths that can never be executed.
SSA (Static Single Assignment): Go’s compiler uses SSA form for better optimization opportunities.
Loop Optimizations: Simplifies constant expressions and reduces redundant calculations inside loops.
Example:
func add(a, b int) int { return a + b } // May be inlined by the compiler
Takeaway: Compiler optimizations are why Go programs often run faster than expected without manual tuning. Knowing these internals helps you write code that the compiler can optimize effectively.
💬 Have you ever checked compiler optimization decisions using go build -gcflags?
#GoInternalsDaily #Golang #Compiler #Optimization #LearningInPublic
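To make a few of the key points above concrete, here is a small sketch that extends the post's add example. The functions sumFirstThree and newCounter are illustrative names, not from the post, and the comments describe what the compiler usually does rather than a guarantee; the actual decisions depend on the Go version.

```go
package main

import "fmt"

// add is small enough that the compiler will usually inline it.
func add(a, b int) int {
	return a + b
}

// sumFirstThree reads three elements. The early length check lets the
// compiler prove the later indexes are in range, so it can usually drop
// their individual bounds checks.
func sumFirstThree(xs []int) int {
	if len(xs) < 3 {
		return 0
	}
	return add(add(xs[0], xs[1]), xs[2])
}

// newCounter returns a pointer to a local variable, so escape analysis
// typically moves n to the heap.
func newCounter() *int {
	n := 0
	return &n
}

func main() {
	c := newCounter()
	*c = sumFirstThree([]int{1, 2, 3})
	fmt.Println(*c)
}
```

Building it with go build -gcflags=-m prints the compiler's inlining and escape analysis decisions, which is one way to answer the question the post ends on.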
-
🚀 Contributed to the Rust Compiler (rustc) I recently fixed an Internal Compiler Error (ICE) in the Rust compiler related to trait diagnostic handling under the new -Znext-solver. 🔴 The Problem When compiling certain trait bounds like <f32 as From<<T as Iterator>::Item>>::from;, the compiler would panic with index out of bounds: the len is 1 but the index is 1. The root cause was in fulfillment_errors.rs, where the diagnostic logic assumed that trait_ref.args always contained at least two elements and unconditionally accessed type_at(1). Under the new trait solver (-Znext-solver=globally), this invariant did not always hold, leading to an out-of-bounds panic during error reporting. 🛠 The Fix I added a defensive check before accessing the second generic argument to ensure the compiler no longer assumes the old invariant. I also introduced a UI regression test to prevent future regressions. ✅ The PR has been merged into rust-lang/rust. Contributing at the compiler level was an incredible learning experience in large-scale systems engineering and diagnostic robustness. #Rust #OpenSource #CompilerEngineering #SystemsProgramming
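The general shape of that kind of fix, a length check before indexing, can be sketched as follows. This is an analogy written in Go, not the rustc code or the merged patch: typeAt and describe here are hypothetical helpers, and the argument list is modelled as a plain slice of strings.

```go
package main

import "fmt"

// typeAt is a hypothetical helper: it returns the i-th generic argument,
// or false when the list is shorter than expected, instead of panicking.
func typeAt(args []string, i int) (string, bool) {
	if i < 0 || i >= len(args) {
		return "", false
	}
	return args[i], true
}

// describe builds a diagnostic string and degrades gracefully when the
// second argument is absent.
func describe(args []string) string {
	if second, ok := typeAt(args, 1); ok {
		return fmt.Sprintf("second argument is %s", second)
	}
	return "second argument unavailable"
}

func main() {
	fmt.Println(describe([]string{"f32", "T::Item"}))
	fmt.Println(describe([]string{"f32"})) // len is 1, so index 1 would be out of range
}
```

The point of the pattern is that error-reporting code should not itself crash when an invariant it relied on stops holding; it falls back to a less detailed message instead.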
-
🚀 Connecting Karel to LLVM — and unlocking native interoperability After quite some work, I finally connected my Karel compiler to LLVM. This is a huge milestone for the project because LLVM gives us not only a mature optimization and code generation backend, but also something incredibly powerful: a Foreign Function Interface (FFI). With this, Karel programs can dynamically load and call C functions at runtime. That means: ✅ Native performance through LLVM ✅ Access to existing C ecosystems and libraries ✅ Runtime extensibility without recompiling the compiler ✅ A foundation for future language features and tooling For a small experimental language, standing on the shoulders of LLVM changes everything. Instead of reinventing optimization, register allocation, or platform-specific code generation, the compiler can focus on language semantics and developer experience. The FFI support is especially exciting because it turns Karel into something more than a toy language — it becomes a platform that can integrate with real systems. Next steps will include improving the runtime layer, adding better type mapping for external calls, and exploring JIT scenarios. If you’re interested in compilers, LLVM, or language design, I’d love feedback. GitHub: https://lnkd.in/dW_7Gdti #llvm #compiler #programminglanguages #systemsprogramming #opensource #fanuc
-
I am sorry, but don't you think that building a compiler is something that would already be in its training data? As CSE students, all of us had to build a mini compiler during our coursework, and there is a plethora of content on the internet about it. They even gave it the entire GCC to use as a reference. So maybe the earlier evaluation problem, like building an OS from scratch without using Chromium, would be a much better eval task.