🏗️ 𝗧𝗵𝗲 𝗖𝗮𝘀𝘂𝗮𝗹 𝗕𝘂𝗶𝗹𝗱𝗲𝗿... 𝗗𝗼𝗲𝘀 𝗧𝗼𝗼𝗹 𝗰𝗵𝗼𝗶𝗰𝗲 𝗯𝗲𝗮𝘁 𝗠𝗼𝗱𝗲𝗹 𝗰𝗵𝗼𝗶𝗰𝗲? From what I am finding through my runs is that the agent's editing primitive matters more than the model you pick. When I used opencode's 𝗲𝗱𝗶𝘁 tool on multi-line function bodies, the model had to produce every replacement line inside a JSON string — no surrounding context, explicit indentation, no guardrails. Result: lines landing at column 0, syntax errors, panicked re-edits, and token count tripling. Switched to 𝘄𝗿𝗶𝘁𝗲 (full file regeneration). Same model, same endpoint — task completed cleanly. 📕The fix wasn't a bigger model. It was a better-shaped tool. I guess use 𝗲𝗱𝗶𝘁 for single-line renames. Use 𝘄𝗿𝗶𝘁𝗲 for anything touching a function body. The format has to match how LLMs naturally generate code. Keep in mind, I am hands-off when I run this and not through interactive sessions. These learnings could be part of the bigger topic of context engineering, where you become specific and tailored to the problem space. These small optimisations can or may go a long way, according to me. 🔗 Full lessons: https://lnkd.in/gjEtWEj5 #AIEngineering #CodingAgents #LLMs #BuildingInPublic
Model choice vs editor choice in AI code generation
More Relevant Posts
-
So we burned a bunch of tokens on a boring problem. 🪙 Drift is our open source tool for binding docs to code and checking for staleness. It had a lockfile that was causing too many merge conflicts. Instead of endlessly spec’ing the perfect format, we had agents prototype and stress-test 11 serializers with a randomized property-testing harness: generate lockfiles, apply disjoint edits, run Git merges, measure spurious conflicts. Result: conflict rate dropped from ~44% to ~25%. We shipped TOML. But who cares about TOML; the real unlock is; you can just do things. - Use agents to explore N possible solutions. - Use simulation/property tests as the adversary. - Let the evidence shape the final spec. Tokens are cheaper than uncertainty. And once the scaffolding cost collapses, a whole class of “not worth investigating” engineering problems suddenly become worth solving. Full write-up: https://lnkd.in/gaYwh738
To view or add a comment, sign in
-
Laurynas Keturakis is our resident expert in "Just do things" and a case study in undoing all the assumptions we've baked into software work about areas that aren't worth exploring bc code takes too long to write He spun up an experiment to compare/contrast lockfile formats and how prone they are to merge conflicts. Something I wouldn't even stop to consider six months ago. He could run this as a little side hustle throughout the week. More Cracked than Pepper.
So we burned a bunch of tokens on a boring problem. 🪙 Drift is our open source tool for binding docs to code and checking for staleness. It had a lockfile that was causing too many merge conflicts. Instead of endlessly spec’ing the perfect format, we had agents prototype and stress-test 11 serializers with a randomized property-testing harness: generate lockfiles, apply disjoint edits, run Git merges, measure spurious conflicts. Result: conflict rate dropped from ~44% to ~25%. We shipped TOML. But who cares about TOML; the real unlock is; you can just do things. - Use agents to explore N possible solutions. - Use simulation/property tests as the adversary. - Let the evidence shape the final spec. Tokens are cheaper than uncertainty. And once the scaffolding cost collapses, a whole class of “not worth investigating” engineering problems suddenly become worth solving. Full write-up: https://lnkd.in/gaYwh738
To view or add a comment, sign in
-
Most optimizations don’t come from writing faster code. They come from stopping unnecessary work. Day 49 — Daily Engineering Practice Solved: LeetCode 121 — Best Time to Buy and Sell Stock My first instinct was brute force: checking every buy/sell pair. That works. But it keeps repeating the same comparisons again and again. Then I learned a cleaner pattern: Instead of revisiting previous values repeatedly, store the minimum value seen so far. This introduced me to the Prefix Minimum pattern. --- Pattern Learned: Prefix Minimum is a variation of Prefix Sum. Instead of storing cumulative sums, we continuously track the minimum value till the current index. --- Key Insight: A lot of optimization is simply: avoiding work you’ve already done. --- Time Complexity: O(n²) → O(n) --- Day 49. Learning how patterns simplify problems. Let’s stay consistent. 🤝 #DSA #LeetCode #Algorithms #ProblemSolving #LearningInPublic
To view or add a comment, sign in
-
-
Cursor and Chainguard just partnered to change that. When a Cursor agent resolves a dependency, it now pulls from Chainguard's signed artifact store instead of PyPI, npm, or Maven Central. That means 2,300+ container images and millions of library versions, all reproducibly built and updated within hours of upstream patches. No more treating public registries as ground truth. No more post-hoc audits on code that should have been secure from line one. Developers can ask Cursor in plain language to migrate a project. The IDE handles the configuration, credentials, and routing. This shifts supply-chain protection from something you check after the fact to something built into the autonomous coding pipeline. AI-generated code starts from trusted components, not unverified artifacts. Secure by default. Not secure by inspection. 𝐒𝐨𝐮𝐫𝐜𝐞: https://lnkd.in/dbgKaiC9
To view or add a comment, sign in
-
Looking for a fast, open-source TGG engine? Say hi to seesaw-tgg 1.0.0-rc1! Built in Rust, this engine tackles the classic problems of Triple Graph Grammars by introducing some exciting new concepts and combinations for Model-Driven Engineering. It's out now as a Release Candidate. If you are interested in Model Transformation or just looking for a solid Rust engine, give it a spin! Feedback is very welcome. Paper with all the details will follow soon! Crate: https://lnkd.in/dXvKSdQg Repo: https://lnkd.in/dea9CwWJ #Rust #TGG #ModelDrivenEngineering #OSS #RustLang
To view or add a comment, sign in
-
A lot of RAG debugging still starts in the wrong place. The answer is wrong, so the team swaps models, rewrites the system prompt, or adds stronger wording about citations. Sometimes that helps. Often it just hides the retrieval bug for another week. Arches usually separates the system into three measurements before changing generation: 1. Is the answerable information actually in the indexed corpus? 2. Did retrieval return the right chunk in the top results? 3. Did the model use the retrieved evidence correctly? Those are different failures. A PDF parser that drops tables is not a prompt problem. A chunker that separates a policy exception from the rule is not a model problem. A pure embedding search that misses exact error codes is not fixed by asking Claude to be more careful. In practice, a small hand-labeled eval set is enough to expose the shape of the problem. Take 100 real questions, attach the expected source span, and measure retrieval recall before answer quality. Then inspect the misses by hand. The pattern usually shows up quickly: bad chunk boundaries, missing metadata, weak hybrid search, or a reranker that prefers fluent but irrelevant text. The takeaway is simple: generation quality is downstream of evidence quality. If the right context is not in the prompt, the best model in the world is guessing with better grammar.
To view or add a comment, sign in
-
Your Functions Are Haunting You!! Ever copy/pasted a block of code and thought "I'll fix that later"? You just made a ghost. And the ghosts multiply. Three loops, same journey, different bodies? That’s not reuse, that’s a maintenance time bomb. Write this loop once. Alter the behaviour. Use strategy, pass a lambda. DRY is not just a rule – it’s survival. The side effects, that's what gets you. Any function that changes something outside of itself, such as an opened file, changed variable or inserted row, creates temporal coupling. In code, order silently matters. Change two innocent lines, everything falls apart, and you have no idea why. Side effects are Sith Lords. There are always two: open/close, acquire/release. Forget half and your system bleeds slowly. Two schools of thought for fixing this: Functional style — Don’t mutate. If values don't change, order can't matter. Pure functions are easier to reason about, test, and parallelize. You can't always be pure But the bias toward immutability pays off enormously . OO style hides the mess . One nice public method, and five nasty private ones on the other side of an encapsulation wall. The time coupling is taken into account. That’s not something callers see. The real superpower of OO is not inheritance. It hides side effects behind a wall. The best engineers use both. Use functional wherever possible. Use Lean OO where you must. First, get it to work. Then make it right. What do you use to manage side effects? #CleanCode #SoftwareEngineering #Programming #CodeQuality #FunctionalProgramming #OOP
To view or add a comment, sign in
-
Revid CLI is live 😍 your coding agent can now make videos from the terminal 👉 npm i -g revid-cli type one command - your agent renders the video, polls the status, and delivers the final URL turn scripts, captions, product pages, ads, and random ideas into videos without leaving your repo terminal in video out you now have ZERO reasons to not start making AI videos with your agents 💥
To view or add a comment, sign in
-
-
Vercel deleted 80% of their agent's tools and success rate jumped to 100%. Same model. Just a better harness. Everyone's been obsessing over context engineering for the last 6 months. The next thing eating that conversation is agent harnesses. A harness is everything around the model that isn't the model. Context management. Tool calls. Memory. Filesystem. Bash. The while loop that keeps the thing alive. LangChain's Vivek Trivedy put it best: "If you're not the model, you're the harness." Claude Code is a harness. Cursor is a harness. Codex is a harness. Same underlying models, wildly different agents. LangChain moved their coding agent from outside the Top 30 to Top 5 on Terminal Bench 2.0 by changing only the harness. Opus 4.6 in both runs. If you're building anything agentic in 2026, the model isn't your moat. The harness is. LangChain wrote a clean breakdown of every component a harness needs. Linking it in the comments — go read it before someone on your team starts adding their 17th tool.
To view or add a comment, sign in
-