A couple of months ago, I went from leading engineering teams to going hands-on, and that required working a lot with AI coding agents — Claude Code, Copilot, Lovable, Replit — on real projects, daily.

As someone who's built and scaled teams at tech giants, I thought I'd pick these up fast. I was wrong. The gap between using these tools and using them well is enormous.

Four things changed my output quality dramatically:

𝟭. CLAUDE.md / AGENTS.md
A markdown file at your project root describing your architecture, conventions, and constraints. Without it, the agent guesses. With it, it follows. The agent stopped inventing its own naming conventions overnight. Highest-ROI setup I've found.

𝟮. Task decomposition
Early on I'd describe an entire feature and get something that looked done in 10 minutes. Fascinating — until the codebase grew and bugs increased exponentially in the generated code. Breaking features into small, testable chunks improved agent effectiveness significantly.

𝟯. Model selection
I burned two days on iteration cycles before realizing I was using a fast model for a task that needed deep reasoning. The fix: reasoning models for architecture, fast models for implementation. Auto-routing isn't there yet.

𝟰. Prompt quality
The most underestimated lever. 5 minutes structuring a prompt saves 30–45 minutes of back-and-forth. Every single time.

The tooling is here. The craft of using it well is still emerging.

#SoftwareEngineering #CodingAgents #EngineeringLeadership #DeveloperProductivity
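A minimal CLAUDE.md in the spirit of point 𝟭 might look like the sketch below. The project layout, paths, and rules are invented purely for illustration; the structure (architecture, conventions, constraints) is the part that carries over:

```markdown
# CLAUDE.md

## Architecture
- Next.js frontend in `web/`, FastAPI backend in `api/` (example layout)
- All database access goes through the repository layer in `api/repos/`

## Conventions
- TypeScript: camelCase for functions, PascalCase for React components
- Python: snake_case everywhere; type hints required on public functions

## Constraints
- Never modify files under `migrations/` directly
- Run the test suite before declaring a task done
```

Short, declarative rules like these give the agent something to follow instead of something to guess.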
All four of these compound around the same root problem: agents don't carry state between sessions. CLAUDE.md helps, but it's static; it describes the project, not what you've actually tried and why. We've been logging Claude Code sessions to SQLite for exactly this reason. The prompt quality point especially: you write a great prompt, it works, and two weeks later you're reconstructing it from memory. Capturing that reasoning as it happens changes the iteration loop significantly.
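A minimal sketch of what logging sessions to SQLite could look like. This is an illustration, not Claude Code's actual storage format; the table and column names are assumptions made up for the example:

```python
import sqlite3
from datetime import datetime, timezone

def init_db(path=":memory:"):
    # One row per agent interaction: what was asked, how it went, and why.
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS sessions (
            id      INTEGER PRIMARY KEY,
            ts      TEXT NOT NULL,   -- ISO-8601 timestamp
            prompt  TEXT NOT NULL,   -- the prompt you gave the agent
            outcome TEXT,            -- e.g. worked / failed / partial
            notes   TEXT             -- reasoning captured while it's fresh
        )""")
    return conn

def log_entry(conn, prompt, outcome, notes=""):
    conn.execute(
        "INSERT INTO sessions (ts, prompt, outcome, notes) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), prompt, outcome, notes),
    )
    conn.commit()

def find_prompts(conn, keyword):
    # Recover a prompt that worked instead of reconstructing it from memory.
    cur = conn.execute(
        "SELECT prompt, outcome, notes FROM sessions WHERE prompt LIKE ?",
        (f"%{keyword}%",),
    )
    return cur.fetchall()
```

The search step is the payoff: two weeks later, a keyword lookup returns the exact prompt and the notes on why it was written that way.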
How deep are you going on task decomposition — is this just breaking features into user stories, or are you actively constraining what the agent can change in each prompt (like "only modify this file, don't touch database schema")? Curious if you're finding the agent gets more reliable as you tighten the scope, or if it's more about you being able to validate each piece faster.
Wonderful post. Another important point to add: the need to develop skills for different types of work and apply them in parallel, so they function collaboratively as a team within the Claude CLI. This multi-skill, parallel workflow model is the foundational building block of Claude CoWork, which was recently commercialized.
I think the whole exercise is improving our clarity of purpose and organization of work. It's much like the challenge of organizing teams.
Good observations, Shariq Hashmi. Thank you for sharing! Token consumption would make it to your next set of observations. 😊
Useful post. Thank you for sharing
Good to read your insights. And, yeah! The last line is worth taking note of.
Shariq Hashmi, about your last point: one issue I see with prompting right now is that there is no indication from the model about whether it has understood exactly what you were trying to tell it. Prompting today probably feels a lot like coding in the 1980s without IDEs, when you were coding blind: only once your code compiled could you be sure it was at least syntactically right, and if there was an error you could usually figure out where you went wrong in a reasonable time frame. With prompts, for large features, it can be very hard to reason about the correctness of the generated code, i.e. that it does exactly what it was supposed to, neither more nor less. And once you figure out what went wrong in the generated code comes the trickier part: figuring out what went wrong with the prompt.