In the last few months, I have explored LLM-based code generation, comparing Zero-Shot to multiple types of Agentic approaches. The approach you choose can make all the difference in the quality of the generated code.

Zero-Shot vs. Agentic Approaches: What's the Difference?

⭐ Zero-Shot Code Generation is straightforward: you provide a prompt, and the LLM generates code in a single pass. This can be useful for simple tasks but often results in basic code that may miss nuances, optimizations, or specific requirements.

⭐ Agentic Approach takes it further by leveraging LLMs in an iterative loop. Here, different agents are tasked with improving the code based on specific guidelines, such as performance optimization, consistency, and error handling, ensuring a higher-quality, more robust output.

Let’s look at a quick Zero-Shot example, a basic file management function. Below is a simple function that appends text to a file:

    def append_to_file(file_path, text_to_append):
        try:
            with open(file_path, 'a') as file:
                file.write(text_to_append + '\n')
            print("Text successfully appended to the file.")
        except Exception as e:
            print(f"An error occurred: {e}")

This is an OK start, but it’s basic: it lacks validation, proper error handling, thread safety, and consistency across different use cases.

Using an agentic approach, we have a Developer Lead Agent that coordinates a team of agents: The Developer Agent generates code, passes it to a Code Review Agent that checks for potential issues or missing best practices, and coordinates improvements with a Performance Agent to optimize it for speed. At the same time, a Security Agent ensures it’s safe from vulnerabilities. Finally, a Team Standards Agent can refine it to adhere to team standards. This process can be iterated any number of times until the Code Review Agent has no further suggestions.
The resulting code will evolve to handle multiple threads, manage file locks across processes, batch writes to reduce I/O, and align with coding standards. Through this agentic process, we move from basic functionality to a more sophisticated, production-ready solution. An agentic approach reflects how we can harness the power of LLMs iteratively, bringing human-like collaboration and review processes to code generation. It’s not just about writing code; it's about continuously improving it to meet evolving requirements, ensuring consistency, quality, and performance. How are you using LLMs in your development workflows? Let's discuss!
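As a rough illustration of where a few review rounds might take the append function, here is a minimal sketch covering only input validation and thread safety within a single process. The module-level lock, the `encoding` parameter, and the choice to raise exceptions instead of printing are assumptions made for this example, not output from the agents described above. Cross-process file locking and write batching are deliberately left out.

```python
import threading
from pathlib import Path

# Hypothetical module-level lock: serializes appends across threads in one process.
_append_lock = threading.Lock()


def append_to_file(file_path, text_to_append, encoding="utf-8"):
    """Append one line of text to a file, validating the inputs first."""
    if not isinstance(text_to_append, str):
        raise TypeError("text_to_append must be a string")
    path = Path(file_path)
    if path.is_dir():
        raise IsADirectoryError(f"{path} is a directory")
    with _append_lock:
        # Create missing parent directories so open() does not fail on them.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding=encoding) as handle:
            handle.write(text_to_append + "\n")
```

Raising exceptions rather than printing lets the caller decide how to handle a failure, which is one of the points a review step would typically raise about the zero-shot version.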
How LLMs Simplify Software Development
Summary
Large language models (LLMs) are advanced AI tools that make software development easier by generating, reviewing, and improving code automatically. By handling repetitive tasks and helping developers troubleshoot or document their work, LLMs streamline workflows and boost productivity across many stages of software projects.
- Iterate and refine: Use LLMs to generate initial code solutions, then ask them to review, improve, and validate their outputs for higher quality and safer code.
- Streamline workflows: Combine multiple development tasks like debugging, documentation, and architecture planning into unified LLM-driven processes to reduce manual effort and save time.
- Experiment with prompts: Try different ways of phrasing requests to LLMs, breaking tasks into smaller parts or asking for multiple options, to discover smarter and more creative solutions.
-
I discovered I was designing my AI tools backwards. Here’s an example.

This was my newsletter processing chain: reading emails, calling a newsletter processor, extracting companies, & then adding them to the CRM. This involved four different steps, costing $3.69 for every thousand newsletters processed.

[Image: Before: Newsletter Processing Chain]

Then I created a unified newsletter tool which combined everything using the Google Agent Development Kit, Google’s framework for building production-grade AI agent tools:

[Image: unified newsletter tool]

Why is the unified newsletter tool more complicated? It includes multiple actions in a single interface (process, search, extract, validate), implements state management that tracks usage patterns & caches results, has rate limiting built in, & produces structured JSON outputs with metadata instead of plain text.

But here’s the counterintuitive part: despite being more complex internally, the unified tool is simpler for the LLM to use because it provides consistent, structured outputs that are easier to parse, even though those outputs are longer.

To understand the impact, we ran tests of 30 iterations per test scenario. The results show the impact of the new architecture:

[Image: test results]

We were able to reduce tokens by 41% (p=0.01, statistically significant), which translated linearly into cost savings. The success rate improved by 8% (p=0.03), & we were able to hit the cache 30% of the time, which is another cost savings.

While individual tools produced shorter, “cleaner” responses, they forced the LLM to work harder parsing inconsistent formats. Structured, comprehensive outputs from unified tools enabled more efficient LLM processing, despite being longer.

My workflow relied on dozens of specialized Ruby tools for email, research, & task management. Each tool had its own interface, error handling, & output format. By rolling them up into meta tools, the ultimate performance is better, & there’s tremendous cost savings.
You can find the complete architecture on GitHub.
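For readers who want a feel for the single-interface idea, here is a minimal sketch in plain Python. It is not the author's tool and does not use the Agent Development Kit API; the class name `NewsletterTool`, the envelope fields, and the capitalized-word extraction heuristic are all hypothetical. It only shows one entry point, a cache, a naive rate limit, and a response envelope that has the same shape for every action.

```python
import hashlib
import json
import time


class NewsletterTool:
    """Hypothetical unified tool: one entry point, one response envelope."""

    def __init__(self, min_interval_s=0.0):
        self._cache = {}
        self._min_interval_s = min_interval_s
        self._last_call = float("-inf")
        self._actions = {"extract": self._extract, "validate": self._validate}

    def run(self, action, text):
        """Dispatch an action and always return the same JSON structure."""
        if action not in self._actions:
            return self._envelope(action, "error", None, False, f"unknown action: {action}")
        key = hashlib.sha256(f"{action}:{text}".encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._envelope(action, "ok", self._cache[key], True)
        # Naive rate limit: sleep until the minimum interval has elapsed.
        wait = self._min_interval_s - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()
        result = self._actions[action](text)
        self._cache[key] = result
        return self._envelope(action, "ok", result, False)

    @staticmethod
    def _envelope(action, status, data, cached, error=None):
        return json.dumps({"action": action, "status": status, "data": data,
                           "meta": {"cached": cached}, "error": error})

    @staticmethod
    def _extract(text):
        # Placeholder heuristic: treat capitalized words as company names.
        return sorted({w.strip(".,") for w in text.split() if w[:1].isupper()})

    @staticmethod
    def _validate(text):
        return {"non_empty": bool(text.strip())}
```

Because every response carries the same keys, the caller (or the LLM) never has to guess the format, which is the property the post credits for the lower token usage.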
-
We know LLMs can substantially improve developer productivity. But the outcomes are not consistent. An extensive research review uncovers specific lessons on how best to use LLMs to amplify developer outcomes.

💡 Leverage LLMs for Improved Productivity. LLMs enable programmers to accomplish tasks faster, with studies reporting up to a 30% reduction in task completion times for routine coding activities. In one study, users completed 20% more tasks using LLM assistance compared to manual coding alone. However, these gains vary based on task complexity and user expertise; for complex tasks, time spent understanding LLM responses can offset productivity improvements. Tailored training can help users maximize these advantages.

🧠 Encourage Prompt Experimentation for Better Outputs. LLMs respond variably to phrasing and context, with studies showing that elaborated prompts led to 50% higher response accuracy compared to single-shot queries. For instance, users who refined prompts by breaking tasks into subtasks achieved superior outputs in 68% of cases. Organizations can build libraries of optimized prompts to standardize and enhance LLM usage across teams.

🔍 Balance LLM Use with Manual Effort. A hybrid approach, blending LLM responses with manual coding, was shown to improve solution quality in 75% of observed cases. For example, users often relied on LLMs to handle repetitive debugging tasks while manually reviewing complex algorithmic code. This strategy not only reduces cognitive load but also helps maintain the accuracy and reliability of final outputs.

📊 Tailor Metrics to Evaluate Human-AI Synergy. Metrics such as task completion rates, error counts, and code review times reveal the tangible impacts of LLMs. Studies found that LLM-assisted teams completed 25% more projects with 40% fewer errors compared to traditional methods. Pre- and post-test evaluations of users' learning showed a 30% improvement in conceptual understanding when LLMs were used effectively, highlighting the need for consistent performance benchmarking.

🚧 Mitigate Risks in LLM Use for Security. LLMs can inadvertently generate insecure code, with 20% of outputs in one study containing vulnerabilities like unchecked user inputs. However, when paired with automated code review tools, error rates dropped by 35%. To reduce risks, developers should combine LLMs with rigorous testing protocols and ensure their prompts explicitly address security considerations.

💡 Rethink Learning with LLMs. While LLMs improved learning outcomes in tasks requiring code comprehension by 32%, they sometimes hindered manual coding skill development, as seen in studies where post-LLM groups performed worse in syntax-based assessments. Educators can mitigate this by integrating LLMs into assignments that focus on problem-solving while requiring manual coding for foundational skills, ensuring balanced learning trajectories.

Link to paper in comments.
-
I've been using AI coding tools for a while now & it feels like every 3 months the paradigm shifts. Anyone remember putting "You are an elite software engineer..." at the beginning of your prompts or manually providing context? The latest paradigm is Agent Driven Development & here are some tips that have helped me get good at taming LLMs to generate high quality code.

1. Clear & focused prompting
❌ "Add some animations to make the UI super sleek"
✅ "Add smooth fade-in & fade out animations to the modal dialog using the motion library"
Regardless of what you ask, the LLM will try to be helpful. The less it has to infer, the better your result will be.

2. Keep it simple, stupid
❌ Add a new page to manage user settings, also replace the footer menu from the bottom of the page to the sidebar, right now endless scrolling is making it unreachable & also ensure the mobile view works, right now there is weird overlap
✅ Add a new page to manage user settings, ensure only editable settings can be changed.
Trying to have the LLM do too many things at once is a recipe for bad code generation. One-shotting multiple tasks has a higher chance of introducing bad code.

3. Don't argue
❌ No, that's not what I wanted, I need it to use the std library, not this random package, this is the 4th time you've failed me!
✅ Instead of using package xyz, can you recreate the functionality using the standard library
When the LLM fails to provide high quality code, the problem is most likely the prompt. If the initial prompt is not good, follow-on prompts will just make a bigger mess. I will usually allow one follow-up to try to get back on track & if it's still off base, I will undo all the changes & start over. It may seem counterintuitive, but it will save you a ton of time overall.

4. Embrace agentic coding
AI coding assistants have a ton of access to different tools, can do a ton of reasoning on their own, & don't require nearly as much hand holding. You may feel like a babysitter instead of a programmer. Your role as a dev becomes much more fun when you can focus on the bigger picture and let the AI take the reins writing the code.

5. Verify
With this new ADD paradigm, a single prompt may result in many files being edited. Verify that the code generated is what you actually want. Many AI tools will now auto run tests to ensure that the code they generated is good.

6. Send options, thx
I had a boss that would always ask for multiple options & often email saying "send options, thx". With agentic coding, it's easy to ask for multiple implementations of the same feature. Whether it's UI or data models, asking for a 2nd or 10th opinion can spark new ideas on how to tackle the task at hand & offers an opportunity to learn.

7. Have fun
I love coding, been doing it since I was 10. I've done OOP & functional programming, SQL & NoSQL, PHP, Go, Rust & I've never had more fun or been more creative than coding with AI. Coding is evolving, have fun & let's ship some crazy stuff!
-
I’ve been building and managing data systems at Amazon for the last 8 years. Now that AI is everywhere, the way we work as data engineers is changing fast. Here are 5 real ways I (and many in the industry) use LLMs to work smarter every day as a Senior Data Engineer:

1. Code Review and Refactoring
LLMs help break down complex pull requests into simple summaries, making it easier to review changes across big codebases. They can also identify anti-patterns in PySpark, SQL, and Airflow code, helping you catch bugs or risky logic before it lands in prod. If you’re refactoring old code, LLMs can point out where your abstractions are weak or naming is inconsistent, so your codebase stays cleaner as it grows.

2. Debugging Data Pipelines
When Spark jobs fail or SQL breaks in production, LLMs help translate ugly error logs into plain English. They can suggest troubleshooting steps or highlight what part of the pipeline to inspect next, helping you zero in on root causes faster. If you’re stuck on a recurring error, LLMs can propose code-level changes or optimizations you might have missed.

3. Documentation and Knowledge Sharing
Turning notebooks, scripts, or undocumented DAGs into clear internal docs is much easier with LLMs. They can help structure your explanations, highlight the “why” behind key design choices, and make onboarding or handover notes quick to produce. Keeping platform wikis and technical documentation up to date becomes much less of a chore.

4. Data Modeling and Architecture Decisions
When you’re designing schemas, deciding on partitioning, or picking between technologies (like Delta, Iceberg, or Hudi), LLMs can offer quick pros/cons, highlight trade-offs, and provide code samples. If you need to visualize a pipeline or architecture, LLMs can help you draft Mermaid or PlantUML diagrams for clearer communication with stakeholders.

5. Cross-Team Communication
When collaborating with PMs, analytics, or infra teams, LLMs help you draft clear, focused updates, whether it’s a Slack message, an email, or a JIRA comment. They’re useful for summarizing complex issues, outlining next steps, or translating technical decisions into language that business partners understand.

LLMs won’t replace data engineers, but they’re rapidly raising the bar for what you can deliver each week. Start by picking one recurring pain point in your workflow, then see how an LLM can speed it up. This is the new table stakes for staying sharp as a data engineer.
-
Most companies are in between Software 1.0 and 2.0. Thanks to AI, Software 3.0 has arrived. (download 72 page slidedeck below)

Andrej Karpathy’s recent talk at Y Combinator's AI Startup School introduces a concept that every tech executive should sit with: Software 3.0. Where Software 1.0 was about handcrafting logic, and Software 2.0 involved neural networks as black-box classifiers, Software 3.0 treats prompts as programs and LLMs as general-purpose computing substrates. This is the next substrate shift in software. The equivalent of mainframes → PCs → cloud → AI-native systems.

First, let us review the Software 3.0 paradigm's 4 areas:

1. LLMs are the new operating systems, not just tools. They are:
+ Utilities (serving computation in the flow of work),
+ Fabs (mass-producing "digital artifacts" via generative interfaces), and
+ OSes (abstracting complexity, orchestrating context, managing memory and interface).
+ The right way to view this is not "plug in an LLM." It is: what would a system look like if an LLM was your system's OS?

2. We’re entering the age of partial autonomy. Karpathy makes a compelling analogy to the Iron Man suit:
+ Augmentation: LLMs extend human capability (autocomplete, summarization, brainstorming).
+ Autonomy: LLMs act independently in constrained environments (agent loops, retrieval systems, workflow automation).
+ This leads to the concept of Autonomy Sliders: tuning systems from fully manual to semi-automated to agentic, depending on risk tolerance, verification requirements, and task criticality.

3. The Generator-Verifier loop is the new core of development.
+ Instead of "write → run → debug," think: Prompt → Generate → Verify → Refine
+ Shorter loops, faster iterations, and critical human-in-the-loop checkpoints. Reliability comes from verification, not perfection, which is a major shift for teams used to deterministic systems.

4. Architect for Agents, not just Users.
+ Your software doesn’t just serve end users anymore; it must now serve agents. These digital workers interact with your APIs, documentation, and UIs in fundamentally different ways.
+ Karpathy calls for a new class of developer experience: llms.txt instead of robots.txt
+ Agent-readable docs, schema-first interfaces, and fine-tuned orchestration layers.

Some implications for AI implementations:
A. Because of Software 3.0, enterprise architecture will evolve: traditional deterministic systems alongside generative, agentic infrastructure.
B. AI Governance must span both.
C. Investments in data pipelines, prompt systems, and verification workflows will be as important as microservices and DevOps were in the previous era.
D. Your talent model must evolve: think AI Engineers not just Prompt Engineers, blending deep system knowledge with model-first programming.
E. You’ll need a new playbook for build vs. integrate: when to wrap traditional software with LLMs vs. re-architect for Software 3.0 natively?

What are your thoughts about Software 3.0?
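The Prompt → Generate → Verify → Refine loop from point 3 can be sketched in a few lines. This is only an illustration under stated assumptions: `generate` and `verify` are caller-supplied placeholders (an LLM call, a test suite, or a human reviewer), and the function name, the `max_rounds` cap, and the refinement prompt wording are hypothetical rather than anything from the talk.

```python
def generate_verify_refine(prompt, generate, verify, max_rounds=3):
    """Run Prompt -> Generate -> Verify -> Refine until verification passes.

    generate(prompt) returns a candidate string.
    verify(candidate) returns a list of problem descriptions (empty means pass).
    """
    history = []
    current_prompt = prompt
    for round_number in range(1, max_rounds + 1):
        candidate = generate(current_prompt)
        problems = verify(candidate)
        history.append({"round": round_number, "problems": problems})
        if not problems:
            return candidate, history
        # Refine: feed the verifier's findings back into the next prompt.
        current_prompt = prompt + "\nFix these problems:\n" + "\n".join(problems)
    # Nothing passed verification: hand off to a human instead of shipping it.
    return None, history
```

Returning `None` when the round budget runs out keeps the human checkpoint explicit: unverified output never leaves the loop on its own.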
-
LLM models make a TON of mistakes, but with 1. good documentation, 2. good code review, 3. the best models available, you can flawlessly accomplish very large changes, FASTER and BETTER than a human.

Here’s a real example. At Formation, we have Session Studio: our live session environment. It’s a real-time system with video, audio, chat, reactions, slides, hand-raising, polls, collaborative coding pads… the works. We recently changed the definitions of participant roles. It was a deep permission and behavior refactor across a complex, real-time surface area with dozens of flags and conditional checks. The kind of change that’s easy to partially ship and quietly break production.

Here’s how I used AI to pull it off:

1. Full System Audit: Codex generated a ~1,300-line audit of the entire current state, every permission path, flag, edge case, and role interaction.
2. Proposed Redesign: Codex then wrote a second document detailing every change required to support the new role definitions.
3. Engineering Plan: Using "plan mode" first, Claude merged both documents into a structured engineering spec with clear implementation phases.
4. "Adversarial" Iteration: Claude and Codex iterated on the docs, flagging inconsistencies, ambiguities, and decisions that required human judgment. I acted as editor-in-chief, resolving tradeoffs and clarifying intent.
5. Phased Execution (8 Phases). For each phase: Claude implemented, Codex reviewed, Claude fixed... Repeat until clean, then Final Claude review.

Total time: ~24 hours of async back-and-forth.

The key insight: LLMs are unreliable in isolation. They’re extremely powerful inside a system of documentation, review, and phased execution.
-
I work at Airbnb where I write 99% of my code with LLMs. One thing you need to understand is they only write shit code if you let them.

When you're building high quality production software, writing code is always the 𝗹𝗮𝘀𝘁 𝘀𝘁𝗲𝗽. Your first step is to understand the problem that needs to be solved. Then ideate solutions, consider alternatives, explore tradeoffs and refine your exploration into a concrete plan. Even as you implement the plan task by task you should not be coding as a stream of consciousness. That leads to bad code design. You should be considering the architecture of the code and its abstractions, and coming up with a clean way to write it. Only after all this upfront design and planning work do you then start manually typing code with your fingers.

That last step is not necessary to do manually anymore. Whenever I think of coding, I immediately reach for an LLM because I use it like a power tool. A carpenter does not leave their power drill on the table when they need to screw in a bolt. Why would you not use an LLM to execute on your plan? You are in the driver's seat, providing direct technical guidance at every step. 𝗬𝗼𝘂𝗿 𝗲𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲 𝗮𝗻𝗱 𝘀𝗸𝗶𝗹𝗹 𝗹𝗲𝘃𝗲𝗹 𝗱𝗶𝗿𝗲𝗰𝘁𝗹𝘆 𝗶𝗺𝗽𝗮𝗰𝘁 𝗵𝗼𝘄 𝗴𝗼𝗼𝗱 𝘁𝗵𝗲 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗶𝘀. No, this is not slower than doing it without LLMs.

You should also use LLMs as power tools for research, planning and architecture. This will get you even higher quality software than without them. It allows you to go far beyond due diligence and truly explore, analyze and refine your design fully before a single line of code is written.

I use the following workflow to naturally research, design and plan the feature I want to build in the form of a conversation, which then gets converted to a formal Spec that the LLM can implement task by task:

1. Explain the problem to the LLM.
2. Give it your ideas for the initial solution.
3. Tell it explicitly: “Propose an approach first. Show alternatives to my solution, highlight tradeoffs. Do not write code until I approve.”
4. Review the proposal, poke holes in it, iterate.
5. Tell it to write the plan to disk as a spec so you can hand off to another session later.
6. Lastly, let it generate code.

This is an excerpt from my article “Writing High Quality Production Code With LLMs Is A Solved Problem”. Full article here on LinkedIn: https://lnkd.in/d3v-i9iK
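Steps 1 to 3 of the workflow above lend themselves to a reusable opening message. The helper below is a small sketch, not something from the article: the function name `build_planning_prompt` and the section labels are hypothetical, and only the quoted instruction in `PLAN_FIRST_INSTRUCTION` comes from the post.

```python
PLAN_FIRST_INSTRUCTION = (
    "Propose an approach first. Show alternatives to my solution, "
    "highlight tradeoffs. Do not write code until I approve."
)


def build_planning_prompt(problem, initial_ideas):
    """Assemble the problem, the initial ideas, and the plan-first instruction."""
    ideas = "\n".join(f"- {idea}" for idea in initial_ideas)
    return (
        f"Problem:\n{problem.strip()}\n\n"
        f"My initial ideas:\n{ideas}\n\n"
        f"{PLAN_FIRST_INSTRUCTION}"
    )
```

Keeping the instruction as a constant makes it harder to forget the "do not write code yet" guard when starting a new session.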
-
Most companies overcomplicate AI implementation. I see teams making the same mistakes: jumping to complex AI solutions (agents, toolchains, orchestration) when all they need is a simple prompt. This creates bloated systems, wastes time, and becomes a maintenance nightmare.

While everyone's discussing Model Context Protocol, I've been exploring another MCP: the Minimum Complexity Protocol. The framework forces teams to start simple and only escalate when necessary:

Level 1: Non-LLM Solution → Would a boolean, logic or rule based system solve the problem more efficiently?
Level 2: Single LLM Prompt → Start with a single, straightforward prompt to a general purpose model. Experiment with different models - some are better with particular tasks.
Level 3: Preprocess Data → Preprocess your inputs. Split long documents, simplify payloads.
Level 4: Divide & Conquer → Break complex tasks into multiple focused prompts where each handles one specific aspect. LLMs are usually better at handling a specific task at a time.
Level 5: Few Shot Prompting → Add few-shot examples within your prompt to guide the model toward better outputs. A small number of examples can greatly increase accuracy.
Level 6: Prompt Chaining → Connect multiple prompts in a predetermined sequence. The output of one prompt becomes the input for the next.
Level 7: Resource Injection → Implement RAG to connect your model to relevant external knowledge bases such as APIs, databases and vector stores.
Level 8: Fine Tuning → Fine tune existing models on your domain specific data when other techniques are no longer effective.
Level 9 (Optional): Build Your Own Model → All else fails? Develop custom models when the business case strongly justifies the investment.
Level 10: Agentic Tool Selection → LLMs determine which tools or processes to execute for a given job. The tools can recursively utilise more LLMs while accessing and updating resources. Human oversight is still recommended here.
Level 11: Full Agency → Allow agents to make decisions, call tools, and access resources independently. Agents self-evaluate accuracy and iteratively operate until the goal is completed.

At each level, measure accuracy via evals and establish human review protocols.

The secret to successful AI implementation isn't using the most advanced technique. It's using the simplest solution that delivers the highest accuracy with the least effort.

What's your experience? Are you seeing teams overcomplicate their AI implementations?
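The "start simple and only escalate when necessary" rule can be expressed as a short loop over the levels. The sketch below assumes an eval harness already exists: `evaluate` is a caller-supplied placeholder that returns an accuracy between 0 and 1 for a given level, and the function name and `target_accuracy` threshold are hypothetical.

```python
def pick_simplest_level(levels, evaluate, target_accuracy):
    """Return the first (simplest) level whose eval accuracy meets the target.

    levels: level names ordered from simplest to most complex.
    evaluate(name): caller-supplied eval harness returning accuracy in [0, 1].
    """
    results = []
    for name in levels:
        accuracy = evaluate(name)
        results.append((name, accuracy))
        if accuracy >= target_accuracy:
            return name, results
    # No level met the target: report every score so a human can decide.
    return None, results
```

Because the loop stops at the first level that clears the bar, more complex levels are never built or evaluated unless the simpler ones fall short.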
-
I burned 600 million tokens building software with LLMs in one week. Here’s what I learned:

Plan like a tyrant, ship like a minimalist
• Start with a markdown implementation plan.
• Trim the wishlist. Ship in tiny vertical slices.
• Mark sections “done,” commit, then move on.

Treat Git like life support
• Fresh branch per feature.
• When the AI hallucinates, git reset --hard HEAD.
• Re-implement clean once you find a fix.

Test what users touch
• Prefer end-to-end flows over unit navel-gazing.
• Click through like a real user. Catch regressions before they catch you.
• Don’t proceed until green.

Debug with discipline
• Paste exact error messages, ask for 3 causes, not 1.
• Add logging first, then fix.
• If stuck, reset and try another model.

Optimize the AI, not just prompts
• Keep instruction files in-repo (cursor.rules, agent.md, claude.md).
• Download API docs locally for grounding.
• Generate multiple outputs and pick the best.

Build complexity out-of-band
• Prototype hairy features in a clean repo, then integrate.
• Maintain stable external APIs with room for internal change.
• Prefer modular services over a monolith you’ll fear to open.

Choose boring tech on purpose
• Established frameworks win on convention and training data.
• Many small files > one thousand-line blob.
• Avoid giant files the models can’t reason about.

Beyond code = unfair advantage
• Let AI handle infra scripts, DNS, and hosting configs.
• Use it for docs, marketing drafts, and design polish.
• Screenshots > paragraphs when reporting UI bugs.
• Voice input speeds up iteration.

Improve forever
• Refactor regularly once tests exist.
• Ask the model to propose refactors.
• Try new releases, but keep a model roster by task.

Result: faster cycles, fewer rewrites, and token spend that actually buys progress.

P.S. DM to discuss what you are building