Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress: Reflection, Tool use, Planning and Multi-agent collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output. Here, I'd like to discuss Reflection. It's relatively quick to implement, and I've seen it lead to surprising performance gains. You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection. Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. Then, we can prompt it to reflect on its own output, perhaps as follows: Here’s code intended for task X: [previously generated code] Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it. Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements. This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks including producing code, writing text, and answering questions. 
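The generate, critique, and rewrite loop described above fits in a few lines of Python. This is a minimal sketch, not the author's implementation. `llm` stands for any hypothetical function that sends a prompt to a model and returns its text. `fake_llm` is a scripted stand-in so the example runs offline.

```python
from typing import Callable, List

# Hypothetical interface: a function that sends a prompt to an LLM and returns its reply.
LLM = Callable[[str], str]


def reflect_and_refine(llm: LLM, task: str, rounds: int = 2) -> str:
    """Generate code for `task`, then alternate self-critique and rewrite steps."""
    draft = llm(f"Write code to carry out this task: {task}")
    for _ in range(rounds):
        # Reflection step: the model criticizes its own previous output.
        critique = llm(
            f"Here's code intended for this task ({task}):\n{draft}\n"
            "Check the code carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # Rewrite step: context includes (i) the previous code and (ii) the feedback.
        draft = llm(
            f"Task: {task}\nPrevious code:\n{draft}\nFeedback:\n{critique}\n"
            "Use the feedback to rewrite the code."
        )
    return draft


# Scripted stand-in for a real model, so the loop can run without an API.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Write code"):
        return "def add(a, b): return a - b"  # buggy first draft
    if "constructive criticism" in prompt:
        return "The function subtracts instead of adding."
    return "def add(a, b): return a + b"  # rewrite that applies the feedback


result = reflect_and_refine(fake_llm, "add two numbers", rounds=2)
print(result)  # def add(a, b): return a + b
```

In practice you might stop early once the critique finds nothing left to fix. The fixed `rounds` count keeps the sketch simple.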
And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement. Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses. Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications’ results. If you’re interested in learning more about reflection, I recommend: - Self-Refine: Iterative Refinement with Self-Feedback, by Madaan et al. (2023) - Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023) - CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, by Gou et al. (2024) [Original text: https://lnkd.in/g4bTuWtU ]
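Tool-assisted reflection with unit tests works by running the generated code and feeding any failures back as the critique. This is a minimal sketch under stated assumptions. `run_unit_tests` and `reflect_with_tests` are hypothetical helpers, and the scripted `fake_llm` replaces a real model. `exec` stands in for a proper sandbox: never run untrusted model output unsandboxed.

```python
from typing import Any, Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # hypothetical: prompt in, reply out
Case = Tuple[tuple, Any]    # (arguments, expected result)


def run_unit_tests(code: str, func_name: str, cases: List[Case]) -> List[str]:
    """Run generated code against test cases; return a description of each failure."""
    namespace: Dict[str, Any] = {}
    try:
        exec(code, namespace)  # WARNING: use a real sandbox for untrusted code
        func = namespace[func_name]
    except Exception as exc:
        return [f"Code failed to load: {exc!r}"]
    failures = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def reflect_with_tests(llm: LLM, task: str, func_name: str,
                       cases: List[Case], max_rounds: int = 3) -> str:
    """Generate code, then repeatedly rewrite it using unit-test failures as feedback."""
    code = llm(f"Write a Python function `{func_name}` that will {task}.")
    for _ in range(max_rounds):
        failures = run_unit_tests(code, func_name, cases)
        if not failures:
            break  # all tests pass; nothing left to reflect on
        code = llm(
            f"Task: {task}\nYour code:\n{code}\n"
            "These unit tests failed:\n" + "\n".join(failures) +
            "\nReflect on the errors and rewrite the code."
        )
    return code


def fake_llm(prompt: str) -> str:
    # Scripted stand-in: a buggy first draft, then a fix once it sees failures.
    if "unit tests failed" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


cases = [((2, 3), 5), ((0, 0), 0)]
print(run_unit_tests("def add(a, b):\n    return a - b", "add", cases))
# ['add(2, 3) returned -1, expected 5']
final_code = reflect_with_tests(fake_llm, "add two numbers", "add", cases)
print(run_unit_tests(final_code, "add", cases))  # []
```

The test results give the model concrete, verifiable errors to reflect on, rather than relying only on its own judgment.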
How to Use Step-by-Step Prompting in LLMs
Summary
Step-by-step prompting in large language models (LLMs) means guiding the AI through a task using clear, structured instructions instead of expecting a perfect answer in one go. This approach helps the model break down complex questions, reason through each stage, and deliver more accurate, detailed responses.
- Break tasks down: Divide your request into simple, sequential steps so the model can address each part before moving to the next.
- Ask for reasoning: Prompt the model to explain its thought process or decision-making at each stage to increase clarity and reliability.
- Use feedback loops: Encourage the model to reflect on its answers or review them for improvement by adding constructive criticism or testing outputs.
-
STOP asking ChatGPT to "make it better". Here's how to prompt it better instead:

☑ Clearly Identify the Issue
Rather than a vague “make it better,” specify the exact element that needs to change. For example: "Rewrite the second paragraph so it includes three concrete examples of our product’s benefits. The tone must be formal and persuasive. Remove any informal language or redundant phrases."

☑ Divide the Task into Discrete Steps
Break the overall revision into a sequence of manageable tasks. For example: "Go through my instructions, step by step.
– Step 1: Summarize it in one sentence.
– Step 2: Identify two specific weaknesses.
– Step 3: Rewrite the text to address these weaknesses, incorporating specific data or examples."

☑ Specify the Format and Level of Detail
Define exactly how the final output should look. For example: "Provide the final revised text as a numbered list where each item contains 2–3 sentences. Each item must include at least one statistical fact or concrete example, and the overall response should not exceed 250 words."

☑ Request a Chain-of-Thought Explanation
Ask the model to detail its reasoning process before giving the final output. For example: "Before providing the final revised text, explain your reasoning step-by-step. Identify which parts need improvement and how your changes will enhance clarity and professionalism. Then, present the final revised version."

☑ Conditional Instructions to Enforce Compliance
Add if/then conditions to ensure all requirements are met. For example: "If the revised text does not include at least two concrete examples, then add a sentence with a real-world statistic. Otherwise, finalize the response as is."

☑ Consolidate All Instructions into One Prompt
Integrate all the detailed instructions into a single, comprehensive prompt. For example: "First, identify the section of the text that needs improvement and explain why it is lacking. Next, summarize the current text in one sentence and list two specific weaknesses. Then, rewrite the text to address these weaknesses, ensuring the revised version includes three concrete examples, uses a formal and persuasive tone, and is structured as a numbered list with each item containing 2–3 sentences. Each list item must include at least one statistical fact or example, and the overall response must be no longer than 250 words. Before providing the final text, explain your reasoning step-by-step. If the revised text does not include at least two concrete examples, add an additional sentence with a real-world statistic."

___
Why This Works
People never give enough context. And once ChatGPT answers, they never correct it enough. Think about it like an intern. Deep prompting is all about precision: give clear instructions, context, and the right corrections.

PS: Don't forget to use the new o3-mini model. It's crushing every other one. Yes – even DeepSeek.
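Several format requirements in the consolidated prompt can also be checked in code before you accept an answer. That gives you if/then enforcement without relying on the model alone. This is a minimal sketch, assuming each list item sits on one line as `1. ...`. `check_format` is a hypothetical helper, and treating "contains a digit" as evidence of a statistic is a deliberately crude proxy.

```python
import re
from typing import List


def check_format(text: str, max_words: int = 250) -> List[str]:
    """Return human-readable violations of the format spec (empty list = passes)."""
    problems = []
    word_count = len(text.split())
    if word_count > max_words:
        problems.append(f"{word_count} words exceeds the {max_words}-word limit")
    # Numbered list items such as "1. First point." (one item per line assumed).
    items = re.findall(r"^[ \t]*\d+\.[ \t]+(.*)$", text, flags=re.MULTILINE)
    if not items:
        problems.append("no numbered list items found")
    for i, item in enumerate(items, start=1):
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", item.strip()) if s]
        if not 2 <= len(sentences) <= 3:
            problems.append(f"item {i} has {len(sentences)} sentences (want 2-3)")
        if not re.search(r"\d", item):
            problems.append(f"item {i} has no number (crude proxy for a statistic)")
    return problems


good = (
    "1. Sales rose 12% last year. Customers noticed the faster checkout.\n"
    "2. Support tickets fell by 30%. Response times improved as well."
)
print(check_format(good))  # []
print(check_format("1. One sentence with 5 facts."))
# ['item 1 has 1 sentences (want 2-3)']
```

Any problems the checker returns can be sent back to the model as a targeted correction, in the spirit of correcting it like an intern.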
-
I recently went through the Prompt Engineering guide by Lee Boonstra from Google, and it offers valuable, practical insights. It confirms that getting the best results from LLMs is an iterative engineering process, not just casual conversation. Here are some key takeaways I found particularly impactful: 1. 𝐈𝐭'𝐬 𝐌𝐨𝐫𝐞 𝐓𝐡𝐚𝐧 𝐉𝐮𝐬𝐭 𝐖𝐨𝐫𝐝𝐬: Effective prompting goes beyond the text input. Configuring model parameters like Temperature (for creativity vs. determinism), Top-K/Top-P (for sampling control), and Output Length is crucial for tailoring the response to your specific needs. 2. 𝐆𝐮𝐢𝐝𝐚𝐧𝐜𝐞 𝐓𝐡𝐫𝐨𝐮𝐠𝐡 𝐄𝐱𝐚𝐦𝐩𝐥𝐞𝐬: Zero-shot, One-shot, and Few-shot prompting aren't just academic terms. Providing clear examples within your prompt is one of the most powerful ways to guide the LLM on desired output format, style, and structure, especially for tasks like classification or structured data generation (e.g., JSON). 3. 𝐔𝐧𝐥𝐨𝐜𝐤𝐢𝐧𝐠 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠: Techniques like Chain of Thought (CoT) prompting – asking the model to 'think step-by-step' – significantly improve performance on complex tasks requiring reasoning (logic, math). Similarly, Step-back prompting (considering general principles first) enhances robustness. 4. 𝐂𝐨𝐧𝐭𝐞𝐱𝐭 𝐚𝐧𝐝 𝐑𝐨𝐥𝐞𝐬 𝐌𝐚𝐭𝐭𝐞𝐫: Explicitly defining the System's overall purpose, providing relevant Context, or assigning a specific Role (e.g., "Act as a senior software architect reviewing this code") dramatically shapes the relevance and tone of the output. 5. 𝐏𝐨𝐰𝐞𝐫𝐟𝐮𝐥 𝐟𝐨𝐫 𝐂𝐨𝐝𝐞: The guide highlights practical applications for developers, including generating code snippets, explaining complex codebases, translating between languages, and even debugging/reviewing code – potential productivity boosters. 6. 𝐁𝐞𝐬𝐭 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐞𝐬 𝐚𝐫𝐞 𝐊𝐞𝐲: Specificity: Clearly define the desired output. Ambiguity leads to generic results. Instructions > Constraints: Focus on telling the model what to do rather than just what not to do. Iteration & Documentation: This is critical. 
Documenting prompt versions, configurations, and outcomes (using a structured template, like the one suggested) is essential for learning, debugging, and reproducing results. Understanding these techniques allows us to move beyond basic interactions and truly leverage the power of LLMs. What are your go-to prompt engineering techniques or best practices? Let's discuss! #PromptEngineering #AI #LLM
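The toy sampler below makes the Temperature, Top-K, and Top-P settings concrete by applying them to a handful of next-token logits. It is a simplified sketch, not how any particular provider implements decoding. The tokens and logit values are made up. Applying temperature, then top-K, then top-P is one common ordering, and real libraries may differ.

```python
import math
import random
from typing import Dict, Optional


def sample_next_token(logits: Dict[str, float], temperature: float = 1.0,
                      top_k: Optional[int] = None, top_p: Optional[float] = None,
                      rng: Optional[random.Random] = None) -> str:
    """Pick one token from raw logits using temperature, top-K and top-P filtering."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy (deterministic) decoding.
        return max(logits, key=logits.get)
    # Temperature: divide logits before softmax. Low T sharpens, high T flattens.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    # Top-K: keep only the K most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-P (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    tokens = [tok for tok, _ in ranked]
    weights = [p for _, p in ranked]
    # random.choices treats weights as relative, so the survivors need no renormalizing.
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"the": 2.0, "a": 1.0, "cat": 0.1}  # made-up values
print(sample_next_token(logits, temperature=0))  # the
print(sample_next_token(logits, temperature=1.2, top_k=2, rng=random.Random(0)))
```

Lower temperature or tighter top-K/top-P values make outputs more repeatable. Raising them trades that repeatability for variety.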
-
3 prompting techniques for reasoning in LLMs: (explained with usage and tradeoffs) A slight tweak in the prompt often makes the difference between a confused answer and a perfectly reasoned one. That’s why reasoning-based prompting techniques help. Here are three helpful ways to improve performance on tasks involving logic, math, planning, and multi-step QA:

1️⃣ Chain of Thought (CoT)
The simplest and most widely used technique. Instead of asking the LLM to jump straight to the answer, we nudge it to reason step by step. This often improves accuracy because the model can walk through its logic before committing to a final output. For instance:
```
Q: I moved 10m North, then 30m East, then 50m West, and finally, 60m North, how far am I from the initial location?
Let's think step by step:
```
This tiny nudge (the second line of the prompt) can unlock reasoning capabilities that standard zero-shot prompting could miss, especially on complex prompts.

2️⃣ Self-Consistency
CoT is useful but not always consistent. If you prompt the same question multiple times, you might get different answers depending on the temperature setting. Self-consistency embraces this variation. You ask the LLM to generate multiple reasoning paths and then select the most common final answer. It’s a simple idea: when in doubt, ask the model several times and trust the majority. This technique often leads to more robust results, especially on ambiguous or complex tasks. However, it doesn’t evaluate how the reasoning was done, just whether the final answer is consistent across paths.

3️⃣ Tree of Thoughts (ToT)
While Self-Consistency varies the final answer, Tree of Thoughts varies the steps of reasoning at each point and then picks the best path overall. At every reasoning step, the model explores multiple possible directions. These branches form a tree, and a separate process evaluates which path seems the most promising at each step.
Think of it like a search algorithm over reasoning paths, where we try to find the most logical and coherent trail to the solution. It’s more compute-intensive, but on search-heavy tasks such as the Game of 24 puzzle, it can significantly outperform basic CoT.

As a takeaway: Prompt engineering in LLMs = feature engineering in classical ML.
____
Find me → Avi Chawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
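Self-consistency can be layered on top of CoT by sampling several completions at a non-zero temperature, extracting each final answer, and taking a majority vote. This is a minimal sketch, assuming the prompt asks the model to end with a line like `Answer: <number>`. The completions below are hand-written stand-ins, not real model output. The last lines compute the true answer to the walking example: a net 70m North and 20m West.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the value from the last 'Answer: <number>' line, if any."""
    matches = re.findall(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return matches[-1] if matches else None


def self_consistency(completions: List[str]) -> Optional[str]:
    """Majority vote over the final answers of independently sampled reasoning paths."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hand-written stand-ins for five sampled chains of thought.
samples = [
    "North: 10 + 60 = 70. East: 30 - 50 = -20. sqrt(70^2 + 20^2). Answer: 72.8",
    "Net displacement is 70m North and 20m West. Answer: 72.8",
    "East 30 and West 50 add to 80. sqrt(70^2 + 80^2). Answer: 106.3",
    "Pythagoras on (70, 20) gives about 72.8. Answer: 72.8",
    "I only count the North moves. Answer: 70",
]
print(self_consistency(samples))  # 72.8

# Ground truth: x = 30 - 50 = -20 (East positive), y = 10 + 60 = 70 (North positive).
print(f"{math.hypot(30 - 50, 10 + 60):.1f}")  # 72.8
```

The vote filters out the two faulty paths, but, as the post notes, it never inspects whether the winning reasoning was actually sound.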
-
Most people are using GenAI wrong. They ask one-shot questions and expect magic. If you want results that are more relevant, thoughtful, and useful, you need to prompt better. Here are two advanced prompting patterns that can dramatically improve output from any major GenAI chatbot (ChatGPT, Claude, Gemini, Copilot, etc.). These patterns work across them all.

CHAIN-OF-THOUGHT PATTERN - Get the model to “think out loud” by breaking down its reasoning into clear, logical steps before giving an answer.
Use cases: math, logic, pricing, diagnostics, and planning.
Steps:
* Use cues like “Let’s work this out step by step.”
* Optionally include an example (few-shot) or let it figure it out (zero-shot).
✔️ Pros: Improves accuracy and transparency.
❌ Cons: Slower, and if the first step is wrong, the rest often is too.

TREE-OF-THOUGHT PATTERN - Structure your prompt so the model explores multiple paths or ideas, then compares them and converges on the best option.
Use cases: root cause analysis, strategic decisions, and product ideas.
Steps:
* Ask it to explore different possibilities.
* Have it compare them.
* Ask for a final recommendation.
✔️ Pros: Encourages critical thinking and creativity.
❌ Cons: Verbose, computationally heavy, may overthink.

Most people stop at the first answer. These techniques push the model to do more: to reason, refine, and iterate. Prompt smarter. Get better results. #PromptEngineering #GenerativeAI #ChatGPT #AIProductivity #WorkSmarter #AdvancedPrompts #AIChatbots #LLMs #AIForWork
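The Tree-of-Thought pattern amounts to a search over partial solutions. In a real setup, proposing next steps and scoring them would both be LLM calls. In this minimal sketch they are hypothetical deterministic functions on a toy puzzle (reach 10 from 1 by adding 3, doubling, or subtracting 1), which keeps the branching, scoring, and pruning easy to follow. It uses breadth-first beam search, one of several search strategies ToT can use.

```python
from typing import Callable, List, Optional, Tuple

Path = Tuple[int, ...]  # a chain of "thoughts" (intermediate states)


def tree_of_thoughts(start: int, target: int,
                     propose: Callable[[int], List[int]],
                     score: Callable[[int, int], float],
                     beam_width: int = 3, max_depth: int = 6) -> Optional[Path]:
    """Breadth-first beam search: expand every kept path, score, keep the best few."""
    frontier: List[Path] = [(start,)]
    for _ in range(max_depth):
        candidates: List[Path] = []
        for path in frontier:
            for nxt in propose(path[-1]):   # branch: several possible next steps
                new_path = path + (nxt,)
                if nxt == target:
                    return new_path          # a branch reached the goal
                candidates.append(new_path)
        if not candidates:
            return None
        # Evaluate: rank partial paths by how promising their latest state looks.
        candidates.sort(key=lambda p: score(p[-1], target), reverse=True)
        frontier = candidates[:beam_width]  # prune to the most promising branches
    return None


# Toy stand-ins for the "propose" and "evaluate" LLM calls.
def propose(x: int) -> List[int]:
    return [x + 3, x * 2, x - 1]


def score(x: int, target: int) -> float:
    return -abs(target - x)  # closer to the target = more promising


print(tree_of_thoughts(1, 10, propose, score))  # (1, 4, 7, 10)
```

A wider beam explores more branches at a higher cost, which is the compute trade-off listed under the cons above.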
-
Take small steps when using an LLM. One common mistake is giving an LLM a single prompt for what is actually a multi-step process. A great example is people who ask ChatGPT or Claude to write them a blog post on a certain topic. The LLM will do it, but not very well. Instead, break it down into steps. First, ask it for a list of catchy titles based on a topic. Then choose one of those titles. After that, ask it to write an outline for the blog post. Next, ask it to research the points in the outline. Then ask it to write a draft of the post based on the topic and research. Lastly, ask it to edit the post for grammar, spelling, and clarity. At each step, also ask it to play the role of the type of person who would perform that task, e.g., a writer, an editor, a researcher, etc. You will find you end up with a much better product. Happy Sunday, all.
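The blog-writing workflow above is a prompt chain: each step's output feeds into the next step's prompt, and each prompt assigns a role. This is a minimal sketch. `llm` is a hypothetical prompt-in, text-out function, and `pick_title` lets a human choose the title. `fake_llm` is a scripted stand-in that echoes the role sentence so the wiring can be checked offline.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


def write_blog_post(llm: LLM, topic: str,
                    pick_title: Callable[[List[str]], str]) -> Dict[str, str]:
    """Run the blog workflow as a chain of small, role-specific prompts."""
    raw = llm(f"You are a copywriter. List 5 catchy blog titles about {topic}, one per line.")
    titles = [line.strip() for line in raw.splitlines() if line.strip()]
    title = pick_title(titles)  # the human picks one title
    outline = llm(f"You are a content strategist. Write an outline for a blog post titled '{title}'.")
    # Without a search tool, this step relies on the model's own knowledge; verify facts.
    research = llm(f"You are a researcher. Gather supporting facts for each point in this outline:\n{outline}")
    draft = llm(f"You are a writer. Draft the post '{title}' on {topic} using this outline and research:\n"
                f"{outline}\n{research}")
    final = llm(f"You are an editor. Edit this draft for grammar, spelling, and clarity:\n{draft}")
    return {"title": title, "outline": outline, "research": research,
            "draft": draft, "final": final}


def fake_llm(prompt: str) -> str:
    if "catchy blog titles" in prompt:
        return "Title A\nTitle B\n"
    return prompt.split(".")[0]  # echo the role sentence, e.g. "You are an editor"


post = write_blog_post(fake_llm, "home composting", pick_title=lambda ts: ts[1])
print(post["title"], "|", post["final"])  # Title B | You are an editor
```

Keeping every intermediate result in the returned dictionary lets you inspect or redo any single step without rerunning the whole chain.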
-
Want to prompt like the top AI startups? 👇 YC shared tips on how the top AI startups in their portfolio are prompting LLMs: Key learnings:
1/ Be Hyper-Specific & Detailed (The “Manager” Style)
Treat your LLM like a new employee. Provide long, detailed prompts that define their role, task, constraints, and desired output. Example: Parahelp uses a 6+ page prompt for their AI customer support agent!
2/ Assign a Clear Role (Set a Persona)
Start with: “You are a [role].” This sets the context, tone, and expected expertise. This helps the LLM adopt the desired style and reasoning for the tasks.
3/ Outline the Task + Provide the Steps
Clearly state the LLM's primary task. Break down complex tasks into a step-by-step plan. This improves reliability and makes complex operations more manageable for the LLM.
4/ Structure Your Prompt (and Output)
Use Markdown, bullet points, and XML tags to structure your instructions. A clear format helps with consistent and reliable outputs. Example: Parahelp uses tags like <manager_verify> to enforce response format.
5/ Meta-Prompting (LLM, Improve Thyself)
Yes, you can ask the LLM to help you write or refine prompts. Give it your current prompt. Ask it to make your prompt better or critique it. LLMs often suggest effective improvements you might not think of.
6/ Provide Examples
For complex tasks, include a few high-quality examples of input-output pairs directly in the prompt. This improves the LLM's ability to understand and replicate desired behavior. Example: Jazzberry (AI bug finder) feeds hard examples to guide the LLM.
7/ Prompt Folding & Dynamic Generation
Design prompts that generate specialized sub-prompts on the fly. Use this in multi-step workflows to break down complexity and adapt based on prior output.
Example: A tool classification prompt that outputs a more targeted follow-up prompt like: “Now write a bug triage report for a frontend UI error.”
8/ Add an Escape Hatch
Build fail-safes right into your prompt. Example: “If you’re unsure or missing info, say ‘I don’t know’ and ask for clarification.” This reduces hallucinations and increases trust.
9/ Use Debug Info & Thinking Traces
Ask the model to explain its reasoning: “Include a section called ‘debug_info’ where you explain the logic behind your answer.” This is great for debugging and fine-tuning.
10/ Treat Evals Like Gold
Yes, prompts matter. But evals are your most important IP. Evals are essential for knowing why a prompt works and for iterating effectively.
11/ Consider Model "Personalities" & Distillation
Different LLMs have different "personalities". Use the most powerful model to write and refine prompts. Then distill the optimized prompt for use with smaller, faster models in production.
Know someone building AI agents? Share this with them! Let’s level up our prompt engineering together 🔥 🔗 Source in comments #Startups #ArtificialIntelligence #PromptEngineering #AgenticAI #EnterpriseAI
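Several of these tips fit together in one small wrapper: a clear role, explicit steps, tagged output, an escape hatch, and a debug_info section. This is a minimal sketch, not any startup's actual prompt. `build_prompt`, `parse_response`, and the tag names `<answer>`, `<debug_info>`, and `<clarification>` are hypothetical choices, and the replies are hand-written.

```python
import re
from typing import Dict, List, Optional


def build_prompt(role: str, task: str, steps: List[str]) -> str:
    """Assemble a role, task, numbered steps, output tags, and an escape hatch."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"You are {role}.\n\n"
        f"## Task\n{task}\n\n"
        f"## Steps\n{numbered}\n\n"
        "## Output format\n"
        "Put your final answer inside <answer></answer>.\n"
        "Explain the logic behind your answer inside <debug_info></debug_info>.\n"
        "If you're unsure or missing info, answer I don't know and ask for "
        "clarification inside <clarification></clarification>."
    )


def extract_tag(text: str, tag: str) -> Optional[str]:
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text: str) -> Dict[str, object]:
    """Pull tagged sections out of a reply and flag replies that need a human."""
    answer = extract_tag(text, "answer")
    return {
        "answer": answer,
        "debug_info": extract_tag(text, "debug_info"),
        "clarification": extract_tag(text, "clarification"),
        # Escalate if the format was ignored or the model used the escape hatch.
        "escalate": answer is None or answer.lower().startswith("i don't know"),
    }


prompt = build_prompt("a customer support agent", "Decide whether to approve a refund.",
                      ["Read the ticket.", "Check it against the refund policy."])
reply = ("<answer>I don't know</answer>"
         "<clarification>Which order number is this about?</clarification>")
print(parse_response(reply)["escalate"])  # True
```

Because the output is tagged, downstream code can route escalations and log `debug_info` without fragile free-text parsing.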
-
When it comes to building truly reliable AI agents, I’ve realized that prompting isn’t just about giving instructions; it’s about crafting intentional conversations that guide the model with clarity, structure, and context. These prompt engineering techniques have shaped how we should think about deploying LLM-powered systems in the real world. The goal isn’t just output; it’s precision, traceability, and contextual awareness baked into every generation. It starts with being hyper-specific and detailed: think of your LLM like a new team member. The clearer you are about their task, constraints, and tone, the better they perform. Pair that with persona prompting to set the right expectations, and suddenly your LLM behaves more like a domain expert than a chatbot. From there, you outline the task and give it a plan, making even the most complex workflows feel digestible for the model. Structuring the prompt with bullet points, Markdown, or even XML-like tags makes the output predictable and parseable, especially when dealing with automation pipelines. I often add few-shot examples directly in the prompt to guide the model with real-world context. These examples anchor behavior and dramatically reduce misunderstanding. Things really start to scale with prompt folding and dynamic generation. In multi-stage flows, I let earlier outputs shape the next prompt. It’s how you make agents more adaptive. Still, I always include an escape hatch: asking the LLM to admit when it doesn't know something. It’s a small tweak that reduces hallucinations and builds trust. For deeper insight, I include debug info or thinking traces. Asking the LLM to explain its logic is like reading the footnotes of its thought process, which is great for debugging and refinement. But the real crown jewel? Your eval suite. Prompting without evaluation is like flying blind. Having test cases lets you track improvements, regressions, and stability across iterations.
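An eval suite can start very small: a list of inputs, each paired with a check on the output, run against every prompt version so that improvements and regressions show up side by side. This is a minimal sketch. `EvalCase`, `run_evals`, `compare`, and the scripted `fake_generate` model are all hypothetical. Real suites usually have many more cases, and some checks are graded by another model rather than by a rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Generate = Callable[[str, str], str]  # hypothetical: (system prompt, user input) -> reply


@dataclass
class EvalCase:
    name: str
    user_input: str
    check: Callable[[str], bool]  # True if the reply is acceptable


def run_evals(generate: Generate, prompt: str, cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against one prompt version and record pass/fail."""
    return {case.name: case.check(generate(prompt, case.user_input)) for case in cases}


def compare(old: Dict[str, bool], new: Dict[str, bool]) -> Dict[str, List[str]]:
    """List cases that started passing (improved) or stopped passing (regressed)."""
    return {
        "improved": [n for n, ok in new.items() if ok and not old.get(n, False)],
        "regressed": [n for n, ok in new.items() if not ok and old.get(n, False)],
    }


def fake_generate(prompt: str, user_input: str) -> str:
    # Scripted stand-in: only admits uncertainty when the prompt has an escape hatch.
    if "order status" in user_input:
        if "say I don't know" in prompt:
            return "I don't know. Could you share your order ID?"
        return "Your order shipped yesterday."  # a confident hallucination
    return "Our return window is 30 days."


cases = [
    EvalCase("return_policy", "How long do I have to return an item?",
             lambda out: "30 days" in out),
    EvalCase("admits_unknown", "What is my order status?",
             lambda out: "don't know" in out.lower()),
]
v1 = run_evals(fake_generate, "You are a support agent.", cases)
v2 = run_evals(fake_generate, "You are a support agent. If unsure, say I don't know.", cases)
print(compare(v1, v2))  # {'improved': ['admits_unknown'], 'regressed': []}
```

Running the same cases against every prompt version is what turns "this prompt feels better" into a tracked result.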
Finally, LLM personalities and distillation matter more than people think. Some models need more hand-holding; others just “get it.” I often use a bigger model to refine prompts and then distill them down for faster, cheaper inference with smaller models. If you're building reliable AI agents, don’t overlook the prompt. Get intentional, get structured.