AI Prompt Improvement


  • Andrew Ng

    DeepLearning.AI, AI Fund and AI Aspire

    2,440,850 followers

    I’ve noticed that many GenAI application projects put in automated evaluations (evals) of the system’s output later — and rely on humans to manually examine and judge outputs longer — than they should. This is because building evals is viewed as a massive investment (say, creating 100 or 1,000 examples, and designing and validating metrics), and there’s never a convenient moment to pay that up-front cost. Instead, I encourage teams to think of building evals as an iterative process. It’s okay to start with a quick-and-dirty implementation (say, 5 examples with unoptimized metrics) and then iterate and improve over time. This lets you gradually shift the burden of evaluation away from humans and toward automated evals.

    I wrote previously in The Batch about the importance and difficulty of creating evals. Say you’re building a customer-service chatbot that responds to users in free text. There’s no single right answer, so many teams end up having humans pore over dozens of example outputs with every update to judge whether the system improved. While techniques like LLM-as-judge are helpful, the details of getting this to work well (such as what prompt to use and what context to give the judge) are finicky to get right. All this contributes to the impression that building evals requires a large up-front investment, and thus on any given day a team can make more progress by relying on human judges than by figuring out how to build automated evals.

    I encourage you to approach building evals differently. It’s okay to build quick evals that are only partial, incomplete, and noisy measures of the system’s performance, and to improve them iteratively. They can be a complement to, rather than a replacement for, manual evaluation. Over time, you can gradually tune the evaluation methodology to close the gap between the evals’ output and human judgments.
    For example:

    - It’s okay to start with very few examples in the eval set, say 5, and gradually add more over time — or remove examples that turn out to be too easy or too hard, and thus not useful for distinguishing between the performance of different versions of your system.

    - It’s okay to start with evals that measure only a subset of the performance dimensions you care about, or that measure narrow cues you believe are correlated with, but don’t fully capture, system performance. For example, if at a certain moment in the conversation your customer-support agent is supposed to (i) call an API to issue a refund and (ii) generate an appropriate message to the user, you might start off measuring only whether it calls the API correctly and not worry about the message. Or if, at a certain moment, your chatbot should recommend a specific product, a basic eval could measure whether the chatbot mentions that product, without worrying about what it says about it.

    [Truncated due to length limit. Full text: https://lnkd.in/gygj3y7w ]
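Ng's "start small, measure a narrow cue" advice can be sketched in a few lines. Everything below is hypothetical (the product names, the `toy_bot` stub, and the keyword metric are illustrative, not from the post); the point is that even a two-example eval set with a crude proxy metric gives you an automated signal to iterate on:

```python
def mentions_product(reply: str, product: str) -> bool:
    """Narrow proxy metric: did the bot name the expected product?"""
    return product.lower() in reply.lower()

def run_evals(bot, examples) -> float:
    """Return the fraction of eval examples the bot passes."""
    passed = sum(
        mentions_product(bot(ex["query"]), ex["expected_product"])
        for ex in examples
    )
    return passed / len(examples)

# A deliberately tiny eval set -- grow (or prune) it over time.
EXAMPLES = [
    {"query": "Which laptop is best for travel?", "expected_product": "AeroBook"},
    {"query": "I need a rugged phone.", "expected_product": "ToughPhone"},
]

def toy_bot(query: str) -> str:
    """Stand-in for the real chatbot being evaluated."""
    return "You might like the AeroBook for travel."
```

Running `run_evals(toy_bot, EXAMPLES)` scores the bot at 0.5 on this toy set; as the metric proves too noisy or too narrow, you refine it rather than starting over.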

  • Aishwarya Srinivasan
    621,610 followers

    If you’re an AI engineer trying to understand and build with GenAI, RAG (Retrieval-Augmented Generation) is one of the most essential components to master. It’s the backbone of any LLM system that needs fresh, accurate, and context-aware outputs. Let’s break down how RAG works, step by step, from an engineering lens, not a hype one:

    🧠 How RAG Works (Under the Hood)

    1. Embed your knowledge base
    → Start with unstructured sources - docs, PDFs, internal wikis, etc.
    → Convert them into semantic vector representations using embedding models (e.g., OpenAI, Cohere, or HuggingFace models)
    → Output: N-dimensional vectors that preserve meaning across contexts

    2. Store in a vector database
    → Use a vector store like Pinecone, Weaviate, or FAISS
    → Index embeddings to enable fast similarity search (cosine, dot-product, etc.)

    3. Query comes in - embed that too
    → The user prompt is embedded using the same embedding model
    → Perform a top-k nearest-neighbor search to fetch the most relevant document chunks

    4. Context injection
    → Combine retrieved chunks with the user query
    → Format this into a structured prompt for the generation model (e.g., Mistral, Claude, Llama)

    5. Generate the final output
    → The LLM uses both the query and the retrieved context to generate a grounded, context-rich response
    → Minimizes hallucinations and improves factuality at inference time

    📚 What changes with RAG?
    Without RAG: 🧠 “I don’t have data on that.”
    With RAG: 🤖 “Based on [retrieved source], here’s what’s currently known…”
    Same model, drastically improved quality.

    🔍 Why this matters
    You need RAG when:
    → Your data changes daily (support tickets, news, policies)
    → You can’t afford hallucinations (legal, finance, compliance)
    → You want your LLMs to access your private knowledge base without retraining

    It’s the most flexible, production-grade approach to bridge static models with dynamic information.
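The retrieve-then-inject loop above can be sketched end to end in plain Python. This is a toy, not a production pipeline: a bag-of-words `Counter` stands in for a real embedding model, a Python list stands in for a vector store like Pinecone or FAISS, and the documents are invented examples — only the shape of steps 3 and 4 is the point:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (real systems use embedding models)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for an indexed vector store.
DOCS = [
    "Refunds are issued within 5 business days.",
    "Shipping to Europe can require extra customs checks.",
    "Premium plans include priority phone support.",
]

def retrieve(query: str, docs, k: int = 2):
    """Step 3: top-k nearest-neighbor search over the 'vector store'."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, chunks) -> str:
    """Step 4: context injection -- retrieved chunks plus the user query."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Swapping `embed` for a real embedding model and `DOCS` for an indexed store turns the same skeleton into the pipeline the post describes; step 5 is then just sending `build_prompt(...)` to the generation model.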
    🛠️ Arvind and I are kicking off a hands-on workshop on RAG. This first session is designed for beginner to intermediate practitioners who want to move beyond theory and actually build. Here’s what you’ll learn:
    → How RAG enhances LLMs with real-time, contextual data
    → Core concepts: vector DBs, indexing, reranking, fusion
    → Build a working RAG pipeline using LangChain + Pinecone
    → Explore no-code/low-code setups and real-world use cases

    If you're serious about building with LLMs, this is where you start.
    📅 Save your seat and join us live: https://lnkd.in/gS_B7_7d

  • Ruben Hassid

    Master AI before it masters you.

    804,591 followers

    STOP asking ChatGPT to "make it better". Here's how to prompt it better instead:

    ☑ Clearly Identify the Issue
    Rather than a vague “make it better,” specify the exact element that needs change. For example: "Rewrite the second paragraph so it includes three concrete examples of our product’s benefits. The tone must be formal and persuasive. Remove any informal language or redundant phrases."

    ☑ Divide the Task into Discrete Steps
    Break the overall revision into a sequence of manageable tasks. For example: "Go through my instructions, step by step. – Step 1: Summarize it in one sentence. – Step 2: Identify two specific weaknesses. – Step 3: Rewrite the text to address these weaknesses, incorporating specific data or examples."

    ☑ Specify the Format and Level of Detail
    Define exactly how the final output should look. For example: "Provide the final revised text as a numbered list where each item contains 2–3 sentences. Each item must include at least one statistical fact or concrete example, and the overall response should not exceed 250 words."

    ☑ Request a Chain-of-Thought Explanation
    Ask the model to detail its reasoning process before giving the final output. For example: "Before providing the final revised text, explain your reasoning step-by-step. Identify which parts need improvement and how your changes will enhance clarity and professionalism. Then, present the final revised version."

    ☑ Use Conditional Instructions to Enforce Compliance
    Add if/then conditions to ensure all requirements are met. For example: "If the revised text does not include at least two concrete examples, then add a sentence with a real-world statistic. Otherwise, finalize the response as is."

    ☑ Consolidate All Instructions into One Prompt
    Integrate all the detailed instructions into a single, comprehensive prompt. For example: "First, identify the section of the text that needs improvement and explain why it is lacking. Next, summarize the current text in one sentence and list two specific weaknesses. Then, rewrite the text to address these weaknesses, ensuring the revised version includes three concrete examples, uses a formal and persuasive tone, and is structured as a numbered list with each item containing 2–3 sentences. Each list item must include at least one statistical fact or example, and the overall response must be no longer than 250 words. Before providing the final text, explain your reasoning step-by-step. If the revised text does not include at least two concrete examples, add an additional sentence with a real-world statistic."

    Why This Works
    People never give enough context. And once ChatGPT answers, they never correct it enough. Think about it like an intern. Deep prompting is all about precision: give clear instructions, context & the right corrections.

    PS: Don't forget to use the new o3-mini model. It's crushing any other one. Yes – even DeepSeek.
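The "consolidate everything into one prompt" tactic above lends itself to a reusable template. The function below is a hypothetical sketch (the parameter names and wording are illustrative) that folds the step sequence, format constraints, chain-of-thought request, and conditional instruction into one string:

```python
def build_revision_prompt(text: str, min_examples: int = 3, max_words: int = 250) -> str:
    """Fold the tactics above into one reusable revision prompt."""
    return (
        "Go through the text below step by step.\n"
        "Step 1: Summarize it in one sentence.\n"
        "Step 2: Identify two specific weaknesses.\n"
        "Step 3: Rewrite it to address those weaknesses, with at least "
        f"{min_examples} concrete examples, a formal and persuasive tone, "
        "structured as a numbered list of 2-3 sentence items.\n"
        "Before providing the final text, explain your reasoning step by step.\n"
        f"If the rewrite has fewer than {min_examples} examples, "
        "add a sentence with a real-world statistic.\n"
        f"Keep the whole response under {max_words} words.\n\n"
        f"TEXT:\n{text}"
    )
```

Parameterizing the thresholds (`min_examples`, `max_words`) means the same template can be reused and corrected across drafts instead of retyping the instructions each time.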

  • Usman Sheikh

    I co-found companies with experts ready to own outcomes, not give advice.

    56,083 followers

    Prompt engineering is the new consulting superpower. Most haven't realized it yet.

    Over the last couple of days, I reviewed the latest guides by Google, Anthropic, and OpenAI. Some of the key recommendations to improve output:
    → Be very specific about the expertise level you're requesting
    → Use structured instructions or meta prompts
    → Explicitly reference project documents in the prompt
    → Ask the model to "think step by step"

    Based on the guides, here are four ways to immediately level up your prompting skill set as a consultant:

    1. Define the expert persona precisely
    "You're a specialist with 15 years in retail supply chain optimization who has worked with Target and Walmart."
    Why it matters: The model draws from deeper technical patterns, not just general concepts.

    2. Structure the deliverable explicitly
    "Provide 3 key insights and their implications, then support each with data-driven evidence."
    Why it matters: This gives me structured material that needs minimal editing.

    3. Set distinctive success parameters
    "Focus on operational inefficiencies that competitors typically overlook."
    Why it matters: You push the model beyond obvious answers to genuine competitive insights.

    4. Establish the decision context
    "This is for a CEO with a risk-averse investor applying pressure to improve their gross margins."
    Why it matters: The recommendations align with stakeholder realities and urgency.

    These were the takeaways from the guides that I found most helpful. When you run these prompts against generic statements, you will see a massive difference in quality and relevance.

    Bonus tips that are working for me:
    → Create prompt templates using the four elements
    → Test different expert personas against the same problem (I regularly use "Senior McKinsey partner" to counter my position and detect gaps in my thinking)
    → Ask the model to identify contradictions or gaps in the data before finalizing any recommendations
    We’re only scratching the surface of what these “intelligence partners” can offer. Getting better at prompting may be one of the most asymmetric skill opportunities any of us have today. Share your favourite prompting tip below!

    P.S. Was this post helpful? Should I share one post per week on how I’m improving my AI-related skills?
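The "prompt templates using the four elements" bonus tip can be sketched as a small builder. The template and field names below are hypothetical, not from any of the cited guides; they simply wire persona, deliverable, success parameters, and decision context into one reusable prompt:

```python
# Hypothetical template combining the four elements from the post.
CONSULTING_TEMPLATE = """\
ROLE: {persona}
DELIVERABLE: {deliverable}
SUCCESS CRITERIA: {criteria}
DECISION CONTEXT: {context}

TASK: {task}"""

def build_consulting_prompt(persona: str, deliverable: str,
                            criteria: str, context: str, task: str) -> str:
    """Fill the four-element template plus the concrete task."""
    return CONSULTING_TEMPLATE.format(
        persona=persona, deliverable=deliverable,
        criteria=criteria, context=context, task=task,
    )
```

Keeping the four elements as named slots makes it easy to A/B-test personas against the same problem, as the post suggests, by varying only the `persona` argument.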

  • Allie K. Miller

    #1 Most Followed Voice in AI Business (2M) | Former Amazon, IBM | Fortune 500 AI and Startup Advisor, Public Speaker | @alliekmiller on Instagram, X, TikTok | AI-First Course with 300K+ students - Link in Bio

    1,633,796 followers

    Are you making the most of AI, or just skimming the surface? Stop stopping at the pre-step. Let me explain.

    My teammate and I traded voice memos on a complex operating procedure so we didn’t have to wait until both of us were free for an hour.

    In a non-AI world, I would:
    - listen to it
    - take notes
    - listen to it again
    - finish notes
    - summarize
    - put it into a format for my cofounders to review
    - get on a call to discuss
    - decide on next steps
    - assign action items
    - send the summary out
    - schedule meetings to track progress

    In an AI world, I now:
    - listen to the voice memo at 2x while reading the AI transcription (I like to capture emphasis/tone)
    - send the transcript to ChatGPT to summarize
    - ask ChatGPT for a new format (table); review
    - ask ChatGPT for next steps, 5x more detailed; review
    - ask ChatGPT for additional legal/financial/product/user considerations; review and answer
    - ask ChatGPT for a meeting agenda to review all of this; review
    - hold the meeting; record and transcribe the review meeting
    - summarize the transcript with AI, review, send out a recap and any followup meeting makers

    Most people will stop at asking ChatGPT to summarize the voice memos. The top AI users will think about how it can improve their entire workflow, even with its imperfections, and move away from doing everything from scratch to instead being a creative process manager, critical thinker, and reviewer.

    Challenge yourself to augment more of your process, not just step 1.

  • Sourav Verma

    Principal Applied Scientist at Oracle | AI | Agents | NLP | ML/DL | Engineering

    18,150 followers

    The interview is for a GenAI Engineer role at Anthropic.

    Interviewer: "Your prompt gives perfect answers during testing - but fails randomly in production. What’s wrong?"

    You: "Ah, the prompt drift problem. Identical prompts can yield different outputs due to sampling (temperature/top-p), or shift entirely under paraphrased inputs."

    Interviewer: "Meaning?"

    You: "LLMs don't understand instructions - they predict them. A single rephrased sentence, longer context, or slight temperature change can push the model into a different completion path. What looks deterministic in a 10-example test collapses under real-world input diversity."

    Interviewer: "So how do you fix it?"

    You: Treat prompts like production code:
    1. Prompt templates - lock phrasing with {{placeholders}} for user input.
    2. Lock sampling - fix temperature=0, top_p=1 for reproducibility.
    3. System-level guardrails - e.g., "Always respond in valid JSON matching this schema: {{schema}}"
    4. Fuzz-test inputs - run 1k+ paraphrased variants pre-deploy.
    5. Delimiters + structure - prevent bleed and enforce parsing: """USER_INPUT: {{input}}""" """OUTPUT_FORMAT: {{schema}}"""

    Interviewer: "So prompt reliability is more about engineering than creativity?"

    You: "Exactly. Creative prompting gets you demos. Structured prompting gets you products."

    Interviewer: "What’s your golden rule for prompt design?"

    You: "Prompts are code. They need versioning, testing, and regression tracking - not vibes. If you can’t reproduce the output, you can’t trust it."

    Interviewer: "So prompt drift is basically a reliability bug?"

    You: "Yes - and fixing it turns GenAI from a prototype into a platform."

    #PromptEngineering #GenerativeAI
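The five fixes in the dialogue can be sketched as plain Python. This is a minimal illustration, not a production setup: `render`, `SAMPLING`, and `fuzz_variants` are hypothetical names, and the four-variant fuzzer stands in for the 1k+ paraphrase suites real pipelines use:

```python
# 1 & 5: locked template with delimiters so user input can't bleed into instructions.
PROMPT_TEMPLATE = (
    '"""USER_INPUT: {user_input}"""\n'
    '"""OUTPUT_FORMAT: {schema}"""'
)

# 2: sampling parameters pinned for reproducibility, versioned with the template.
SAMPLING = {"temperature": 0, "top_p": 1}

def render(user_input: str, schema: str) -> str:
    """Fill the locked template; only these two slots ever vary."""
    return PROMPT_TEMPLATE.format(user_input=user_input, schema=schema)

def fuzz_variants(base: str) -> list[str]:
    """4: tiny stand-in for paraphrase fuzzing run before deploy."""
    return [base, base.lower(), base.upper(), f"Please {base.lower()}"]
```

A pre-deploy check would then render every fuzzed variant, call the model with `SAMPLING`, and assert the output still parses against the schema (guardrail 3), tracking failures as regressions.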

  • Tom Chavez

    Co-Founder, super{set}

    18,548 followers

    I used to think “prompt engineering” was LinkedIn cosplay. A made-up job title for people riding the AI gold rush with nothing but a pickaxe and a Canva resume.

    I said—confidently and repeatedly—that prompt engineering wasn’t a real profession. That LLMs would soon be smart enough to understand what you meant, not what you typed. That the whole thing was a short-lived hustle.

    I was wrong. What I dismissed as a gimmick has turned out to be a craft. Prompting matters. More than I expected. Sometimes more than fine-tuning. Sometimes more than model choice. Because here’s the truth:

    💡 Prompting is differentiation. A well-designed prompt can yield 10x better results. It’s not a party trick—it’s strategic scaffolding.

    💡 General-purpose models can outperform fine-tuned ones—if prompted right. Smart prompting + inventive engineering unlocks more than I gave it credit for.

    💡 Fine-tuning is expensive. Prompting is scrappy. It gives you leverage without the MLOps overhead.

    💡 Context matters. Strategic prompts that include examples, constraints, clear objectives, and instructions lead to results that are 100x more effective than terse prompts that fail to paint the target.

    A philosophy teacher of mine, when critiqued and confronted by a position he once held, would say with a twinkle in his eye, “No, you’re mistaken, my former self was of that view.” So, copping his line here, my former self was dead wrong. My current self understands the value still to be extracted from intelligent prompting in AI.

  • Brij kishore Pandey

    AI Architect & Engineer | AI Strategist

    715,797 followers

    RAG stands for Retrieval-Augmented Generation. It’s a technique that combines the power of LLMs with real-time access to external information sources. Instead of relying solely on what an AI model learned during training (which can quickly become outdated), RAG enables the model to retrieve relevant data from external databases, documents, or APIs—and then use that information to generate more accurate, context-aware responses.

    How does RAG work?

    𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗲: The system searches for the most relevant documents or data based on your query, using advanced search methods like semantic or vector search.

    𝗔𝘂𝗴𝗺𝗲𝗻𝘁: Instead of just using the original question, RAG 𝗮𝘂𝗴𝗺𝗲𝗻𝘁𝘀 (enriches) the prompt by adding the retrieved information directly into the input for the AI model. This means the model doesn’t just rely on what it “remembers” from training—it now sees your question 𝘱𝘭𝘂𝘀 the latest, domain-specific context.

    𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗲: The LLM takes the retrieved information and crafts a well-informed, natural language response.

    𝗪𝗵𝘆 𝗱𝗼𝗲𝘀 𝗥𝗔𝗚 𝗺𝗮𝘁𝘁𝗲𝗿?
    Improves accuracy: By referencing up-to-date or proprietary data, RAG reduces outdated or incorrect answers.
    Context-aware: Responses are tailored using the latest information, not just what the model “remembers.”
    Reduces hallucinations: RAG helps prevent the AI from making up facts by grounding answers in real sources.

    Example: Imagine asking an AI assistant, “What are the latest trends in renewable energy?” A traditional LLM might give you a general answer based on old data. With RAG, the model first searches for the most recent articles and reports, then synthesizes a response grounded in that up-to-date information.

    Illustration by Deepak Bhardwaj
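The augment step described above is just prompt assembly: the retrieved snippets are spliced into the model's input so it answers from current sources rather than stale training memory. A minimal sketch (the function name, wording, and citation format are illustrative assumptions):

```python
def augment(question: str, retrieved: list[str]) -> str:
    """Splice retrieved snippets into the prompt, numbered for citation."""
    sources = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(retrieved))
    return (
        "Using only the numbered sources below, answer the question "
        "and cite sources by number.\n\n"
        f"SOURCES:\n{sources}\n\nQUESTION: {question}"
    )
```

Numbering the sources lets the generated answer say "Based on [1]..." — the grounding behavior the post contrasts with a traditional LLM's generic reply.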

  • Aurimas Griciūnas

    Founder @ SwirlAI • Ex-CPO @ neptune.ai (Acquired by OpenAI) • UpSkilling the Next Generation of AI Talent • Author of SwirlAI Newsletter • Public Speaker

    181,987 followers

    I have been developing Agentic Systems for the past few years and the same patterns keep emerging. 👇

    𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗗𝗿𝗶𝘃𝗲𝗻 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 is the most reliable way to be successful in building your 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 - here is my template. Let’s zoom in:

    𝟭. Define a problem you want to solve: is GenAI even needed?
    𝟮. Build a Prototype: figure out if the solution is feasible.
    𝟯. Define Performance Metrics: you must have output metrics defined for how you will measure the success of your application.
    𝟰. Define Evals: split the above into smaller input metrics that can move the key metrics forward. Decompose them into tasks that could be automated and move the given input metrics. Define Evals for each. Store the Evals in your Observability Platform.

    ℹ️ Steps 𝟭. - 𝟰. are where AI Product Managers can help, but they can also be handled by AI Engineers.

    𝟱. Build a PoC: it can be simple (an Excel sheet) or more complex (a user-facing UI). Regardless of what it is, expose it to users for feedback as soon as possible.
    𝟲. Instrument your application: gather traces and human feedback and store them in an Observability Platform next to the previously stored Evals.
    𝟳. Run Evals on traced data: traces contain the inputs and outputs of your application; run evals on top of them.
    𝟴. Analyse failing Evals and negative user feedback: this data is gold, as it pinpoints exactly where the Agentic System needs improvement.
    𝟵. Use data from the previous step to improve your application - prompt engineer, improve the AI system topology, finetune models, etc. Make sure the changes move the Evals in the right direction.
    𝟭𝟬. Build and expose the improved application to the users.
    𝟭𝟭. Monitor the application in production: this comes out of the box - you have implemented evaluations and traces for development purposes, and they can be reused for monitoring. Configure specific alerting thresholds and enjoy the peace of mind.

    ✅ 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 𝗼𝗳 𝘆𝗼𝘂𝗿 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻:
    ➡️ Run steps 𝟲. - 𝟭𝟬. to continuously improve and evolve your application.
    ➡️ As you build up complexity, new requirements can be added to the same application; this includes running steps 𝟭. - 𝟱. and attaching the new logic as routes in your Agentic System.
    ➡️ For example, you start off with a simple chatbot and add a route that can classify user intent to take action (e.g., add items to a shopping cart).

    What is your experience in evolving Agentic Systems? Let me know in the comments 👇
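Step 7 — running evals over traced data — can be sketched in a few lines. The trace shape and eval names below are hypothetical (real observability platforms have their own schemas); the pattern is simply "every eval runs over every trace, and failures feed step 8":

```python
import json

def json_valid(trace: dict) -> bool:
    """One narrow automated eval: is the agent's output parseable JSON?"""
    try:
        json.loads(trace["output"])
        return True
    except (ValueError, TypeError):
        return False

def run_evals_on_traces(traces, evals):
    """Run every eval over every trace; return (trace_id, eval_name) failures."""
    return [
        (t["id"], name)
        for t in traces
        for name, fn in evals.items()
        if not fn(t)
    ]
```

Because the evals are just functions over traces, the same harness runs in development (step 7) and in production monitoring (step 11) — only the trace source and the alerting thresholds change.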

  • Laura Jeffords Greenberg

    General Counsel at Worksome | Building AI-Native Legal Functions | Board Member & Speaker

    18,007 followers

    Most people don’t realize: AI can coach you on how to prompt it better. Here’s how to turn AI into your personal prompt coach, so you get better results and learn how to use AI faster.

    Try this two-step fix:
    1. State your goal and context.
    2. Ask one of these questions:
    ➡️ "How would you rewrite my prompt to get more [specific, creative, detailed, etc.] responses?"
    ➡️ "If you were trying to get [desired outcome], how would you modify this prompt?"
    ➡️ "If this were your prompt, what would you change to make it more effective?"
    ➡️ "What elements are missing from my prompt that would help you generate better responses?"
    ➡️ "How might you enhance this prompt to avoid common pitfalls or misinterpretations?"
    ➡️ Or simply: "Improve my prompt."

    Before: "Explain force majeure clauses."

    After: "Analyze how courts in California have interpreted force majeure clauses in commercial leases since COVID-19, focusing on what constitutes 'unforeseeable circumstances' and the burden of proof required to invoke these provisions."

    The difference? A broad, non-jurisdiction-specific, superficial overview vs. actionable legal insights for commercial leases in California. Not only will you get better outcomes, but you will also learn how to improve your prompting in the process.

    What are your go-to strategies or favorite prompts to optimize AI responses?
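The two-step fix above is easy to standardize as a small meta-prompt wrapper. The function below is a hypothetical sketch (its name and phrasing are illustrative) that pairs your goal and draft prompt with one of the post's coaching questions:

```python
def coaching_prompt(goal: str, draft: str) -> str:
    """Wrap a draft prompt in a meta-question asking the model to improve it."""
    return (
        f"My goal: {goal}\n"
        f"My draft prompt: {draft}\n"
        "If this were your prompt, what would you change to make it more effective?"
    )
```

Swapping the closing question for any of the other five variants gives you a reusable coaching loop instead of retyping the setup each time.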
