I’ve had the chance to work across several #EnterpriseAI initiatives, especially those with human-computer interfaces. Common failures can be attributed broadly to bad design and experience, disjointed workflows, not getting to quality answers quickly, and slow response times, all exacerbated by high compute costs from an under-engineered backend. Here are 10 principles that I’ve come to appreciate in designing #AI applications. What are your core principles?
1. DON’T UNDERESTIMATE THE VALUE OF GOOD #UX AND INTUITIVE WORKFLOWS
Design AI to fit how people already work. Don’t make users learn new patterns: embed AI in current business processes and gradually evolve the patterns as the workforce matures. This also builds institutional trust and lowers resistance to adoption.
2. START BY EMBEDDING AI FEATURES IN EXISTING SYSTEMS/TOOLS
Integrate directly into existing operational systems (CRM, EMR, ERP, etc.) and applications. This minimizes friction, speeds up time-to-value, and reduces training overhead. Avoid standalone apps that add context-switching. Using AI should feel seamless and habit-forming. For example, surface AI-suggested next steps directly in Salesforce or Epic. Where possible, push AI results into existing collaboration tools like Teams.
3. CONVERGE TO ACCEPTABLE RESPONSES FAST
Most users have gotten used to publicly available AI like #ChatGPT, where they can get to an acceptable answer quickly. Enterprise users expect parity or better; anything slower feels broken. Obsess over model quality, and fine-tune system prompts for the specific use case, function, and organization.
4. THINK ENTIRE FUNCTIONS INSTEAD OF USE CASES
Don’t solve just a task; solve the entire function. For example, instead of resume screening, redesign the full talent acquisition journey with AI.
5. ENRICH CONTEXT AND DATA
Use external signals in addition to enterprise data to create better context for the response. For example, append LinkedIn information for a candidate when presenting insights to the recruiter.
6. CREATE SECURITY CONFIDENCE
Design for enterprise-grade data governance and security from the start. This means avoiding rogue AI applications and collaborating with IT. For example, offer centrally governed access to #LLMs through approved enterprise tools instead of letting teams go rogue with public endpoints.
7. IGNORE COSTS AT YOUR OWN PERIL
Design for compute costs, especially if the app has to scale. Start small, but plan defensively for future costs.
8. INCLUDE EVALS
Define what “good” looks like and run evals continuously so you can compare different models and course-correct quickly.
9. DEFINE AND TRACK SUCCESS METRICS RIGOROUSLY
Set and measure quantifiable indicators: hours saved, hires avoided, process cycles reduced, adoption levels.
10. MARKET INTERNALLY
Keep promoting the success and adoption of the application internally. Sometimes driving enterprise adoption requires FOMO.
#DigitalTransformation #GenerativeAI #AIatScale #AIUX
How to Improve AI Functionality
Summary
Improving AI functionality means making artificial intelligence systems smarter, faster, and more reliable so they can solve real-world problems and work smoothly alongside people and other tools. By focusing on how AI understands information, interacts with users, and learns from feedback, organizations can build AI that delivers better results and adapts to changing needs.
- Integrate seamlessly: Embed AI features in existing workflows and systems so people can use them without switching between multiple applications.
- Curate smart context: Organize and structure all the information AI receives, such as user instructions, past interactions, and tool descriptions, to help it make more accurate decisions.
- Monitor and refine: Regularly track how your AI system performs, gather user feedback, and use this data to make continuous improvements.
-
You've written the perfect prompt, but your AI agent is still failing. Why? Because your prompt is just 5% of the input. The other 95% (the historical data, the tool definitions, the system rules) is an absolute mess.
We've graduated from tactical prompts. The biggest lever for AI performance today is Context Engineering, and that does not mean "better RAG." It's the discipline of architecting the entire operating system for an LLM: the dynamic, curated "working memory" we build for the model before it ever sees the prompt. Lazy context is why your agent loops, hallucinates, and ignores instructions.
Here are 4 techniques we use internally to get our agents to perform well.
1. Your System Prompt is your AI's Constitution. Just because a prompt guru on X told you that "You are a helpful assistant. Be a lawyer" is a great system prompt doesn't mean it is. Define the AI's persona, rules, boundaries, and (most importantly) what it must not do. This instruction set is the most critical, persistent part of the context. Put it first.
2. Curate Memory. Sending raw, unfiltered historical data won't work either. The model will suffer from the "Lost in the Middle" problem and forget key facts. Instead, engineer a memory layer: use a summarizer to create a concise "rolling summary" or a "fact sheet" of the conversation, and feed that in as context.
3. Tool Definitions Are Context. When you give an AI agent tools via function calling, detailed tool descriptions are critical instructions. A vague function name (e.g., "search_db") on its own will fail. A precise description ("Use this function only to find a customer's order ID based on their email") is high-leverage context that controls behavior.
4. Separate and Structure All Inputs. The model needs to know what's an "instruction," what's "interaction history," what's "retrieved data," and what's the "user query." Stop concatenating them into one messy blob. Use XML tags (<instructions>, <history_summary>, <retrieved_doc>) to create a structured information packet.
If you're thinking the next 10x leap in AI will come from a 10T parameter model, it won't. It will come from organizations that master the data pipeline into the model and architect the entire context.
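Technique 4 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `build_context` and the `<user_query>` tag are assumptions (the post names only the other three tags), and the sample strings are made up.

```python
from xml.sax.saxutils import escape


def build_context(instructions, history_summary, retrieved_docs, user_query):
    """Assemble one structured prompt string from separately tagged inputs."""
    parts = [
        # System rules go first, as the most persistent part of the context.
        f"<instructions>\n{escape(instructions)}\n</instructions>",
        # A rolling summary stands in for raw, unfiltered history.
        f"<history_summary>\n{escape(history_summary)}\n</history_summary>",
    ]
    for doc in retrieved_docs:
        # Escaping keeps retrieved text from being mistaken for a tag.
        parts.append(f"<retrieved_doc>\n{escape(doc)}\n</retrieved_doc>")
    # <user_query> is an assumed tag name, not one given in the post.
    parts.append(f"<user_query>\n{escape(user_query)}\n</user_query>")
    return "\n".join(parts)


packet = build_context(
    instructions="Answer only questions about order status. Never reveal internal IDs.",
    history_summary="Customer asked about a delayed order; email already verified.",
    retrieved_docs=["Order status: shipped, carrier delay reported."],
    user_query="Where is my order?",
)
print(packet)
```

Keeping each input in its own tag means the model never has to guess where the rules end and the retrieved data begins.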
-
In the world of Generative AI, 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹-𝗔𝘂𝗴𝗺𝗲𝗻𝘁𝗲𝗱 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 (𝗥𝗔𝗚) is a game-changer. By combining the capabilities of LLMs with domain-specific knowledge retrieval, RAG enables smarter, more relevant AI-driven solutions. But to truly leverage its potential, we must follow some essential 𝗯𝗲𝘀𝘁 𝗽𝗿𝗮𝗰𝘁𝗶𝗰𝗲𝘀:
1️⃣ 𝗦𝘁𝗮𝗿𝘁 𝘄𝗶𝘁𝗵 𝗮 𝗖𝗹𝗲𝗮𝗿 𝗨𝘀𝗲 𝗖𝗮𝘀𝗲
Define your problem statement. Whether it’s building intelligent chatbots, document summarization, or customer support systems, clarity on the goal ensures efficient implementation.
2️⃣ 𝗖𝗵𝗼𝗼𝘀𝗲 𝘁𝗵𝗲 𝗥𝗶𝗴𝗵𝘁 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗕𝗮𝘀𝗲
- Ensure your knowledge base is 𝗵𝗶𝗴𝗵-𝗾𝘂𝗮𝗹𝗶𝘁𝘆, 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱, 𝗮𝗻𝗱 𝘂𝗽-𝘁𝗼-𝗱𝗮𝘁𝗲.
- Use vector embeddings (e.g., pgvector in PostgreSQL) to represent your data for efficient similarity search.
3️⃣ 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲 𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 𝗠𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺𝘀
- Use hybrid search techniques (semantic + keyword search) for better precision.
- Tools like 𝗽𝗴𝗔𝗜, 𝗪𝗲𝗮𝘃𝗶𝗮𝘁𝗲, or 𝗣𝗶𝗻𝗲𝗰𝗼𝗻𝗲 can enhance retrieval speed and accuracy.
4️⃣ 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗲 𝗬𝗼𝘂𝗿 𝗟𝗟𝗠 (𝗢𝗽𝘁𝗶𝗼𝗻𝗮𝗹)
- If your use case demands it, fine-tune the LLM on your domain-specific data for improved contextual understanding.
5️⃣ 𝗘𝗻𝘀𝘂𝗿𝗲 𝗦𝗰𝗮𝗹𝗮𝗯𝗶𝗹𝗶𝘁𝘆
- Architect your solution to scale. Use caching, indexing, and distributed architectures to handle growing data and user demands.
6️⃣ 𝗠𝗼𝗻𝗶𝘁𝗼𝗿 𝗮𝗻𝗱 𝗜𝘁𝗲𝗿𝗮𝘁𝗲
- Continuously monitor performance using metrics like retrieval accuracy, response time, and user satisfaction.
- Incorporate feedback loops to refine your knowledge base and model performance.
7️⃣ 𝗦𝘁𝗮𝘆 𝗦𝗲𝗰𝘂𝗿𝗲 𝗮𝗻𝗱 𝗖𝗼𝗺𝗽𝗹𝗶𝗮𝗻𝘁
- Handle sensitive data responsibly with encryption and access controls.
- Ensure compliance with industry standards (e.g., GDPR, HIPAA).
With the right practices, you can unlock its full potential to build powerful, domain-specific AI applications. What are your top tips or challenges?
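To make the hybrid search idea in practice 3️⃣ concrete, here is a toy sketch that blends a semantic score (cosine similarity) with a keyword score (query-term overlap) using a weighted sum. Everything in it is an assumption for illustration: the three-dimensional vectors are made up rather than produced by an embedding model, the `alpha` weight is arbitrary, and a real system would push both searches down into a store such as PostgreSQL with pgvector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the document text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5, top_k=2):
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        semantic = cosine(query_vec, doc["vec"])
        lexical = keyword_score(query, doc["text"])
        scored.append((alpha * semantic + (1 - alpha) * lexical, doc["id"]))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical documents with hand-written stand-in embeddings.
DOCS = [
    {"id": "refund-policy", "text": "refund policy for damaged items", "vec": [0.9, 0.1, 0.0]},
    {"id": "shipping-times", "text": "standard shipping times by region", "vec": [0.1, 0.9, 0.0]},
    {"id": "returns-howto", "text": "how to return an item", "vec": [0.8, 0.2, 0.1]},
]

results = hybrid_search("refund policy", [1.0, 0.0, 0.0], DOCS)
print(results)
```

The keyword term rewards exact matches, while the semantic term still surfaces a related document that shares no words with the query.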
-
I have been developing Agentic Systems for the past few years and the same patterns keep emerging. 👇
𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗗𝗿𝗶𝘃𝗲𝗻 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 is the most reliable way to be successful in building your 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗦𝘆𝘀𝘁𝗲𝗺𝘀. Here is my template. Let’s zoom in:
𝟭. Define a problem you want to solve: is GenAI even needed?
𝟮. Build a Prototype: figure out if the solution is feasible.
𝟯. Define Performance Metrics: you must have output metrics defined for how you will measure the success of your application.
𝟰. Define Evals: split the above into smaller input metrics that can move the key metrics forward. Decompose them into tasks that could be automated and move the given input metrics. Define Evals for each. Store the Evals in your Observability Platform.
ℹ️ Steps 𝟭. - 𝟰. are where AI Product Managers can help, but they can also be handled by AI Engineers.
𝟱. Build a PoC: it can be simple (an Excel sheet) or more complex (a user-facing UI). Regardless of what it is, expose it to the users for feedback as soon as possible.
𝟲. Instrument your application: gather traces and human feedback and store them in an Observability Platform next to the previously stored Evals.
𝟳. Run Evals on traced data: traces contain the inputs and outputs of your application, so run evals on top of them.
𝟴. Analyse failing Evals and negative user feedback: this data is gold, as it pinpoints exactly where the Agentic System needs improvement.
𝟵. Use data from the previous step to improve your application: prompt engineer, improve the AI system topology, finetune models, etc. Make sure that the changes move Evals in the right direction.
𝟭𝟬. Build and expose the improved application to the users.
𝟭𝟭. Monitor the application in production: this comes out of the box. You have implemented evaluations and traces for development purposes, and they can be reused for monitoring. Configure specific alerting thresholds and enjoy the peace of mind.
✅ 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 𝗼𝗳 𝘆𝗼𝘂𝗿 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻:
➡️ Run steps 𝟲. - 𝟭𝟬. to continuously improve and evolve your application.
➡️ As you build up in complexity, new requirements can be added to the same application. This includes running steps 𝟭. - 𝟱. and attaching the new logic as routes to your Agentic System.
➡️ For example, you start off with a simple Chatbot and add a route that can classify user intent to take action (e.g. add items to a shopping cart).
What is your experience in evolving Agentic Systems? Let me know in the comments 👇
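Steps 𝟳 and 𝟭𝟭 can be sketched as plain functions over stored traces. This is only an illustration under stated assumptions: the trace fields (`input`, `intent`, `output`), the two eval checks, and the threshold values are all hypothetical, and a real setup would read traces from an observability platform instead of a hard-coded list.

```python
def answer_not_empty(trace):
    """Pass when the application produced a non-blank output."""
    return bool(trace["output"].strip())


def mentions_cart_action(trace):
    """Pass when a cart-intent trace confirms the item was added."""
    # Traces routed to other intents are out of scope for this eval.
    if trace["intent"] != "add_to_cart":
        return True
    return "added" in trace["output"].lower()


EVALS = {
    "answer_not_empty": answer_not_empty,
    "mentions_cart_action": mentions_cart_action,
}
# Assumed alerting thresholds: minimum acceptable pass rate per eval.
ALERT_THRESHOLDS = {"answer_not_empty": 0.95, "mentions_cart_action": 0.90}


def run_evals(traces):
    """Return the pass rate of every eval over the traced inputs and outputs."""
    return {
        name: sum(check(t) for t in traces) / len(traces)
        for name, check in EVALS.items()
    }


def alerts(pass_rates):
    """Names of evals whose pass rate fell below their threshold."""
    return sorted(n for n, rate in pass_rates.items() if rate < ALERT_THRESHOLDS[n])


# Made-up traces standing in for production data.
TRACES = [
    {"input": "add milk", "intent": "add_to_cart", "output": "Added milk to your cart."},
    {"input": "add eggs", "intent": "add_to_cart", "output": "Sorry, something went wrong."},
    {"input": "hi", "intent": "chat", "output": "Hello! How can I help?"},
    {"input": "hours?", "intent": "chat", "output": "We are open 9 to 5."},
]

rates = run_evals(TRACES)
print(rates, alerts(rates))
```

The same `run_evals` call serves development and production monitoring, which is the reuse that step 𝟭𝟭 describes.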
-
We’re entering an era where AI isn’t just answering questions; it’s starting to take action. From booking meetings to writing reports to managing systems, AI agents are slowly becoming the digital coworkers of tomorrow! But building an AI agent that’s actually helpful, and scalable, is a whole different challenge. That’s why I created this 10-step roadmap for building scalable AI agents (2025 Edition): to break it down clearly and practically. Here’s what it covers and why it matters:
- Start with the right model: Don’t just pick the most powerful LLM. Choose one that fits your use case, with stable responses, good reasoning, and support for tools and APIs.
- Teach the agent how to think: Should it act quickly or pause and plan? Should it break tasks into steps? These choices define how reliable your agent will be.
- Write clear instructions: Just like onboarding a new hire, agents need structured guidance. Define the format, tone, when to use tools, and what to do if something fails.
- Give it memory: AI models forget, fast. Add memory so your agent remembers what happened in past conversations, knows user preferences, and keeps improving.
- Connect it to real tools: Want your agent to actually do something? Plug it into tools like CRMs, databases, or email. Otherwise, it’s just chat.
- Assign one clear job: Vague tasks like “be helpful” lead to messy results. Clear tasks like “summarize user feedback and suggest improvements” lead to real impact.
- Use agent teams: Sometimes, one agent isn’t enough. Use multiple agents with different roles: one gathers info, another interprets it, another delivers output.
- Monitor and improve: Watch how your agent performs, gather feedback, and tweak as needed. This is how you go from a working demo to something production-ready.
- Test and version everything: Just like software, agents evolve. Track what works, test different versions, and always have a backup plan.
- Deploy and scale smartly: From APIs to autoscaling, once your agent works, make sure it can scale without breaking.
Why this matters: The AI agent space is moving fast. Companies are using agents to improve support, sales, internal workflows, and much more. If you work in tech, data, product, or operations, learning how to build and use agents is quickly becoming a must-have skill. This roadmap is a great place to start or to benchmark your current approach. What step are you on right now?
-
When your AI sounds clever but spits nonsense, it’s not hallucinating. It’s missing the right context. Most prompts throw in random or outdated info. That’s like giving a chef spoiled ingredients and expecting a gourmet meal. Context isn’t just decoration anymore. It’s the whole recipe. Today’s best AI setups don’t just dump data in. They carefully decide what to pull, when, and how to keep the context clean.
Here’s what a well-engineered context looks like:
• Working memory: recent interactions and tool outputs needed right now
• Long-term memory: user habits, past results, and facts stored smartly for later
• Dynamic retrieval: smart query tweaks, ranking, and trimming before feeding the model
• Tools as partners: APIs, search engines, and servers called only when needed
Example: A coding AI keeps track of recent errors and code changes in working memory. It stores project files and dependencies in long-term memory. When stuck, it fetches docs or runs web searches. The result? Faster, more reliable code without nonsense.
If you’re building AI agents, focus on:
• Sharpening retrieval: rewrite queries, rerank results, compress context before the model sees it
• Splitting memories: keep short-term fresh and long-term lean with only key facts
• Using tools like sensors: call them when info is missing, don’t expect the model to guess
• Defining clear context rules: schemas for tools and outputs that the system follows
Your current retrieval-augmented setup isn’t useless. It’s the base. The trick now is orchestration: giving the model just the right slice of context to do its job well. So next time your AI messes up, don’t tweak the prompt first. Fix the context. Then watch it finally get it right.
I’m Shrey Shah & I share daily guides on AI. If this helped, hit the ♻️ reshare button to help someone else build smarter AI.
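The split between working memory and long-term memory can be sketched as a small class. This is an illustration under assumptions, not a prescribed design: the class name `AgentMemory`, the keyword lookup for long-term facts, and the character budget (a crude stand-in for a token budget) are all hypothetical.

```python
from collections import deque


class AgentMemory:
    def __init__(self, working_size=3):
        # Working memory: only the most recent interactions and tool outputs.
        self.working = deque(maxlen=working_size)
        # Long-term memory: a lean store of key facts, keyed by topic word.
        self.long_term = {}

    def observe(self, event):
        """Record a recent event; the oldest one falls out when full."""
        self.working.append(event)

    def remember(self, key, fact):
        """Store one key fact under a topic word."""
        self.long_term[key] = fact

    def build_context(self, query, max_chars=200):
        """Pick matching facts plus recent events, trimmed to a size budget."""
        words = set(query.lower().split())
        facts = [fact for key, fact in self.long_term.items() if key in words]
        # Newest working-memory events first, so trimming drops the oldest.
        candidates = facts + list(reversed(self.working))
        context, used = [], 0
        for item in candidates:
            if used + len(item) > max_chars:
                break
            context.append(item)
            used += len(item)
        return context


memory = AgentMemory(working_size=2)
memory.remember("pytest", "Project uses pytest with a src/ layout.")
memory.remember("docker", "Images are built from a slim Python base.")
for event in ["edited utils.py", "ran tests: 2 failures", "ImportError in test_utils.py"]:
    memory.observe(event)

context = memory.build_context("why does pytest fail")
print(context)
```

Only the fact whose topic matches the query is pulled in, and the oldest event has already dropped out of working memory, so the model sees a small, current slice of context.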
-
This seems to be on everyone’s mind: how to operationalize your product team around AI. Peter Yang and I recently chatted about this topic and here’s what I shared about how we are doing this at Duolingo.
For improving our product:
- Using AI to solve problems that weren’t solvable before. One of the problems we had been trying to solve for years was conversation practice. With our Max feature, Video Call, learners can now practice conversations with our character Lily. The conversations are also personalized to each learner’s proficiency level.
- Prototyping with AI to speed up the product process. For example, for our Duolingo Chess, PMs vibe-coded with LLMs to quickly build a prototype. This decreased rounds of iteration, allowing our Engineers to start building the final product much sooner.
- Integrating AI into our tooling to scale. This allowed us to go from 100 language courses in 12 years to nearly 150 new ones in the last 12 months.
For increasing AI adoption:
- Building with AI Slack channels. Created an AI Slack channel for people to show and tell and share prototypes and tips.
- “AI Show and Tell” at All-Hands meetings. Added a five-minute live demo slot in every all-hands meeting for people to share updates on AI work.
- FriAIdays. Protected a two-hour block every Friday for hands-on experimentation and demos.
- Function-specific AI working groups. Assembled a cross-functional group (Eng, PM, Design, etc.) to test new tools and share best practices with the rest of the org.
- Company-wide AI hackathon. Scheduled a 3-day hackathon focused on using generative AI.
Here are some of our favorite AI tools and how we are using them:
- ChatGPT as a general assistant
- Cursor or Replit for vibe coding or prototyping
- Granola or Fathom for taking meeting notes
- Glean for internal company search
#productmanagement #duolingo
-
Anthropic just posted another banger guide. This one is on building more efficient agents that handle more tools with fewer tokens. This is a must-read for AI devs! (bookmark it)
It helps with three major issues in AI agent tool calling: token costs, latency, and tool composition. How? It combines code execution with MCP, turning MCP servers into code APIs rather than direct tool calls. Here is all you need to know:
1. Token Efficiency Problem: Loading all MCP tool definitions upfront and passing intermediate results through the context window creates massive token overhead, sometimes 150,000+ tokens for complex multi-tool workflows.
2. Code-as-API Approach: Instead of direct tool calls, present MCP servers as code APIs (e.g., TypeScript modules) that agents can import and call programmatically, reducing the example workflow from 150k to 2k tokens (98.7% savings).
3. Progressive Tool Discovery: Use filesystem exploration or search_tools functions to load only the tool definitions needed for the current task, rather than loading everything upfront into context. This solves so many context rot and token overload problems.
4. In-Environment Data Processing: Filter, transform, and aggregate data within the code execution environment before passing results to the model. E.g., filter 10,000 spreadsheet rows down to 5 relevant ones.
5. Better Control Flow: Implement loops, conditionals, and error handling with native code constructs rather than chaining individual tool calls through the agent, reducing latency and token consumption.
6. Privacy: Sensitive data can flow through workflows without entering the model's context; only explicitly logged or returned values are visible, with optional automatic PII tokenization.
7. State Persistence: Agents can save intermediate results to files and resume work later, enabling long-running tasks and incremental progress tracking.
8. Reusable Skills: Agents can save working code as reusable functions (with SKILL.md documentation), building a library of higher-level capabilities over time.
This approach is complex and it's not perfect, but it should enhance the efficiency and accuracy of your AI agents across the board.
anthropic.com/engineering/code-execution-with-mcp
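Points 3 and 4 can be illustrated with a short sketch. It is written in Python for brevity even though the guide's example uses TypeScript modules, and everything in it is an assumption for illustration: the tool catalog, the tool names, the `search_tools` signature, and the generated spreadsheet rows are hypothetical stand-ins, not the MCP API.

```python
# Hypothetical tool catalog; in the approach described, definitions would come
# from MCP servers exposed as code modules rather than from a hard-coded dict.
TOOL_CATALOG = {
    "sheets.get_rows": "Read all rows from a spreadsheet by ID.",
    "crm.update_record": "Update a CRM record by ID.",
    "mail.send": "Send an email to one recipient.",
}


def search_tools(keyword):
    """Return only the tool definitions whose name or description match."""
    keyword = keyword.lower()
    return {
        name: desc
        for name, desc in TOOL_CATALOG.items()
        if keyword in name.lower() or keyword in desc.lower()
    }


def get_rows(sheet_id):
    """Stand-in for a real tool call; rows are generated for illustration."""
    # sheet_id is ignored here because no real spreadsheet is read.
    return [
        {"order": i, "status": "pending" if i % 2000 == 0 else "done"}
        for i in range(10_000)
    ]


def pending_orders(sheet_id):
    """Filter inside the execution environment; return only what the model needs."""
    rows = get_rows(sheet_id)
    return [row for row in rows if row["status"] == "pending"]


tools = search_tools("spreadsheet")  # one definition loaded, not the whole catalog
summary = pending_orders("sheet-123")
print(f"{len(tools)} tool definition loaded, {len(summary)} rows returned to the model")
```

The 10,000 rows never enter the model's context; only the filtered handful does, which is where the token savings come from.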
-
Let's get straight to the point: if you want your AI or your entire workflow to deliver real, repeatable value, it's time to go beyond crafting clever prompts. To achieve scalable results, lasting creativity, and genuine peace of mind, you must build the scaffolding, not just issue instructions and hope for the best. Here's how to upgrade your approach:
1. Build frameworks, not just prompts. Think about the system you want, not just the answer you hope for. A reliable framework provides your agents (and your team) with a clear reasoning path to follow, time after time.
2. Apply heuristic principles. What are the rules, values, or questions that should guide every decision? Define them up front. This helps your work stay adaptable, responsible, and a whole lot smarter.
3. Design for consistency and growth. Set up your workflows so that improvement is built in. A simple feedback loop or a transparent review process turns one-off wins into repeatable habits.
4. Make ethics and transparency non-negotiable. Only trust systems you can explain. Build in ways to check your decisions, catch biases, and show your work as you go.
Attached, you'll find a small section of Signal & Cipher's Creative Generalist framework, which defines creative patterns. These patterns govern how any AI agent responds to a request, providing consistent and predictable context every time. This is just one of over 50 frameworks our agents have access to for operating within our organizational AI OS.
So, next time you're tempted to hack your way forward with another fancy prompt, hit pause. Choose one process you control and outline a fundamental framework for it. You'll be amazed by how much more predictable and robust your outcomes become.
-
Honestly, most AI developers are still stuck in the last century. It blows my mind how few people are aware of Error Analysis. This is *literally* the fastest and most effective way to evaluate AI applications, and most teams are still stuck chasing ghosts. Please, stop tracking generic metrics and follow these steps:
1. Collect failure samples. Start reviewing the responses generated by your application. Write notes about each response, especially those that were mistakes. You don't need to format your notes in any specific way. Focus on describing what went wrong with the response.
2. Categorize your notes. After you have reviewed a good set of responses, take an LLM and ask it to find common patterns in your notes. Ask it to classify each note based on these patterns. You'll end up with categories covering every type of mistake your application made.
3. Diagnose the most frequent mistakes. Begin by focusing on the most common type of mistake. You don't want to waste time working on rare mistakes. Drill into the conversations, inputs, and logs leading to those incorrect samples. Try to understand what might be causing the problems.
4. Design targeted fixes. At this point, you want to determine how to eliminate the mistakes you diagnosed in the previous step as quickly and cheaply as possible. For example, you could tweak your prompts, add extra validation rules, find more training data, or modify the model.
5. Automate the evaluation process. You need to implement a simple process to rerun an evaluation set through your application and evaluate whether your fixes were effective. My recommendation is to use an LLM-as-a-Judge to run samples through the application, score them with a PASS/FAIL tag, and compute the results.
6. Keep an eye on your metrics. Each category you identified during error analysis is a metric you want to track over time.
You will get nowhere by obsessing over "relevance", "correctness", "completeness", "coherence", and any other out-of-the-box metrics. Forget about these and focus on the real issues you found.
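Steps 2, 3, and 6 boil down to counting and ranking, which a short sketch can show. The notes, category names, and PASS/FAIL verdicts below are made-up placeholders: in the workflow described, an LLM would propose the categories and an LLM-as-a-Judge would produce the verdicts.

```python
from collections import Counter

# Reviewer notes, already labeled with a category. The labels are hand-assigned
# examples standing in for the LLM classification described in step 2.
NOTES = [
    {"note": "Quoted a price that is not in the catalog", "category": "invented_fact"},
    {"note": "Answered in English to a Spanish question", "category": "wrong_language"},
    {"note": "Made up a return window", "category": "invented_fact"},
    {"note": "Cited a policy that does not exist", "category": "invented_fact"},
    {"note": "Ignored the requested bullet format", "category": "format"},
]


def rank_categories(notes):
    """Most frequent failure categories first."""
    return Counter(n["category"] for n in notes).most_common()


def pass_rate_by_category(results):
    """results: (category, verdict) pairs, where verdict is 'PASS' or 'FAIL'."""
    totals, passes = Counter(), Counter()
    for category, verdict in results:
        totals[category] += 1
        passes[category] += verdict == "PASS"
    return {c: passes[c] / totals[c] for c in totals}


ranking = rank_categories(NOTES)

# Placeholder verdicts from a rerun after a fix; a judge model would supply these.
RERUN = [
    ("invented_fact", "PASS"),
    ("invented_fact", "FAIL"),
    ("format", "PASS"),
    ("invented_fact", "PASS"),
    ("invented_fact", "PASS"),
]
rates = pass_rate_by_category(RERUN)
print(ranking[0], rates)
```

The top entry of `ranking` tells you where to spend effort first, and the per-category pass rates are the application-specific metrics to track over time.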