Gartner said 40% of agentic AI projects will be cancelled by 2027. Everyone calls it a model problem. It isn't. We are not getting closer to 2027. The model was the least broken thing in the system. Almost every time, I found the real failures cluster in 6 buckets. Only 1 is about the LLM.

Here are the buckets:
1️⃣ 𝗗𝗮𝘁𝗮 𝗤𝘂𝗮𝗹𝗶𝘁𝘆: The agent hallucinates because the source of truth doesn't exist. Not because the model is bad.
2️⃣ 𝗣𝗿𝗼𝗺𝗽𝘁 & 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁𝗶𝗼𝗻: Brittle prompts. No versioning. No regression suite. Output drifts. Nobody can tell you why.
3️⃣ 𝗠𝗼𝗱𝗲𝗹 𝗖𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝘆: Yes, this is real. But it's #3. Not #1.
4️⃣ 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻: Tool calls fail silently. Steps run out of order. The "agent" is a Rube Goldberg machine wearing autonomy as a costume.
5️⃣ 𝗘𝘀𝗰𝗮𝗹𝗮𝘁𝗶𝗼𝗻 𝗟𝗼𝗴𝗶𝗰: The agent doesn't know when it doesn't know. It keeps plowing through tasks that should have gone to a human three steps ago. Almost like an “ego.”
6️⃣ 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗚𝗮𝗽: You can't tell if the agent got better or worse this week. So you can't fix anything systematically. You just patch.

Calling this "a model problem" lets a lot of people and processes off the hook. The LLM vendor sells you the next model. The platform team avoids the orchestration debt. Leadership keeps funding pilots that were architecturally doomed on day one.

Agent drift is the new model drift. It's a lifecycle problem. Not a capability problem. You don't fix it by upgrading the model. You fix it by building the system around the model.

#ExperienceFromTheField #WrittenByHuman
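To make the orchestration and escalation buckets concrete, here is a minimal Python sketch of a tool-call wrapper that records every failure instead of swallowing it, and hands off to a human once failures pile up. The names `ToolRunner`, `EscalateToHuman`, and the `max_failures` threshold are hypothetical, chosen only for illustration; they do not come from any specific agent framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class EscalateToHuman(Exception):
    """Raised when the agent should stop and hand the task to a person."""


@dataclass
class ToolRunner:
    # Hypothetical threshold: failed tool calls tolerated before escalating.
    max_failures: int = 2
    failures: list[str] = field(default_factory=list)

    def call(self, name: str, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Run a tool and record every failure instead of hiding it."""
        try:
            result = tool(*args, **kwargs)
        except Exception as exc:
            self._record(f"{name}: {exc!r}")
            return None
        if result is None:
            # An empty result counts as a failure so it cannot pass silently.
            self._record(f"{name}: returned no result")
        return result

    def _record(self, message: str) -> None:
        self.failures.append(message)
        if len(self.failures) >= self.max_failures:
            # Stop plowing ahead: surface the full failure history to a human.
            raise EscalateToHuman("; ".join(self.failures))
```

The design choice is that escalation depends on an explicit, logged failure count rather than on the model's own sense of confidence, so the hand-off rule can be tested and versioned like any other code.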
Training Evaluation Models
-
𝐓𝐡𝐞 𝐝𝐚𝐲 𝐚 𝐬𝐚𝐥𝐞𝐬 𝐭𝐞𝐚𝐦 𝐰𝐞𝐧𝐭 “𝐨𝐟𝐟 𝐰𝐨𝐫𝐤.” It was a Friday evening. The sales floor was quiet. Systems shut down, chairs empty, reports closed. But here’s the truth: 👉 A sales team never really goes off work. Because their performance echoes in numbers, in client relationships, and in revenue lines, even after office hours. Then why is it that Learning & Development is often treated as an afterthought? Why is training considered an “expense” instead of being viewed through the same “investment” lens applied to hiring? 🔹 Hiring looks expensive. 🔹 Training looks cheap. Yet without training, even the most expensive hires struggle, and companies keep wondering why the numbers don’t move. Case in point: We recently worked with a CRM team in real estate. The pain points were clear: long talk-time with clients, high escalations, and poor feedback scores. Through a structured 6-week learning journey, we helped the team: ✔ Reduce average talk-time by 17% ✔ Cut escalation cases by 23% ✔ Improve feedback ratings from 3.2 to 4.5 That’s not “soft skills.” That’s ROI in action. 3 takeaways for trainers & brands: 1. Quantify impact: Always tie your program to numbers leaders care about. 2. Diagnose before you deliver: Pain areas first, modules later. 3. Sell transformation, not training: No one buys sessions. They buy outcomes. 𝐈𝐟 𝐲𝐨𝐮’𝐫𝐞 𝐚 𝐜𝐨𝐦𝐩𝐚𝐧𝐲 𝐥𝐨𝐨𝐤𝐢𝐧𝐠 𝐭𝐨 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐬𝐞𝐞 𝐑𝐎𝐈 𝐟𝐫𝐨𝐦 𝐋&𝐃, 𝐧𝐨𝐭 𝐣𝐮𝐬𝐭 𝐭𝐢𝐜𝐤 𝐚 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐛𝐨𝐱 ... 𝐥𝐞𝐭’𝐬 𝐜𝐨𝐧𝐧𝐞𝐜𝐭. We’ve done it across 1,100+ brands, and we’d love to do it for you.
-
I really appreciated the following study, which evaluated the efficacy of a culturally adapted cognitive behavioural therapy for depression in the United Arab Emirates, and the way the authors reported it. Not only was this paper strong evidence for the use of CBT in such contexts, but it also provided substantial information about how they adapted the modality and the depth of consideration of what needed to change. https://lnkd.in/gm4HZMP4 Recognizing the importance of culture, they ensured that therapists received training on the cultural values, family dynamics, and religious beliefs of their sample communities (Arab and Filipino), and the intervention explicitly acknowledged and integrated these aspects. For example, automatic negative thoughts were discussed within religiously familiar contexts to help participants reframe seeking help as an act compatible with their religious principles. Therapists also explored issues related to family honor and helped participants see how therapy could be viewed as a tool that would allow them to restore family well-being and achieve their other personal goals. Culturally relevant examples, metaphors, and stories were also used, and even the activities suggested for common CBT elements, such as behavioural activation, were adjusted to include tasks with which participants would be familiar. Lastly, and perhaps my favourite part of the modification, was the inclusion of family members as people who could help support the psychoeducation process while also teaching them how to foster a supportive home environment. While the control group was only a treatment-as-usual one, this is a significant step toward demonstrating how culturally oriented CBT can make a difference, as the CBT group had significantly lower depressive symptoms after the intervention period. However, I would have appreciated it if they had compared it with an unadapted version of CBT to tease out further the differences introduced by the cultural adaptations.
Nevertheless, this was an interesting read, and I highly encourage people to consider how CBT can be adapted, especially across different cultural contexts. #mentalhealth #psychology #psychiatry #wellness #mentalillness
-
The question of how to measure skills is one that educators have grappled with for years. Often, it’s meant relying on proxy metrics to define success. Hours spent learning. Qualifications gained. Important, but still improvable. Yes, completion rates matter. But they encourage you to limit who gets access to learning based on who is likely to complete, rather than who can benefit. And if you’re an employer waiting until the end of a programme to find out if you’ve got ROI, then you should demand better. The fundamental question for any leadership team: is this investment of time and money delivering a tangible return to the business? So in addition to that, at Multiverse, we’ve shifted the focus from time spent learning to value created. Our quarterly impact numbers are grounded in the actual work our apprentices do. Every project submitted on the Multiverse platform represents someone applying new skills to a real challenge in their organisation. That's what we measure, and that's what we report. In 2026 so far, our apprentices have reported monthly ROI of: - 325,000 hours of time saved - £240 million in saved or avoided costs - £40 million in increased revenue In a world where every budget line is being scrutinised, “we think it's working” isn't good enough. This is the data I come back to when I want to know whether we're actually delivering on that. Real outcomes, from real apprentices, doing real work. And if you're a customer, we'll show you exactly what this looks like for your organisation. If you can't demonstrate the direct return on your talent development spend, you're essentially guessing. We think you deserve better than that. Ultimately, this is what true accountability looks like in skills development. We are proving that when you equip your workforce with the right technical tools, the result is a measurable and scalable surge in productivity.
-
Talent & Culture Bytes 15 – Overcoming Cultural Barriers in the Workplace Cultural nuances can profoundly impact the way employees communicate and challenge each other, often leading to significant and far-reaching effects on work outcomes. A particularly insightful and eye-opening example is the experience of Korean Air. In the late 1990s, Korean Air faced an alarming situation with multiple plane crashes. Persistent cockpit miscommunication between the pilot and co-pilot was a significant factor in these tragic accidents. In-depth investigations uncovered a surprising cause. In landing an airplane, especially under challenging conditions such as severe weather, seamless communication between the pilot and co-pilot is critically important. Airplanes are designed to be operated by two equals working together. However, South Korea ranks very high on Hofstede’s Power Distance Index, indicating a very hierarchical culture in which subordinates find it very difficult to question or challenge their superiors. This deep-seated respect for hierarchy led to a situation in which the pilot was in full command, and everyone else, including the co-pilot, was highly deferential and would not challenge the pilot. In 2000, Korean Air hired David Greenberg from Delta Air Lines to lead their flight operations. He recognized the profound difficulties resulting from culturally-driven communication barriers. Consequently, he mandated that all Korean Air pilots become fluent and speak only in English. Speaking English enabled the flight crews to break free from the South Korean cultural legacy that restricted their ability to confront one another, especially their superiors. It also made it easier to communicate with air traffic control across different countries when landing. Moreover, Greenberg introduced training programs that encouraged subordinates to take a more active and assertive role.
Co-pilots were trained to speak up, challenge the pilot when required, and apply critical thinking and assertiveness. The efforts resulted in Korean Air’s safety standards soon being rated among the highest in the world, and it maintained an impeccable record for both customer satisfaction and safety. The remarkable example of Korean Air demonstrates how changing the language of communication can enable an organization to transcend cultural legacies and pave the way for a new future. As we reflect upon this example, let us consider the following questions: * How can we identify and address communication barriers rooted in cultural norms? * What steps can we take to encourage a more open and assertive communication culture in our organizations? #talent #culture #hr #talentmanagement #talentandculturebytes #rkbytes
-
⏳ Deadlines vs. Relationships: The Hidden Reason Your Global Team Is Stuck 🌍 Your global team is not struggling because people don’t care. They’re struggling because they’ve been taught different rules for what professionalism looks like. And when those rules clash, even strong teams lose momentum. ⚠️ If you lead across cultures, you’ve likely seen it: Some team members value speed, efficiency, and clear deadlines 📊 Others value trust, rapport, and relationship-building first 🤝 Neither approach is wrong. But when task-oriented cultures (often the U.S., Germany, Switzerland) work with relationship-oriented cultures (often Latin America, the Middle East, and parts of Asia), friction can show up quickly. It often looks like this: ✨ One person thinks, “Let’s get to the point.” ✨ Another thinks, “Why are we rushing before trust is built?” ✨ One sees direct feedback as efficient. ✨ Another experiences it as disrespectful. The result? Misalignment, frustration, and stalled execution. 😓 And it costs you: Slower project execution ⏱️ Misinterpreted feedback 💬 Reduced psychological safety ⚠️ Frustrated high performers 🔥 Cross-cultural research suggests multicultural teams can outperform homogeneous ones — but only when differences are understood and managed well. So how do you reduce conflict and improve global team performance? Here’s what works: 1️⃣ Make cultural expectations explicit. At the start of a project, define what your team means by efficiency, responsiveness, trust, and professionalism. What feels clear in one culture may feel cold or vague in another. 2️⃣ Build relationship time into the process early. In many cultures, trust is not separate from work — it is what makes work possible. A little intentional connection early can prevent major friction later. 🌱 3️⃣ Clarify timelines, roles, and decision-making. Don’t assume everyone interprets urgency, ownership, or deadlines the same way. Spell it out clearly so fewer assumptions derail execution. 
🧭 4️⃣ Coach both directness and diplomacy. Global teams need both clarity and tact. The goal is not to make everyone communicate the same way — it’s to help them communicate effectively across differences. 🗣️ 5️⃣ Develop Cultural Intelligence (CQ). When tension happens, pause and ask: “Is this a skills issue — or a cultural difference?” That question can shift blame into insight. 💡 When leaders learn to balance deadlines and relationships, something changes: Meetings get smoother. Trust builds faster. Collaboration gets easier. Projects move with fewer setbacks. 🚀 That’s the real global leadership advantage. If you’re leading across cultures and feeling this tension, let’s talk. 👉 Schedule a call with me to strengthen your team’s cultural competence and global collaboration strategy. #MasteringCulturalDifferences #CrossCulturalCommunication #CulturalCompetence #InclusiveLeadership #GlobalTeams #TeamPerformance #LeadershipDevelopment
-
Exciting New Research on LLM Evaluation Validity! I just read a fascinating paper titled "LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations" that addresses a critical issue in our field: as Large Language Models (LLMs) increasingly replace human judges in evaluating information retrieval systems, how can we ensure these evaluations remain valid? The paper, authored by researchers from universities and companies across multiple countries (including University of New Hampshire, RMIT, Canva, University of Waterloo, The University of Edinburgh, Radboud University, and Microsoft), identifies 14 "tropes" or recurring patterns that can undermine LLM-based evaluations. The most concerning trope is "Circularity" - when the same LLM is used both to evaluate systems and within the systems themselves. The authors demonstrate this problem using TREC RAG 2024 data, showing that when systems are reranked using the Umbrela LLM evaluator and then evaluated with the same tool, it creates artificially inflated scores (some systems scored >0.95 on LLM metrics but only 0.68-0.72 on human evaluations). Other key tropes include: - LLM Narcissism: LLMs prefer outputs from their own model family - Loss of Variety of Opinion: LLMs homogenize judgment - Self-Training Collapse: Training LLMs on LLM outputs leads to concept drift - Predictable Secrets: When LLMs can guess evaluation criteria For each trope, the authors propose practical guardrails and quantification methods. They also suggest a "Coopetition" framework - a collaborative competition where researchers submit systems, evaluators, and content modification strategies to build robust test collections. If you work with LLM evaluations, this paper is essential reading. It offers a balanced perspective on when and how to use LLMs as judges while maintaining scientific rigor.
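One way to look for the kind of inflation described under "Circularity" is to compare LLM-judge scores against human scores for the same systems: check rank agreement, and flag systems whose LLM score sits far above their human score. The Python sketch below is only an illustration of that idea, not the paper's method. The system names, the scores, and the 0.15 margin are made up, and `kendall_tau` is a plain tau-a written out by hand.

```python
from itertools import combinations


def kendall_tau(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    assert len(xs) == len(ys) and len(xs) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def inflated(llm: dict[str, float], human: dict[str, float], margin: float = 0.15) -> list[str]:
    """Systems whose LLM-judge score exceeds the human score by more than margin."""
    return sorted(s for s in llm if llm[s] - human[s] > margin)


# Made-up scores for four hypothetical systems (not taken from the paper).
llm_scores = {"sys_a": 0.96, "sys_b": 0.80, "sys_c": 0.75, "sys_d": 0.60}
human_scores = {"sys_a": 0.70, "sys_b": 0.78, "sys_c": 0.74, "sys_d": 0.62}

systems = sorted(llm_scores)
tau = kendall_tau([llm_scores[s] for s in systems], [human_scores[s] for s in systems])
flagged = inflated(llm_scores, human_scores)
print(f"rank agreement (tau-a): {tau:.2f}, flagged: {flagged}")
```

A low tau together with a flagged system that was tuned against the same judge would be a reason to collect more human judgments before trusting the leaderboard.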
-
Measuring the ROI of Virtual Behavioral Training

Investing in behavioral training is not just about cost; it’s about measurable impact. The real question organizations must ask is: Does the training deliver a return on investment (ROI) in terms of improved retention, productivity, and leadership effectiveness? In our previous analysis, the total cost of a two-day virtual behavioral training for 60 mid-level managers was ₹19,63,000. Now, let’s calculate the potential ROI based on key business outcomes.

1. ROI Formula
The standard formula for training ROI:
ROI (%) = (Monetary Benefits - Training Cost) / Training Cost × 100

2. Business Impact Assumptions
To estimate the monetary benefits, we consider three key areas:

A) Reduction in Attrition
Average attrition for mid-level managers: 15% annually
Assumed reduction in attrition due to training: 3 percentage points
Average cost of replacing a manager (hiring, onboarding, productivity loss): ₹15,00,000 per manager
Retention improvement: 60 managers × 3% = 1.8 managers saved
Cost Savings from Reduced Attrition = 1.8 × ₹15,00,000 = ₹27,00,000

B) Increased Promotions & Internal Mobility
Assumed impact: 5% increase in internal promotions
Cost of hiring an external manager: ₹20,00,000 (recruitment, ramp-up, lost productivity)
Savings from internal promotion: 60 × 5% = 3 managers promoted
Cost Savings from Internal Promotions = 3 × ₹20,00,000 = ₹60,00,000

C) Productivity Gains from Behavioral Improvement
Behavioral training enhances leadership, communication, and decision-making, leading to improved productivity.
Assumed productivity increase: 2% per manager
Average annual contribution per manager (₹30L salary, assuming 3× salary as productivity value): ₹90,00,000
Total productivity gain per manager: ₹90,00,000 × 2% = ₹1,80,000
Total impact: ₹1,80,000 × 60 managers = ₹1,08,00,000

3. Total Monetary Benefit
Benefit Area: Financial Impact
Reduction in Attrition: ₹27,00,000
Increased Internal Promotions: ₹60,00,000
Productivity Gains: ₹1,08,00,000
Total Benefits: ₹1,95,00,000

4. ROI Calculation
ROI (%) = (₹1,95,00,000 - ₹19,63,000) / ₹19,63,000 × 100
ROI (%) = ₹1,75,37,000 / ₹19,63,000 × 100
ROI ≈ 893%

5. Strategic Takeaways: Why This Matters
High ROI Justifies Investment: Under these assumptions, an ROI of roughly 893% indicates that investing in behavioral training yields substantial business value.
Retention and Internal Mobility Drive Cost Savings: Avoiding attrition and promoting from within reduces hiring costs significantly.
Productivity Gains Create Long-Term Impact: Even small behavioral shifts in leadership and decision-making lead to tangible business outcomes.

By linking training costs to measurable business benefits, organizations can move beyond cost discussions to strategic impact measurement, ensuring learning investments drive organizational growth. Would love to hear from others.
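The arithmetic above is easy to rerun with different assumptions. The short Python sketch below reproduces it using the figures stated in the post; the function and variable names are illustrative, and the rupee amounts are written without Indian digit grouping.

```python
def training_roi_percent(benefits: float, cost: float) -> float:
    """ROI (%) = (benefits - cost) / cost * 100."""
    return (benefits - cost) / cost * 100


managers = 60
training_cost = 1_963_000  # Rs 19,63,000 for the two-day program

# Each benefit = headcount x assumed rate x rupee value per manager.
attrition_savings = managers * 0.03 * 1_500_000   # 1.8 managers retained
promotion_savings = managers * 0.05 * 2_000_000   # 3 internal promotions
productivity_gain = managers * 0.02 * 9_000_000   # 2% of Rs 90,00,000 each

total_benefits = attrition_savings + promotion_savings + productivity_gain
roi = training_roi_percent(total_benefits, training_cost)

print(f"Total benefits: Rs {total_benefits:,.0f}")
print(f"ROI: {roi:.1f}%")
```

Because the three rates are assumptions rather than measurements, it is worth rerunning the calculation with more conservative values (for example, a 1 percentage point attrition reduction) to see how sensitive the headline ROI is.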
-
🚀 TNA in Training: Turning Insight into Impact A strong training program doesn’t start with content; it starts with clarity. The Training Needs Analysis (TNA) Framework helps organizations design learning initiatives that truly drive performance and business results. 🔹 Organizational Analysis: Align training with strategic goals 🔹 Task Analysis: Identify required skills and competencies 🔹 Individual Analysis: Assess current performance levels 🔹 Gap Analysis: Spot the difference between where you are and where you need to be 🔹 Solution Identification: Choose the right training or intervention 🔹 Evaluation & Feedback: Measure effectiveness and refine When applied correctly, TNA transforms training from a routine activity into a strategic advantage. 💡 Training is not an expense; it’s an investment in capability, growth, and future success. #LearningAndDevelopment #TNA #TrainingStrategy #TalentDevelopment #PerformanceImprovement
-
Evaluating LLMs is hard. Evaluating agents is even harder. This is one of the most common challenges I see when teams move from using LLMs in isolation to deploying agents that act over time, use tools, interact with APIs, and coordinate across roles. These systems make a series of decisions, not just a single prediction. As a result, success or failure depends on more than whether the final answer is correct. Despite this, many teams still rely on basic task success metrics or manual reviews. Some build internal evaluation dashboards, but most of these efforts are narrowly scoped and miss the bigger picture. Observability tools exist, but they are not enough on their own. Google’s ADK telemetry provides traces of tool use and reasoning chains. LangSmith gives structured logging for LangChain-based workflows. Frameworks like CrewAI, AutoGen, and OpenAgents expose role-specific actions and memory updates. These are helpful for debugging, but they do not tell you how well the agent performed across dimensions like coordination, learning, or adaptability. Two recent research directions offer much-needed structure. One proposes breaking down agent evaluation into behavioral components like plan quality, adaptability, and inter-agent coordination. Another argues for longitudinal tracking, focusing on how agents evolve over time, whether they drift or stabilize, and whether they generalize or forget. If you are evaluating agents today, here are the most important criteria to measure: • 𝗧𝗮𝘀𝗸 𝘀𝘂𝗰𝗰𝗲𝘀𝘀: Did the agent complete the task, and was the outcome verifiable? • 𝗣𝗹𝗮𝗻 𝗾𝘂𝗮𝗹𝗶𝘁𝘆: Was the initial strategy reasonable and efficient? • 𝗔𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻: Did the agent handle tool failures, retry intelligently, or escalate when needed? • 𝗠𝗲𝗺𝗼𝗿𝘆 𝘂𝘀𝗮𝗴𝗲: Was memory referenced meaningfully, or ignored? • 𝗖𝗼𝗼𝗿𝗱𝗶𝗻𝗮𝘁𝗶𝗼𝗻 (𝗳𝗼𝗿 𝗺𝘂𝗹𝘁𝗶-𝗮𝗴𝗲𝗻𝘁 𝘀𝘆𝘀𝘁𝗲𝗺𝘀): Did agents delegate, share information, and avoid redundancy? 
• 𝗦𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗼𝘃𝗲𝗿 𝘁𝗶𝗺𝗲: Did behavior remain consistent across runs or drift unpredictably? For adaptive agents or those in production, this becomes even more critical. Evaluation systems should be time-aware, tracking changes in behavior, error rates, and success patterns over time. Static accuracy alone will not explain why an agent performs well one day and fails the next. Structured evaluation is not just about dashboards. It is the foundation for improving agent design. Without clear signals, you cannot diagnose whether failure came from the LLM, the plan, the tool, or the orchestration logic. If your agents are planning, adapting, or coordinating across steps or roles, now is the time to move past simple correctness checks and build a robust, multi-dimensional evaluation framework. It is the only way to scale intelligent behavior with confidence.
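As a rough illustration of what "multi-dimensional" and "time-aware" can mean in practice, the Python sketch below stores one score per criterion for each run and reports which criteria have shifted between the earliest and latest runs. Everything here is an assumption made for the example: the `RunEval` fields, the 0-to-1 scoring scale, the window of 3 runs, and the 0.1 tolerance. Coordination is left out for brevity, and how each score is produced (human review, rubric, automated check) is outside the sketch.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunEval:
    """One evaluated agent run; field names are illustrative, scores in [0, 1]."""
    day: int
    task_success: float
    plan_quality: float
    adaptation: float
    memory_usage: float


DIMENSIONS = ("task_success", "plan_quality", "adaptation", "memory_usage")


def drift_report(runs: list[RunEval], window: int = 3, tolerance: float = 0.1) -> dict[str, float]:
    """Return per-dimension change (latest window mean minus earliest window mean).

    Only dimensions whose absolute change exceeds the tolerance are reported.
    """
    ordered = sorted(runs, key=lambda r: r.day)
    early, late = ordered[:window], ordered[-window:]
    report: dict[str, float] = {}
    for dim in DIMENSIONS:
        delta = mean(getattr(r, dim) for r in late) - mean(getattr(r, dim) for r in early)
        if abs(delta) > tolerance:
            report[dim] = round(delta, 3)
    return report
```

For example, six runs in which task success stays flat while the adaptation score falls from 0.8 to 0.5 would report only the adaptation dimension, which points the investigation at retry and escalation behavior rather than at the model as a whole.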