A Few Lessons from Deploying and Using LLMs in Production

Deploying LLMs can feel like hiring a hyperactive genius intern: they dazzle users while potentially draining your API budget. Here are some insights I’ve gathered:

1. “Cheap” is a Lie You Tell Yourself: The cost per call may seem low, but the overall expense of an LLM-based system can skyrocket. Fixes:
- Cache repetitive queries: users can ask the same thing 100x/day or more.
- Gatekeep: use cheap classifiers (BERT) to filter “easy” requests. Let LLMs handle only the complex 10% and let your current systems handle the remaining 90%.
- Quantize your models: shrink LLMs to run on cheaper hardware without massive accuracy drops.
- Asynchronously build your caches: pre-generate common responses before they’re requested, or fail gracefully the first time a query arrives and cache the answer for next time.

2. Guard Against Model Hallucinations: Sometimes models express answers with such confidence that distinguishing fact from fiction becomes challenging, even for human reviewers. Fixes:
- Use RAG: a fancy way of saying you give the model the knowledge it needs in the prompt itself, by querying a database for content that semantically matches the query.
- Guardrails: validate outputs with regexes, or with a cross-encoder that scores each query-response pair, so there is a clear decision boundary for accepting or rejecting the LLM’s response.

3. The best LLM is often a discriminative model: You don’t always need a full LLM. Consider knowledge distillation: use a large LLM to label your data, then train a smaller, discriminative model that performs similarly at a much lower cost.

4. It’s not about the model, it’s about the data on which it is trained: A smaller LLM might struggle with specialized domain data, and that’s normal. Fine-tune your model on your specific data set, starting with parameter-efficient methods (like LoRA or Adapters) and using synthetic data generation to bootstrap training.

5. Prompts are the new Features: Treat prompts like features in your system. Version them, run A/B tests, and continuously refine them using online experiments. Consider bandit algorithms to automatically promote the best-performing variants.

What do you think? Have I missed anything? I’d love to hear your “I survived LLM prod” stories in the comments!
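The caching and gatekeeping fixes from lesson 1 can be sketched roughly as follows. This is a minimal illustration, not a production design: `GatedCachedRouter`, `looks_easy`, `faq_lookup`, and `fake_llm` are hypothetical names, the keyword rule stands in for a trained classifier such as BERT, and the in-memory dict stands in for whatever cache you actually run.

```python
import hashlib
from typing import Callable, Dict


def cache_key(query: str) -> str:
    """Normalize case and whitespace so trivially different queries share a key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class GatedCachedRouter:
    """Check the cache first, then a cheap gate, and only then call the LLM."""

    def __init__(
        self,
        is_easy: Callable[[str], bool],
        cheap_handler: Callable[[str], str],
        llm_handler: Callable[[str], str],
    ) -> None:
        self.is_easy = is_easy
        self.cheap_handler = cheap_handler
        self.llm_handler = llm_handler
        self.cache: Dict[str, str] = {}
        self.llm_calls = 0  # counts how often the expensive path is taken

    def answer(self, query: str) -> str:
        key = cache_key(query)
        if key in self.cache:
            return self.cache[key]
        if self.is_easy(query):
            response = self.cheap_handler(query)
        else:
            self.llm_calls += 1
            response = self.llm_handler(query)
        self.cache[key] = response
        return response


def looks_easy(query: str) -> bool:
    # Hypothetical stand-in for a small trained classifier.
    return any(p in query.lower() for p in ("opening hours", "reset password"))


def faq_lookup(query: str) -> str:
    # Hypothetical stand-in for an existing non-LLM system.
    return "See the help center article."


def fake_llm(query: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"[LLM answer for: {query}]"


router = GatedCachedRouter(looks_easy, faq_lookup, fake_llm)
router.answer("How do I reset password?")                          # cheap path
router.answer("Compare plan A and plan B for a 50-person team")    # LLM path
router.answer("compare plan A   and plan B for a 50-person team")  # cache hit
print(router.llm_calls)  # prints 1
```

Three queries arrive but only one reaches the LLM. A real system would also need cache expiry and a way to measure how often the gate misroutes hard queries to the cheap path.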
Product Value Creation
-
Most people still think of LLMs as “just a model.” But if you’ve ever shipped one in production, you know it’s not that simple. Behind every performant LLM system there’s a stack of decisions about pretraining, fine-tuning, inference, evaluation, and application-specific tradeoffs.

This diagram captures it well: LLMs aren’t one-dimensional. They’re systems. And each dimension introduces new failure points or optimization levers. Let’s break it down:

🧠 Pre-Training
Start with modality.
→ Text-only models like LLaMA, UL2, and PaLM have predictable inductive biases.
→ Multimodal ones like GPT-4, Gemini, and LaVIN introduce more complex token fusion, grounding challenges, and cross-modal alignment issues.
Understanding the data diet matters just as much as parameter count.

🛠 Fine-Tuning
This is where most teams underestimate complexity:
→ PEFT strategies like LoRA and Prefix Tuning help with parameter efficiency, but can behave differently under distribution shift.
→ Alignment techniques (RLHF, DPO, RAFT) aren’t interchangeable. They encode different human preference priors.
→ Quantization and pruning decisions directly impact latency, memory usage, and downstream behavior.

⚡️ Efficiency
Inference optimization is still underexplored. Techniques like dynamic prompt caching, paged attention, speculative decoding, and batch streaming make the difference between real-time and unusable. The infra layer is where GenAI products often break.

📏 Evaluation
One benchmark doesn’t cut it. You need a full matrix:
→ NLG (summarization, completion) and NLU (classification, reasoning),
→ alignment tests (honesty, helpfulness, safety),
→ dataset quality, and
→ cost breakdowns across training + inference + memory.
Evaluation isn’t just a model task; it’s a systems-level concern.

🧾 Inference & Prompting
Multi-turn prompts, CoT, ToT, and ICL all behave differently under different sampling strategies and context lengths. Prompting isn’t trivial anymore. It’s an orchestration layer in itself.
Whether you’re building for legal, education, robotics, or finance, the “general-purpose” tag doesn’t hold. Every domain has its own retrieval, grounding, and reasoning constraints. ------- Follow me (Aishwarya Srinivasan) for more AI insight and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
-
I recently spent time getting more hands-on with LLM & Agentic AI engineering through Ed Donner's training. Instead of stopping at examples, I built a mini multi-agent logistics delivery optimization framework. Building real AI systems quickly makes one thing clear: 𝙏𝙝𝙚 𝙝𝙖𝙧𝙙 𝙥𝙖𝙧𝙩 𝙞𝙨𝙣’𝙩 𝙩𝙝𝙚 𝙢𝙤𝙙𝙚𝙡 — 𝙞𝙩’𝙨 𝙩𝙝𝙚 𝙖𝙧𝙘𝙝𝙞𝙩𝙚𝙘𝙩𝙪𝙧𝙚 𝙙𝙚𝙘𝙞𝙨𝙞𝙤𝙣𝙨 𝙖𝙧𝙤𝙪𝙣𝙙 𝙞𝙩. A few practical lessons: 1. 𝗟𝗟𝗠 𝗺𝗼𝗱𝗲𝗹 𝘀𝗲𝗹𝗲𝗰𝘁𝗶𝗼𝗻 𝗶𝘀 𝗳𝗮𝗿 𝗺𝗼𝗿𝗲 𝗻𝘂𝗮𝗻𝗰𝗲𝗱 𝘁𝗵𝗮𝗻 𝗰𝗼𝘀𝘁 𝘃𝘀 𝗹𝗮𝘁𝗲𝗻𝗰𝘆. Trade-offs: • reasoning maturity for complex planning • context window & memory strategy • proprietary models vs smaller open models • infra costs (GPU/hosting) vs token-based API costs • tool-calling reliability & structured output adherence • benchmark performance vs real task behavior • model stability across releases In practice, it becomes a hybrid strategy: 𝘀𝗺𝗮𝗹𝗹𝗲𝗿/𝗰𝗵𝗲𝗮𝗽𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 𝗳𝗼𝗿 𝗿𝗼𝘂𝘁𝗶𝗻𝗲 𝘁𝗮𝘀𝗸𝘀 + 𝗦𝗟𝗠 𝘄𝗶𝘁𝗵 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴 𝗳𝗼𝗿 𝗱𝗼𝗺𝗮𝗶𝗻 𝗽𝗿𝗼𝗯𝗹𝗲𝗺𝘀 + 𝘀𝘁𝗿𝗼𝗻𝗴𝗲𝗿 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗺𝗼𝗱𝗲𝗹𝘀 𝗳𝗼𝗿 𝗰𝗼𝗺𝗽𝗹𝗲𝘅 𝗱𝗲𝗰𝗶𝘀𝗶𝗼𝗻𝘀. 𝟮. 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗺𝗮𝘁𝘁𝗲𝗿𝘀 𝗮𝘀 𝗺𝘂𝗰𝗵 𝗮𝘀 𝘁𝗵𝗲 𝗟𝗟𝗠: Many AI demos over-engineer the stack. In reality, simplicity, latency, security and reliability matter more than novelty. • Use orchestration frameworks only where coordination complexity exists • Combine prompts with structured outputs to reduce ambiguity • Watch serialization and tool-call overhead — they impact latency and UX • Reduce unnecessary LLM calls when deterministic code can solve the task Besides lowering token cost, this improves context efficiency, letting models focus on real reasoning. Sometimes best architecture decision is 𝙣𝙤𝙩 𝙞𝙣𝙩𝙧𝙤𝙙𝙪𝙘𝙞𝙣𝙜 𝙖𝙣𝙤𝙩𝙝𝙚𝙧 𝙡𝙖𝙮𝙚𝙧. 3. 𝗕𝗶𝗴𝗴𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 ≠ 𝗯𝗲𝘁𝘁𝗲𝗿 𝗼𝘂𝘁𝗰𝗼𝗺𝗲𝘀 Smaller models with fine-tuning on domain data can perform more consistently than larger ones. Fine-tuning helps when: • tasks are repetitive but require precision • domain vocabulary is specialized • prompts become fragile But 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗶𝗻𝗴 𝗮𝗹𝘀𝗼 𝗶𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗲𝘀 𝗹𝗶𝗳𝗲𝗰𝘆𝗰𝗹𝗲 𝗼𝘃𝗲𝗿𝗵𝗲𝗮𝗱. Base model upgrades trigger retesting and partial rewrites. 4. 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗴𝗮𝗽: 𝗽𝗿𝗼𝘁𝗼𝘁𝘆𝗽𝗲 → 𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 Demos are easy. 
Production requires 𝙚𝙫𝙖𝙡𝙪𝙖𝙩𝙞𝙤𝙣 𝙛𝙧𝙖𝙢𝙚𝙬𝙤𝙧𝙠𝙨, 𝙤𝙗𝙨𝙚𝙧𝙫𝙖𝙗𝙞𝙡𝙞𝙩𝙮, 𝙨𝙚𝙘𝙪𝙧𝙞𝙩𝙮, 𝙥𝙚𝙧𝙛𝙤𝙧𝙢𝙖𝙣𝙘𝙚, 𝙘𝙤𝙨𝙩 𝙜𝙤𝙫𝙚𝙧𝙣𝙖𝙣𝙘𝙚 & 𝙜𝙪𝙖𝙧𝙙𝙧𝙖𝙞𝙡𝙨. That’s where most engineering effort goes. 𝟱. 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗳𝗼𝗿 𝗹𝗲𝗮𝗱𝗲𝗿𝘀 𝗿𝘂𝗻𝗻𝗶𝗻𝗴 𝗔𝗜 𝗽𝗿𝗼𝗴𝗿𝗮𝗺𝘀 Many AI conversations focus on SDLC productivity. That is useful, but the bigger opportunity is 𝙧𝙚𝙞𝙢𝙖𝙜𝙞𝙣𝙞𝙣𝙜 𝙡𝙚𝙜𝙖𝙘𝙮 𝙗𝙪𝙨𝙞𝙣𝙚𝙨𝙨 𝙥𝙧𝙤𝙘𝙚𝙨𝙨𝙚𝙨 𝙪𝙨𝙞𝙣𝙜 𝘼𝙜𝙚𝙣𝙩𝙞𝙘 AI. By simply automating existing steps, we risk merely speeding up inefficient processes and missing the real transformation.
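The advice in lesson 2 to combine prompts with structured outputs can be illustrated with a small validator. This is a sketch under stated assumptions: the `DeliveryPlan` fields, the allowed priorities, and the `parse_plan` helper are hypothetical and are not taken from the framework described in the post. The idea is that the model is asked for JSON and deterministic code decides whether the reply is usable.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Hypothetical schema for illustration only.
ALLOWED_PRIORITIES = {"low", "normal", "urgent"}


class InvalidPlan(ValueError):
    """Raised when a model reply does not match the expected structure."""


@dataclass(frozen=True)
class DeliveryPlan:
    vehicle_id: str
    stop_ids: Tuple[str, ...]
    priority: str


def parse_plan(raw: str) -> DeliveryPlan:
    """Parse and validate a model reply that is supposed to be a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidPlan(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidPlan("expected a JSON object")

    vehicle_id = data.get("vehicle_id")
    stop_ids = data.get("stop_ids")
    priority = data.get("priority")

    if not isinstance(vehicle_id, str) or not vehicle_id:
        raise InvalidPlan("vehicle_id must be a non-empty string")
    if (
        not isinstance(stop_ids, list)
        or not stop_ids
        or not all(isinstance(s, str) for s in stop_ids)
    ):
        raise InvalidPlan("stop_ids must be a non-empty list of strings")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        raise InvalidPlan("priority must be one of: low, normal, urgent")

    return DeliveryPlan(vehicle_id, tuple(stop_ids), priority)


good_reply = '{"vehicle_id": "van-7", "stop_ids": ["A12", "B03"], "priority": "urgent"}'
plan = parse_plan(good_reply)
print(plan.vehicle_id, plan.stop_ids, plan.priority)
```

A caller can catch `InvalidPlan` and retry the prompt or fall back to deterministic code, which keeps malformed replies from leaking into downstream tools.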
-
Head of Product, this is your secret weapon: Curiosity. Stepping into a role as Head of Product in an organization where field experts and clients have historically driven product decisions can be daunting. So how do you add value and empower your team in such an environment? Start by bridging gaps in domain knowledge. Encourage your product managers to shadow field experts. This allows them to see the real-world application of products and understand user challenges firsthand. Your PMs should be asking "What do our clients do with our products? What problems are they trying to solve?" Fostering curiosity in your team is key. A great product manager doesn't have to walk in the door with extensive domain expertise. But they should be hungry to learn and understand (if that's missing, you might not have the right fit for your product team). But how can you drive curiosity? Start by developing processes. Encourage the team to conduct user research and set up programs where they can interact directly with customers. Before kicking off projects, make customer interviews a regular part of the discovery phase. Also, consider pairing PMs with a field expert mentor for continuous learning. Don’t forget to lead by example. Learn the domain alongside your team if needed. Your role is to create the structure that facilitates learning and knowledge-sharing, ensuring your product managers engage with users and field experts regularly. Ultimately, the drive to understand and solve user problems is what makes a product manager successful. As a leader, your focus should be on fostering this mindset and setting the stage for continuous learning. That's how you add value and ensure your team can effectively build products that resonate with your users. Are you fostering curiosity in your teams? How? Let’s share techniques in the comments.
-
Everyone says the future of product management is AI-native. But what the hell does it mean to be an AI-native PM? After watching our instructors teach thousands of students at Maven and observing my own team's transformation, I think it comes down to two layers.

1. The technical layer
If you want to build AI-first products, you need to know how they work.
• AI fundamentals. What an LLM actually is, the trade-offs of using something like RAG, when to use agents (one or multiple), and what evals are. You need to speak the language fluently enough to collaborate with engineers without a translator.
• Model intuition and selection. When to fine-tune, and how cost and intelligence scale with model size.
• AI product sense. AI products have fundamentally different requirements. A mediocre AI experience is worse than no AI experience at all. You need to understand guardrails, failure modes, and how to design for non-determinism.

2. The productivity layer
PMs should use AI as a second-nature part of their day-to-day work. For existing PMs, this requires shifting their workflows entirely...
• Prototyping. Instead of PRDs, start by using tools like Cursor or Claude Code to ship and iterate on prototypes and feature demos.
• Research and insights. Use LLMs to synthesize data of all types (not just CSVs) into usable insights. Read the original data to ensure accuracy and deeply understand the context the LLM is presenting.
• Strategy and writing. You still do your own thinking, while leveraging AI to fill in the gaps. AI can produce excellent docs and decompose them into tasks given enough context and prompting, but it shouldn't make the final decisions.
• Personal software. Use tools like Claude to build small apps and tools that only you use, optimized entirely for your specific workflows and use cases.

Taste and judgement still matter the same as they did before. PMs are still expected to be the CEO of their products. But they also need to be natively using AI in their work, and deeply understand the opportunities to build AI-driven products.

P.S. BTW we’re partnering with Lenny Rachitsky to launch a new series of free lessons called “The AI-Native Product Manager”. Check it out: https://bit.ly/4s0mYYj
• The CTO of MySpace turned ML Product Lead at Google, Dmitry Shapiro, on how to best use Clawdbot as a PM
• The 1st Product Manager, v0 at Vercel, Ary Khandelwal, on how PMs can build and *deploy* code with no handoff
• Ex-Head of UXR, Spotify Business, Caitlin Sullivan, on when and how to construct synthetic data for product discovery
• The former CPO of LinkedIn, Tomer Cohen, on becoming a full stack builder with AI
• Former Director of Growth at Gitlab, Hila Qu 曲卉, on the AI-powered VP of Growth playbook
• Former FDE Lead at Palantir and Citadel, Vinoo Ganesh, on building products like a forward deployed engineer
• Product Lead at Roblox, Peter Yang, on AI Powered Product Skills for Executive Leaders & GMs
-
Product managers & designers working with AI face a unique challenge: designing a delightful product experience that cannot fully be predicted.

Traditionally, product development followed a linear path. A PM defines the problem, a designer draws the solution, and the software teams code the product. The outcome was largely predictable, and the user experience was consistent.

However, with AI, the rules have changed. Non-deterministic ML models introduce uncertainty & chaotic behavior. The same question asked four times produces different outputs. Asking the same question in different ways, even with just an extra space in the question, elicits different results.

How does one design a product experience in the fog of AI? The answer lies in embracing the unpredictable nature of AI and adapting your design approach. Here are a few strategies to consider:

1. Fast feedback loops: Great machine learning products elicit user feedback passively. Just click on the first result of a Google search and come back to the second one. That’s a great signal for Google to know that the first result is not optimal, without typing a word.
2. Evaluation: Before products launch, it’s critical to run the machine learning systems through a battery of tests to understand how the LLM will respond in the most likely use cases.
3. Over-measurement: It’s unclear what will matter in product experiences today, so measure as much as possible in the user experience, whether that is session times, conversation topic analysis, sentiment scores, or other numbers.
4. Couple with deterministic systems: Some startups are using large language models to suggest ideas that are evaluated with deterministic or classic machine learning systems. This design pattern can quash some of the chaotic and non-deterministic nature of LLMs.
5. Smaller models: Smaller models that are tuned or optimized for use cases will produce narrower output, controlling the experience.
The goal is not to eliminate unpredictability altogether but to design a product that can adapt and learn alongside its users. Just as much as the technology has changed products, our design processes must evolve as well.
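One small piece of the evaluation strategy above, checking how often the same question yields the same answer, could look like the sketch below. It is illustrative only: `agreement_rate` and `fake_model` are hypothetical names, the canned answers replace a real model call, and comparing normalized strings is a crude stand-in for a proper semantic comparison.

```python
import itertools
from collections import Counter
from typing import Callable


def normalize(text: str) -> str:
    """Ignore case and whitespace differences when comparing answers."""
    return " ".join(text.lower().split())


def agreement_rate(generate: Callable[[str], str], prompt: str, runs: int = 4) -> float:
    """Fraction of runs that return the most common normalized answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [normalize(generate(prompt)) for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


# Hypothetical stand-in for a non-deterministic model: cycles canned answers.
_canned = itertools.cycle(["Paris", "paris ", "Lyon", "Paris"])


def fake_model(prompt: str) -> str:
    return next(_canned)


print(agreement_rate(fake_model, "What is the capital of France?", runs=4))  # prints 0.75
```

Running the same check over prompt variants (for example, with an extra space) gives a rough sensitivity measure that can be tracked before launch.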
-
Clayton Christensen predicted it: product managers are underestimating the disruption caused by Large Language Models (LLMs), for the reasons described in The Innovator's Dilemma. Incumbent organizations often focus on what new technologies CANNOT do, highlighting their limitations and risks instead of embracing the low-cost and scalability benefits that are emerging.

Every profession has an implicit Return On Investment (ROI). If you're rejecting LLMs because they can only accomplish tasks with 80% quality, you're missing the point. A machine that can accomplish 80% of a task (= return) with merely 1% of the effort (= investment) offers a far better ROI than a human doing everything manually.

Adding to this, there exists an absurd subconscious belief among some product managers that their lack of adoption will somehow slow down the inevitable tsunami of disruption. Combined with natural organizational inertia, this mindset results in a profession that clings to internal debates (such as the distinction between a product manager and a product owner) when it should be focusing on learning how to surf this lava-wave.

Product managers should be obsessed with:
1. Breaking down their jobs into huge lists of tiny tasks;
2. Exploring how each task could be done slightly more rapidly thanks to LLMs;
3. Figuring out what new investments or habits need to happen to accelerate the tango, starting by abandoning ChatGPT and hopping onto LLMs that tap into private databases, your most important asset moving forward.

Here's the beautiful part: LLMs are an amazing piece of technology, but the actual products remain to be invented on top of it. What's holding you back?
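The ROI argument in this post can be written out as simple arithmetic. The 80% and 1% figures are the post's own; treating a human who does the whole task manually as the baseline (return 1, investment 1) is an assumption made here purely for illustration.

```latex
\[
\mathrm{ROI}_{\text{human}} = \frac{\text{return}}{\text{investment}} = \frac{1.00}{1.00} = 1,
\qquad
\mathrm{ROI}_{\text{LLM}} = \frac{0.80}{0.01} = 80
\]
```

On those numbers the return per unit of effort is 80 times the manual baseline, even though the absolute quality of the output is lower.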
-
“𝗝𝘂𝘀𝘁 𝗱𝗮𝘁𝗮” doesn’t build billion-dollar companies. That’s the reason Airbnb probably still exists today. In their first year of operations, the data they had been collating was clear—𝗘𝗻𝗴𝗮𝗴𝗲𝗺𝗲𝗻𝘁 𝗳𝗿𝗼𝗺 𝗵𝗼𝘀𝘁𝘀 𝘄𝗮𝘀 𝗹𝗼𝘄. Most product teams would’ve hit the panic button, tweaked the funnel, run A/B tests, or thrown money at performance ads. But Brian, Nathan, and Joe did something different. They booked a flight to New York. Because though they knew that the dashboards weren't lying, they also knew that they weren't telling the full story. Once on the ground, they met their hosts face-to-face, walked into their homes, and looked at their listings. Instantly, they saw the real issue: 𝘁𝗲𝗿𝗿𝗶𝗯𝗹𝗲 𝗽𝗵𝗼𝘁𝗼𝘀. Not a product flaw. Not a pricing problem. Just bad images. So, they rolled up their sleeves, hired professional photographers, and even started clicking photos themselves. Almost overnight, engagement soared. And Airbnb found its first real turning point. If they had trusted only the data, they would’ve fixed the wrong problem and Airbnb would’ve probably disappeared. This story is a masterclass in product leadership. 𝗗𝗮𝘁𝗮 𝗶𝘀 𝗮 𝗰𝗼𝗺𝗽𝗮𝘀𝘀, 𝗻𝗼𝘁 𝗮 𝗺𝗮𝗽. It points to where you need to look, but it won't tell you what's actually there. That’s why the best product decisions don’t just come from dashboards alone. And great product leaders know when to follow the numbers—𝗮𝗻𝗱 𝘄𝗵𝗲𝗻 𝘁𝗼 𝗰𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 𝘁𝗵𝗲𝗺. When to dig deeper, ask better questions, and lean into human insight. Because building great products isn’t just about 𝗪𝗛𝗔𝗧 people do. It’s also about understanding 𝗪𝗛𝗬 they do it. It’s about blending - - Data + insight. - Logic + empathy. - AI + human Have you ever had to go beyond the data to follow a gut instinct? I'd love to hear how it played out. #productmanagement #growth #airbnb #brianchesky
-
The companies that grow the fastest scale their experimentation programs. These are the 3 keys: 1. Trustworthy experiments 2. Institutional memory 3. Data culture Let me explain each. — PILLAR 1: TRUSTWORTHY EXPERIMENTS Three challenges block trust. Here’s how to solve them: Challenge 1: Outlier Customers One enterprise client can skew data like 200 average users. Results warp. You build for the 1%, not the majority. Solution: Use stratified sampling. Balance test groups by customer size. Turn outliers into insights, not noise. Challenge 2: Novelty Effects Week 1 shows amazing results. By Week 6, you're back to baseline. This classic trap wastes months on temporary wins. Solution: Track metrics over weeks, not days. Create holdout groups to measure true impact. Don't celebrate until you see sustained value. Challenge 3: Consistency Issues Different teams get contradictory results. Trust collapses. Progress stalls. Solution: Standardize methodology across teams. Create unified playbooks. — PILLAR 2: INSTITUTIONAL MEMORY Most companies run experiments but fail to build lasting knowledge. Here are the 3 elements you need: Element 1: Batting Average View Track your success rate (industry average: 33%). Measure your average lift (typically 8%). Focus on high-probability experiments instead of random testing. Element 2: Frictionless Documentation Documentation fails when it's manual work. Automate capturing rationale, setup, and results. When documentation is automatic, it actually happens. Element 3: Cross-Team Learning Growth, marketing, product—each runs valuable experiments. Insights often die in silos. Build shared repositories. New hires gain years of wisdom instantly. — PILLAR 3: DATA CULTURE Even perfect experiments fail without the right cultural foundation. 
These 3 elements create that foundation:

Element 1: Standardized Definitions
Create a metrics dictionary everyone follows:
Revenue = Monthly recurring revenue only
Engagement = Sessions >2 min with 3+ page views
When everyone measures the same way, results become comparable.

Element 2: Truth Over Gaming
Value right actions over being right. Create safe spaces for negative results.

Element 3: Statistical Literacy
Help teams understand error margins. Separate signal from noise. No advanced degrees required. Just enough knowledge to make good decisions.

LEARN MORE
In my deepdive (free, no paywall thanks to Statsig): https://lnkd.in/etAGf7Nu

THE BOTTOM LINE
The cost of not building this system? Testing the same ideas repeatedly. Forgetting what you've learned. Seeing competition pull ahead.

What pillar do you need to focus on?
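The stratified sampling fix from Pillar 1 (Challenge 1) can be sketched as below. This is a simplified illustration, not any particular platform's method: `stratified_assign`, the segment labels, and the two-arm setup are assumptions. Customers are shuffled within each segment and then alternated between arms, so each segment is split as evenly as possible.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ARMS = ("control", "treatment")


def stratified_assign(
    customers: Iterable[Tuple[str, str]], seed: int = 0
) -> Dict[str, str]:
    """Map customer_id -> arm, balancing arms inside every segment.

    `customers` is an iterable of (customer_id, segment) pairs, where segment
    is a label such as "enterprise" or "smb" (hypothetical labels).
    """
    rng = random.Random(seed)  # seeded so assignments are reproducible
    by_segment: Dict[str, List[str]] = defaultdict(list)
    for customer_id, segment in customers:
        by_segment[segment].append(customer_id)

    assignment: Dict[str, str] = {}
    for segment in sorted(by_segment):
        ids = by_segment[segment]
        rng.shuffle(ids)
        # Random starting arm, so an odd-sized segment does not always
        # give its extra customer to the same arm.
        offset = rng.randrange(2)
        for index, customer_id in enumerate(ids):
            assignment[customer_id] = ARMS[(index + offset) % 2]
    return assignment


customers = [(f"ent-{i}", "enterprise") for i in range(4)]
customers += [(f"smb-{i}", "smb") for i in range(10)]
assignment = stratified_assign(customers, seed=42)
enterprise_in_treatment = sum(
    1 for i in range(4) if assignment[f"ent-{i}"] == "treatment"
)
print(enterprise_in_treatment)  # prints 2
```

With purely random assignment, all four enterprise accounts could land in one arm by chance; splitting within each segment rules that out.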
-
I saw a job posting for an AI PM at Figma yesterday, and it highlights why "vibe-launching" LLM products is not enough to become an AI PM. Anyone can build an LLM-Wrapper over the weekend, but it's not enough to be an AI PM at companies like Figma, Google, Microsoft, Anthropic, and so on... The reality is, this role was never just about prompting; it’s about owning the Machine Learning lifecycle. I see a lot of aspiring AI PMs focus purely on the "creative" side of GenAI, but if you look closely at these job descriptions, they are asking for three very specific, very technical skills that define the role in 2026: 1. Beyond the "Black Box" (LLMs & ML Fundamentals) Figma asks to "prioritize model improvements." You can't do that if you don't understand what's happening under the hood. For example: 🤖 LLMs (RAG vs. Fine-Tuning): If your chatbot fails, is it a Retrieval (RAG) issue (showed the wrong doc) or a Fine-Tuning issue (wrong tone)? If you don't know the difference, you can spend too much time 'fixing' the wrong thing. 📊 Traditional ML: Think about a Netflix Recommendation System. If it recommends movies you hate, it’s likely a data issue—maybe the model only trained on your weekend habits. You need to understand how Data Collection and Training work so you can spot these bias issues before they ruin the user experience. 2. Owning the "Definition of Good" (Evals & Metrics) In traditional software, a bug is a bug. In AI, "quality" is subjective—and that is terrifying for a roadmap. That’s why you see requirements for "experience with evaluation and iteration." 🥇 LLMs (Golden Datasets): You have to move beyond "it feels good". You need to learn how to build Golden Datasets—essentially a set of ground-truth examples that you define as the perfect answers. When engineering updates the model, you run it against this dataset. If the score drops, you don't launch. 
🎯 Traditional ML (Context): You need to understand why an 80% Precision score might be great for a music recommendation, but 90% could be a total disaster for a fraud detection model. 3. Scaling (Reliability & MLOps) Making a demo work for one person is easy. Scaling to 10,000 is hard. When companies ask for "scaling experience," they are talking about the unsexy stuff: Latency, Cost, and Reliability. You need to get familiar with the MLOps landscape—tools like LangSmith or Arize for tracing errors, or Datadog for monitoring latency. ---- The biggest hurdle isn't Python. It's moving from Deterministic code (If A, then B) to Probabilistic outcomes (If A, then probably B). It changes how you think about roadmaps and how you manage user expectations when you can't guarantee a specific output 100% of the time. 👋 If you’re trying to move into an AI PM role, what's the biggest challenge you are facing? --- 💎 I’ve been an AI PM for 6+ years. If you want to dive deeper into AI Product Management, check my comment below for resources!
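The Golden Dataset release gate described in point 2 can be sketched in a few lines. Everything named here is hypothetical (`golden_score`, `should_launch`, the tiny dataset, and the two stub models), and exact string matching is a deliberate simplification: real evals usually need a more forgiving grader.

```python
from typing import Callable, Dict, Sequence, Tuple

GoldenExample = Tuple[str, str]  # (prompt, expected answer)


def exact_match(expected: str, actual: str) -> bool:
    """Simplest possible grader: case- and whitespace-insensitive equality."""
    return " ".join(expected.lower().split()) == " ".join(actual.lower().split())


def golden_score(
    model: Callable[[str], str],
    golden: Sequence[GoldenExample],
    is_correct: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of golden examples the model answers correctly."""
    if not golden:
        raise ValueError("golden dataset must not be empty")
    passed = sum(1 for prompt, expected in golden if is_correct(expected, model(prompt)))
    return passed / len(golden)


def should_launch(candidate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Block the launch if the candidate scores below baseline minus tolerance."""
    return candidate >= baseline - tolerance


# Hypothetical golden dataset and stub models for illustration.
GOLDEN = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
    ("opposite of hot", "cold"),
]
_current: Dict[str, str] = dict(GOLDEN)
_updated: Dict[str, str] = {**_current, "3 * 3": "6"}  # simulated regression

baseline_score = golden_score(lambda p: _current[p], GOLDEN)
candidate_score = golden_score(lambda p: _updated[p], GOLDEN)
print(baseline_score, candidate_score)  # prints 1.0 0.75
print(should_launch(candidate_score, baseline_score))  # prints False
```

The `tolerance` parameter exists because scores on small datasets are noisy; how much drop to accept is a product decision, not something the code can settle.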