Software Performance Optimization


  • View profile for Chandrasekar Srinivasan

    Engineering and AI Leader at Microsoft

    49,590 followers

I am an Engineering Manager with almost 15 years of experience. If I could sit down with a Jr. Software Engineer, here are 25 software engineering cheat codes I would tell them to help them grow in the right direction:

➤ If you need lightning-fast lookups in huge datasets → use hashing (hash maps, hash tables) for O(1) average-case access.
➤ If your app needs to answer “have I seen this before?” for millions of entries → use a Bloom filter: tiny memory, blazing speed, with rare false positives.
➤ If your users complain about slow searches → sort your data and build indexes; searches go from O(n) to O(log n).
➤ If your workload is mostly writing events/logs → make your storage append-only; it’s faster and safer for high-throughput writes.
➤ If you want fast queries on massive databases → use B-trees or LSM trees; they keep lookups quick even with huge data on disk.
➤ If you keep loading the same data over and over → implement caching; move frequently accessed data into memory for instant reads.
➤ If you want to avoid downtime or data loss → replicate your data; store copies across machines for high availability.
➤ If your search has to handle billions of text records → build inverted indexes; full-text search becomes near-instant.
➤ If your queries are mostly analytics on columns (not rows) → store data in a columnar format; analytical queries get orders of magnitude faster.
➤ If you want scalable storage for time-ordered data (IoT, monitoring, financial ticks) → use a time-series database, optimized for compression and speed.
➤ If you need to keep storage costs low but can spare CPU cycles → compress your data; smaller footprint, but budget for decompression time.
➤ If you want to distribute data evenly and handle scaling painlessly → use consistent hashing; adding or removing nodes won’t reshuffle everything.
➤ If you want a durable audit trail or fast recovery after crashes → use write-ahead logging; every change is logged before it is applied.
➤ If your business logic needs efficient prefix or range queries → try tries (for prefixes) or segment trees (for ranges); searches and updates stay fast as data grows.
➤ If your app needs to avoid lock contention and scale with CPUs → use lock-free or concurrent data structures (like skip lists or lock-free queues).
➤ If you’re handling geographic or multidimensional data → use spatial indexes (like R-trees) for quick location-based searches.
➤ If you want fast string matching over huge texts → use suffix arrays; not as heavy as suffix trees, but still powerful.
➤ If you need efficient analytics or rolling stats on numeric data → use Fenwick trees or segment trees; prefix sums and range queries become trivial.
➤ If you’re optimizing for priority-queue operations (job scheduling, Dijkstra’s) → use a heap, the fastest way to get min/max elements on demand.
➤ If you’re building a collaborative, distributed system → use CRDTs: eventual consistency without conflicts or locks.
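A couple of these structures are simple enough to sketch outright. Here is a minimal, stdlib-only Bloom filter in Python illustrating the "have I seen this before?" tip; the bit-array size and hash count are arbitrary illustration values, and a production filter would size them from the expected item count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Space-efficient membership test: no false negatives,
    small tunable false-positive rate."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))   # True -- never a false negative
print(bf.might_contain("user:999"))  # almost certainly False
```

The trade-off is exactly as stated in the post: a "maybe seen" answer can rarely be wrong, but a "never seen" answer is always right, which is why Bloom filters work well as a cheap gate in front of a database.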

  • View profile for Armand Ruiz

    building AI systems

    204,368 followers

Evaluations —or “Evals”— are the backbone for creating production-ready GenAI applications. Over the past year, we’ve built LLM-powered solutions for our customers and connected with AI leaders, uncovering a common struggle: the lack of clear, pluggable evaluation frameworks. If you’ve ever been stuck wondering how to evaluate your LLM effectively, today's post is for you. Here’s what I’ve learned about creating impactful Evals:

𝗪𝗵𝗮𝘁 𝗠𝗮𝗸𝗲𝘀 𝗮 𝗚𝗿𝗲𝗮𝘁 𝗘𝘃𝗮𝗹?
- Clarity and Focus: Prioritize a few interpretable metrics that align closely with your application’s most important outcomes.
- Efficiency: Opt for automated, fast-to-compute metrics to streamline iterative testing.
- Representation Matters: Use datasets that reflect real-world diversity to ensure reliability and scalability.

𝗧𝗵𝗲 𝗘𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗼𝗳 𝗠𝗲𝘁𝗿𝗶𝗰𝘀: 𝗙𝗿𝗼𝗺 𝗕𝗟𝗘𝗨 𝘁𝗼 𝗟𝗟𝗠-𝗔𝘀𝘀𝗶𝘀𝘁𝗲𝗱 𝗘𝘃𝗮𝗹𝘀
Traditional metrics like BLEU and ROUGE paved the way but often miss nuances like tone or semantics. LLM-assisted Evals (e.g., GPTScore, LLM-Eval) now leverage AI to evaluate itself, achieving up to 80% agreement with human judgments. Combining machine feedback with human evaluators provides a balanced and effective assessment framework.

𝗙𝗿𝗼𝗺 𝗧𝗵𝗲𝗼𝗿𝘆 𝘁𝗼 𝗣𝗿𝗮𝗰𝘁𝗶𝗰𝗲: 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗬𝗼𝘂𝗿 𝗘𝘃𝗮𝗹 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲
- Create a Golden Test Set: Use tools like Langchain or RAGAS to simulate real-world conditions.
- Grade Effectively: Leverage libraries like TruLens or Llama-Index for hybrid LLM+human feedback.
- Iterate and Optimize: Continuously refine metrics and evaluation flows to align with customer needs.

If you’re working on LLM-powered applications, building high-quality Evals is one of the most impactful investments you can make. It’s not just about metrics — it’s about ensuring your app resonates with real-world users and delivers measurable value.
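To make the pipeline concrete, here is a minimal runnable sketch of the grade-against-a-golden-set loop. The golden set, the `app_under_test` function, and the `judge` stub are all invented placeholders, not the API of Langchain, RAGAS, TruLens, or Llama-Index; a real pipeline would swap in an LLM-assisted grader plus human spot checks.

```python
# Minimal eval-pipeline sketch: run the app over a golden test set,
# grade each output, report an aggregate metric.

golden_set = [
    {"input": "Summarize: The cat sat on the mat.", "must_include": ["cat", "mat"]},
    {"input": "Summarize: Sales rose 10% in Q3.", "must_include": ["sales", "10%"]},
]

def app_under_test(prompt):
    # Placeholder for your LLM-powered application.
    return prompt.removeprefix("Summarize: ").lower()

def judge(output, case):
    # Stub grader: checks that required facts survive. A real pipeline
    # would combine an LLM judge here with human review of a sample.
    return all(term.lower() in output for term in case["must_include"])

results = [judge(app_under_test(c["input"]), c) for c in golden_set]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")
```

The structure, not the stub, is the point: once `judge` is pluggable, you can iterate on metrics and datasets independently of the application.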

  • View profile for Aishwarya Srinivasan
    613,481 followers

Most people still think of LLMs as “just a model.” But if you’ve ever shipped one in production, you know it’s not that simple. Behind every performant LLM system there’s a stack of decisions about pretraining, fine-tuning, inference, evaluation, and application-specific tradeoffs. This diagram captures it well: LLMs aren’t one-dimensional. They’re systems. And each dimension introduces new failure points or optimization levers. Let’s break it down:

🧠 Pre-Training
Start with modality.
→ Text-only models like LLaMA, UL2, PaLM have predictable inductive biases.
→ Multimodal ones like GPT-4, Gemini, and LaVIN introduce more complex token fusion, grounding challenges, and cross-modal alignment issues.
Understanding the data diet matters just as much as parameter count.

🛠 Fine-Tuning
This is where most teams underestimate complexity:
→ PEFT strategies like LoRA and Prefix Tuning help with parameter efficiency, but can behave differently under distribution shift.
→ Alignment techniques (RLHF, DPO, RAFT) aren’t interchangeable. They encode different human preference priors.
→ Quantization and pruning decisions will directly impact latency, memory usage, and downstream behavior.

⚡️ Efficiency
Inference optimization is still underexplored. Techniques like dynamic prompt caching, paged attention, speculative decoding, and batch streaming make the difference between real-time and unusable. The infra layer is where GenAI products often break.

📏 Evaluation
One benchmark doesn’t cut it. You need a full matrix:
→ NLG (summarization, completion) and NLU (classification, reasoning),
→ alignment tests (honesty, helpfulness, safety),
→ dataset quality, and
→ cost breakdowns across training + inference + memory.
Evaluation isn’t just a model task; it’s a systems-level concern.

🧾 Inference & Prompting
Multi-turn prompts, CoT, ToT, ICL: all behave differently under different sampling strategies and context lengths. Prompting isn’t trivial anymore. It’s an orchestration layer in itself. Whether you’re building for legal, education, robotics, or finance, the “general-purpose” tag doesn’t hold. Every domain has its own retrieval, grounding, and reasoning constraints.

-------
Follow me (Aishwarya Srinivasan) for more AI insights and subscribe to my Substack to find more in-depth blogs and weekly updates in AI: https://lnkd.in/dpBNr6Jg
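As a toy illustration of the caching lever mentioned under Efficiency: production inference engines cache attention KV states for shared prompt prefixes (paged attention / prefix caching), which is far more involved than this, but even naive response caching shows the shape of the win. `run_model` and the normalization scheme below are made-up stand-ins.

```python
from functools import lru_cache

calls = 0

def run_model(prompt):
    # Stand-in for an expensive LLM call.
    global calls
    calls += 1
    return f"response to: {prompt}"

@lru_cache(maxsize=1024)
def cached_model(normalized_prompt):
    return run_model(normalized_prompt)

def ask(prompt):
    # Normalizing whitespace and case raises hit rates for near-duplicates.
    return cached_model(" ".join(prompt.lower().split()))

ask("What is paged attention?")
ask("what is   paged attention?")  # normalizes to the same cache key
print(calls)  # the underlying model ran only once
```

The same "normalize, then dedupe work" idea is what the real KV-cache techniques apply one level lower, at the attention-state granularity rather than whole responses.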

  • View profile for Swami Sivasubramanian

    VP, AWS Agentic AI

    182,038 followers

This morning at #AWSreInvent I highlighted new capabilities that are going to really help teams build faster and more efficient AI agents. AWS is putting advanced model customization into the hands of every developer in two ways:

🟠 Reinforcement Fine-Tuning (RFT) in Amazon Bedrock helps teams improve model accuracy without needing deep machine learning expertise or large amounts of labeled data. Bedrock automates the RFT workflow, making this advanced model customization technique accessible to more developers. RFT on Bedrock also delivers 66% accuracy gains on average over base models, helping you get better results with smaller, faster, more cost-effective models instead of relying on larger, expensive ones.

🟠 Amazon SageMaker AI now supports new serverless model customization capabilities, making model customization possible in just days. With two experiences, your team can choose the right approach for your use case and comfort level: a self-guided approach for those who like to be in the driver's seat, and an agentic experience in which an AI expert guides you through the whole process.

I’m excited for customers to try these capabilities and build agents that deliver faster, more accurate responses at lower costs. More here: https://lnkd.in/gEKiJjK6

  • View profile for Greg Coquillo

    Product Leader @AWS | Startup Investor | 2X Linkedin Top Voice for AI, Data Science, Tech, and Innovation | Quantum Computing & Web 3.0 | I build software that scales AI/ML Network infrastructure

    224,418 followers

You need to check out the Agent Leaderboard on Hugging Face! One question that emerges amid the proliferation of AI agents is “which LLMs actually deliver the most?” You’ve probably asked yourself this as well. That’s because LLMs are not one-size-fits-all. While some models thrive in structured environments, others don’t handle the unpredictable real world of tool calling well. The team at Galileo🔭 evaluated 17 leading models on their ability to select, execute, and manage external tools, using 14 highly curated datasets. Today, AI researchers, ML engineers, and technology leaders can leverage insights from the Agent Leaderboard to build the best agentic workflows.

Some key insights that you can already benefit from:
- A model can rank well but still be inefficient at error handling, adaptability, or cost-effectiveness. Benchmarks matter, but qualitative performance gaps are real.
- Some LLMs excel in multi-step workflows, while others dominate single-call efficiency. Picking the right model depends on whether you need precision, speed, or robustness.
- While Mistral-Small-2501 leads the OSS field, closed-source models still dominate tool execution reliability. The gap is closing, but consistency remains a challenge.
- Some of the most expensive models barely outperform their cheaper competitors. Model pricing is still opaque, and performance per dollar varies significantly.
- Many models fail not on accuracy, but on how they handle missing parameters, ambiguous inputs, or tool misfires. These edge cases separate top-tier AI agents from unreliable ones.

Consider the guidance below to get going quickly:
1. For high-stakes automation, choose models with robust error recovery over just high accuracy.
2. For long-context applications, look for LLMs with stable multi-turn consistency, not just a good first response.
3. For cost-sensitive deployments, benchmark price-to-performance ratios carefully. Some “premium” models may not be worth the cost.

I expect this to evolve over time to highlight how models improve tool-calling effectiveness for real-world use cases. Explore the Agent Leaderboard here: https://lnkd.in/dzxPMKrv #genai #agents #technology #artificialintelligence
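The price-to-performance point is easy to operationalize. The sketch below ranks models by score per dollar; the model names, scores, and prices are invented for illustration and should be replaced with real leaderboard numbers and your provider's actual pricing.

```python
# Hypothetical benchmark figures -- made up for illustration only.
# The point: rank candidates by quality per dollar, not quality alone.

models = [
    {"name": "premium-xl", "score": 0.92, "usd_per_1k_calls": 40.0},
    {"name": "mid-tier",   "score": 0.89, "usd_per_1k_calls": 8.0},
    {"name": "oss-small",  "score": 0.81, "usd_per_1k_calls": 1.5},
]

for m in models:
    # Higher is better: benchmark score delivered per dollar spent.
    m["score_per_dollar"] = m["score"] / m["usd_per_1k_calls"]

best_value = max(models, key=lambda m: m["score_per_dollar"])
print(best_value["name"])  # with these numbers, the cheapest model wins on value
```

With these (invented) numbers, the premium model's 3-point score edge costs 5x the mid-tier price, which is exactly the "barely outperform their cheaper competitors" pattern the leaderboard surfaces.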

  • View profile for Aleksandr Tiulkanov LL.M., CIPP/E

    EU AI Act Trainer, ISO/IEC 42001 Implementer, CEN/CENELEC AI Standards Contributor, AI Governance Consultant

    12,270 followers

✨ Are reliable and effective AI systems based on large language models (LLMs) possible?

Those who have taken my EU AI Act course [1] will remember that I explain the difference between a model and a system using the example of motor vehicles. An AI system is a vehicle, and an AI model is its engine. In both cases, the final product cannot function without a key component. In this new video [2], Sayash Kapoor discusses the challenges of integrating LLMs into a functional AI system. In particular, he points out that:

1) it is necessary to distinguish between the ability of an AI system to perform a task (capability) and the reliability with which it performs that task;
2) the main problem with large language models as an ‘engine’ is that their outputs can be insufficiently predictable (stochastic);
3) accordingly, to solve the problem of the reliability of LLM-based AI systems, it is not enough to just optimise the AI model itself;
4) it is necessary to work on the overall design of the AI system and design in other necessary components in such a way as to contain the inevitable stochasticity of LLMs and ensure the required reliability for the AI system (the final product).

To my knowledge, there are currently no established and standardised recipes for such work on containing LLMs and achieving reliability at scale for business- or fundamental-rights-sensitive AI applications. This is still an area of ongoing research. Keep this in mind next time you are being sold an LLM-based system as a product in which this problem has been successfully solved. Ask the vendor to provide evidence of this work on reliability, and seek independent validation from organisations or people who can critically evaluate the vendor's evidence.

[1] https://lnkd.in/gUAmuxqk
[2] https://lnkd.in/gbZxZtej
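One common containment pattern, offered here as a sketch rather than an established recipe, is to wrap the stochastic model call in an output validator with bounded retries, so the surrounding system enforces a contract the model alone cannot guarantee. `flaky_model` is an invented stand-in for a real LLM call.

```python
import json, random

def flaky_model(prompt, rng):
    # Simulates stochastic output: sometimes prose, sometimes valid JSON.
    if rng.random() < 0.5:
        return "Sure! Here is your answer: 42"   # not parseable
    return json.dumps({"answer": 42})            # satisfies the contract

def validated_call(prompt, rng, max_attempts=5):
    # The *system* guarantees the output contract, not the model.
    for _ in range(max_attempts):
        raw = flaky_model(prompt, rng)
        try:
            out = json.loads(raw)
            if "answer" in out:
                return out            # contract satisfied
        except json.JSONDecodeError:
            pass                      # retry on malformed output
    raise RuntimeError("model failed validation after retries")

result = validated_call("answer as JSON", random.Random(0))
print(result["answer"])
```

This is the point in miniature: the engine stays stochastic, but the vehicle around it (validation, retries, a hard failure path) is what makes the product's behaviour defensible.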

  • View profile for Nishant Kumar

Data Engineer @ IBM | 100K+ Audience | • SQL • PySpark • Airflow • AWS • Databricks • Snowflake • Kafka | AWS & Databricks Certified | Scalable Data Pipelines & Data Lakehouse | 650+ Mentorships Delivered

    105,285 followers

When I started preparing for my Data Engineering switch at IBM, I thought Spark was slow. But the more I worked on real pipelines, the more I realised one thing: Spark only becomes fast when you write it the right way. Whether you're already in Data Engineering or trying to switch, these optimizations will save you hours of debugging and thousands in cloud costs. Here are the optimizations that changed the way I build Spark jobs 👇

➊ Fix Data Skew Early
→ Identify skewed keys using df.groupBy().count()
→ Apply salting on heavily repeated keys
→ Use broadcast joins only when dimension tables are small
→ Repartition using high-cardinality columns

➋ Optimize Joins
→ Broadcast joins for small lookup tables
→ Bucketing or sort-merge joins for repeated joins
→ Repartition before joining large datasets
→ Avoid UDFs inside join conditions

➌ Choose Efficient File Formats
→ Prefer Parquet / ORC over CSV
→ Use Snappy/ZSTD compression
→ Enable predicate pushdown
→ Store data in a columnar layout for analytics

➍ Use DataFrames, Not RDDs
→ Catalyst optimizer = automatic optimizations
→ Tungsten engine = faster memory management
→ DataFrames require less code & run faster

➎ Reduce Shuffle Operations
→ Minimize wide transformations (groupBy, join, orderBy)
→ Combine transformations where possible
→ Avoid unnecessary repartitioning
→ Use coalesce() when reducing partitions

And if you're working on Spark or planning to switch into Data Engineering, focus on learning how to build efficient pipelines, not just code. That’s why I recommend Bosscoder Academy: bcalinks.com/w5Uxf03 You’ll get structured learning, real-world projects, and 1:1 mentorship from engineers who’ve built systems at Deloitte, Walmart, Microsoft etc.

P.S. What slows you down more in Spark: shuffles or skew? Follow for more 👋
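The salting idea in ➊ is easiest to see with the Spark machinery stripped away. This plain-Python sketch (not PySpark; the names and constants are illustrative) shows the key transformation: a hot key is split across N synthetic keys so no single partition receives all of its rows. In Spark you would add the salt as a column, join on (key, salt) against a salt-expanded dimension table, then aggregate the salt away.

```python
import random
from collections import Counter

NUM_SALTS = 8  # tune to your cluster's parallelism

def salted_key(key, rng):
    # Spread one hot key across NUM_SALTS synthetic keys. The dimension
    # side of a join must be expanded with every salt so matches survive.
    return f"{key}#{rng.randrange(NUM_SALTS)}"

rng = random.Random(42)
events = ["user_1"] * 8000 + ["user_2"] * 40   # heavily skewed workload
salted = Counter(salted_key(k, rng) for k in events)

# The hot key's 8000 rows now land in ~NUM_SALTS buckets instead of one.
hot_buckets = [c for k, c in salted.items() if k.startswith("user_1#")]
print(len(hot_buckets), max(hot_buckets))
```

Without salting, one task would process all 8000 `user_1` rows while its siblings sit idle; after salting, the work is roughly even, which is the whole fix for skewed shuffles.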

  • View profile for Andy Werdin

    Business Analytics & Tooling Lead | Data Products (Forecasting, Simulation, Reporting, KPI Frameworks) | Team Lead | Python/SQL | Applied AI (GenAI, Agents)

    33,341 followers

Have you ever struggled with painfully slow loops in Python DataFrames? Here are some tips on how you can speed things up:

1. 𝗔𝘃𝗼𝗶𝗱 .𝗶𝘁𝗲𝗿𝗿𝗼𝘄𝘀():
• Using .iterrows() is slow because it iterates row by row, creating a Series object every single time, which becomes a performance bottleneck.
• You can use .itertuples() for a small performance gain, as it returns more lightweight namedtuples.

2. 𝗚𝗼 𝗳𝗼𝗿 𝗯𝘂𝗶𝗹𝘁-𝗶𝗻 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 𝗮𝗻𝗱 𝘃𝗲𝗰𝘁𝗼𝗿𝗶𝘇𝗲𝗱 𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝘀:
• Aim at using built-in functions of Pandas, which are optimized in C.
• Functions like .map() and built-in arithmetic operations can greatly improve performance.

3. 𝗨𝘀𝗲 .𝗮𝗽𝗽𝗹𝘆() 𝗶𝗻𝘀𝘁𝗲𝗮𝗱 𝗼𝗳 𝗹𝗼𝗼𝗽𝘀:
• With .apply() you have a more efficient alternative for applying custom functions to your DataFrame.
• It’s not as fast as vectorized operations, but it can be a great middle ground.

4. 𝗪𝗼𝗿𝗸 𝗱𝗶𝗿𝗲𝗰𝘁𝗹𝘆 𝗶𝗻 𝗡𝘂𝗺𝗣𝘆:
• NumPy is optimized for large array operations and can handle tasks much faster than standard loops.
• If you rely on loops a lot, try converting your columns to NumPy arrays and performing calculations directly.

Loops can quickly become a performance bottleneck when working with DataFrames. Save yourself time by using performance-optimized approaches instead. What tricks do you use to make your Pandas code run faster?

----------------
♻️ 𝗦𝗵𝗮𝗿𝗲 if you find this post useful
➕ 𝗙𝗼𝗹𝗹𝗼𝘄 for more daily insights on how to grow your career in the data field

#dataanalytics #datascience #python #pandas #careergrowth
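A small self-contained comparison of tips 1 and 2, assuming only pandas and NumPy are available: the loop and the vectorized expression compute the same column, but the vectorized form runs as a single C-level operation instead of creating a Series per row.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.random(100_000) * 100,
                   "qty": rng.integers(1, 10, 100_000)})

def total_loop(frame):
    # Slow: row-by-row iteration, one Series object per row.
    out = []
    for _, row in frame.iterrows():
        out.append(row["price"] * row["qty"])
    return pd.Series(out)

def total_vectorized(frame):
    # Fast: one vectorized expression, executed in C.
    return frame["price"] * frame["qty"]

fast = total_vectorized(df)
slow = total_loop(df.head(1000))            # loop kept small on purpose
assert np.allclose(slow, fast.head(1000))   # identical results
```

Timing the two (e.g. with `%timeit`) typically shows the vectorized version winning by two to three orders of magnitude on frames this size, which is why the loop above is deliberately restricted to 1,000 rows.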

  • View profile for Kuldeep Singh Sidhu

    Senior Data Scientist @ Walmart | BITS Pilani

    15,203 followers

Exciting New Research on LLM Evaluation Validity! I just read a fascinating paper titled "LLM-Evaluation Tropes: Perspectives on the Validity of LLM-Evaluations" that addresses a critical issue in our field: as Large Language Models (LLMs) increasingly replace human judges in evaluating information retrieval systems, how can we ensure these evaluations remain valid?

The paper, authored by researchers from universities and companies across multiple countries (including University of New Hampshire, RMIT, Canva, University of Waterloo, The University of Edinburgh, Radboud University, and Microsoft), identifies 14 "tropes" or recurring patterns that can undermine LLM-based evaluations. The most concerning trope is "Circularity": when the same LLM is used both to evaluate systems and within the systems themselves. The authors demonstrate this problem using TREC RAG 2024 data, showing that when systems are reranked using the Umbrela LLM evaluator and then evaluated with the same tool, it creates artificially inflated scores (some systems scored >0.95 on LLM metrics but only 0.68-0.72 on human evaluations).

Other key tropes include:
- LLM Narcissism: LLMs prefer outputs from their own model family
- Loss of Variety of Opinion: LLMs homogenize judgment
- Self-Training Collapse: Training LLMs on LLM outputs leads to concept drift
- Predictable Secrets: When LLMs can guess evaluation criteria

For each trope, the authors propose practical guardrails and quantification methods. They also suggest a "Coopetition" framework: a collaborative competition where researchers submit systems, evaluators, and content modification strategies to build robust test collections. If you work with LLM evaluations, this paper is essential reading. It offers a balanced perspective on when and how to use LLMs as judges while maintaining scientific rigor.

  • View profile for sukhad anand

    Senior Software Engineer @Google | Techie007 | Opinions and views I post are my own

    104,458 followers

Every caching tutorial on the internet is lying to you. They show you this:

    if cache.has(key):
        return cache.get(key)
    else:
        data = db.query(key)
        cache.set(key, data)
        return data

- Looks clean. Works in demos.
- Destroys production systems.

Here's what actually happens at scale:

𝗣𝗿𝗼𝗯𝗹𝗲𝗺 #𝟭: 𝗧𝗵𝘂𝗻𝗱𝗲𝗿𝗶𝗻𝗴 𝗛𝗲𝗿𝗱
A cache key expires. 10,000 requests hit simultaneously. All 10,000 miss the cache. All 10,000 slam your database. The database dies. Cascade failure.
Fix: Distributed locks + cache stampede prevention. Only ONE request rebuilds; the others wait or get stale data.

𝗣𝗿𝗼𝗯𝗹𝗲𝗺 #𝟮: 𝗖𝗮𝗰𝗵𝗲 𝗣𝗲𝗻𝗲𝘁𝗿𝗮𝘁𝗶𝗼𝗻 𝗔𝘁𝘁𝗮𝗰𝗸
An attacker queries keys that don't exist: user_9999999999, user_9999999998... The cache always misses. Every request hits the database. A free DDoS using your own infrastructure.
Fix: Bloom filters. Cache negative results. Rate limiting per key pattern.

𝗣𝗿𝗼𝗯𝗹𝗲𝗺 #𝟯: 𝗛𝗼𝘁 𝗞𝗲𝘆 𝗠𝗲𝗹𝘁𝗱𝗼𝘄𝗻
One celebrity posts. Millions request the same cache key. A single Redis node handles ALL the traffic. That node melts. Game over.
Fix: Key replication with suffixes (key_1, key_2... key_N) and client-side random distribution across the replicas.

𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗹𝗲𝘀𝘀𝗼𝗻: Caching isn't a performance optimization. It's a distributed systems problem disguised as a simple key-value lookup. The moment you add a cache, you've added:
-> Consistency challenges
-> Failure modes
-> Cold start problems
-> Memory pressure decisions
-> Eviction policy trade-offs

The best cache is the one you understood deeply before deploying.
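The thundering-herd fix can be sketched in-process with per-key locks ("single flight"): concurrent misses on the same key trigger exactly one rebuild. This is an illustrative single-machine version; a distributed deployment would use a distributed lock (for example, one held in Redis) instead of `threading.Lock`.

```python
import threading

cache = {}
key_locks = {}
locks_guard = threading.Lock()
db_queries = 0

def query_db(key):
    # Stand-in for the expensive database query we want to run once.
    global db_queries
    db_queries += 1
    return f"value:{key}"

def get(key):
    if key in cache:              # fast path: no locking on a hit
        return cache[key]
    with locks_guard:             # one lock object per key
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        if key not in cache:      # recheck: another thread may have won
            cache[key] = query_db(key)
        return cache[key]

# 100 simultaneous misses on the same key -> exactly one rebuild.
threads = [threading.Thread(target=get, args=("user:1",)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(db_queries)
```

The double-check inside the lock is the load-bearing line: without it, every waiter that queued up during the rebuild would re-query the database the moment the lock was released.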
