Optimizing Technology Spending


  • View profile for Matthias Patzak

    Advisor & Evangelist | CTO | Tech Speaker & Author | AWS

    16,110 followers

    You're a #CTO. Your board asks: "What's our ROI on AI coding tools?" Your answer: "40% of our code is AI-generated!" They respond: "So what? Are we shipping faster? Are customers happier?"

    Most CTOs are measuring AI impact completely wrong. Here's what some are tracking:
    - Percentage of AI-generated code
    - Developer hours saved per week
    - Lines of code produced
    - AI tool adoption rates

    These metrics are like measuring how fast your assembly line workers attach parts while ignoring whether your cars actually start. Here's what you SHOULD measure instead:
    1. Delivered business value
    2. Customer cycle time
    3. Development throughput
    4. Quality and reliability
    5. Total cost of delivery (not just development)
    6. Team satisfaction

    Software development isn't a typing competition—it's a complex system. If AI makes your developers 30% faster but your deployment takes 2 weeks and QA adds another week, your customer delivery improves by maybe 7%. You've sped up the wrong part.

    The solution: A/B test your teams. Give half your teams AI tools and measure business outcomes over 2-3 release cycles. Track what customers actually experience, not how much developers produce.

    Companies that measure business impact from AI will pull ahead. Those measuring vanity metrics will wonder why their expensive tools aren't moving the needle. Stop measuring how much code AI generates. Start measuring how much faster you deliver value to customers. What are you actually measuring? And is it moving your business forward?

    -> Follow me for more about building great tech organizations at scale. More insights in my book "All Hands on Tech"
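The 30%-faster-developers-but-only-7%-delivery arithmetic can be checked with a short sketch (the stage durations below are illustrative, loosely matching the post's two-week deployment and one-week QA example):

```python
# Toy model of end-to-end delivery cycle time: speeding up only one stage
# barely moves the total when other stages dominate.
def delivery_improvement(dev_days, qa_days, deploy_days, dev_speedup):
    """Fractional reduction in total cycle time when only the development
    stage gets faster by `dev_speedup` (e.g. 0.30 = 30% faster)."""
    before = dev_days + qa_days + deploy_days
    after = dev_days * (1 - dev_speedup) + qa_days + deploy_days
    return (before - after) / before

# 1 week of dev, 1 week of QA, 2 weeks of deployment; devs 30% faster.
print(round(delivery_improvement(5, 5, 10, 0.30) * 100, 1))  # → 7.5
```

A 30% development speedup yields roughly a 7-8% end-to-end improvement here, which is the post's point: optimize the whole system, not one stage.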

  • View profile for Dr. Tathagat Varma
    Dr. Tathagat Varma is an Influencer

    Creator of “Theory of Cognitive Chasms”

    35,827 followers

    By now, the "95% failure rate" of GenAI financial returns (ref MIT's Project NANDA) is part of all consulting decks. The report blames the incorrect approach, rather than model maturity, etc., as the primary reason. The key is to understand what #ROI metrics are used to determine the financial returns. I asked #Copilot about this, and here's what it told me:

    ---

    Here are three examples of ROI frameworks that enterprises are using to evaluate and scale GenAI adoption effectively:

    1. Business Outcome-Based ROI Framework (Gartner)
    Summary: Gartner recommends aligning GenAI initiatives with measurable business outcomes such as cost reduction, revenue growth, or productivity gains. For example, a retail company using GenAI for automated product descriptions tracked a 22% increase in conversion rates and a 15% reduction in content creation costs. The framework emphasizes setting baseline metrics before deployment and tracking improvements post-implementation.
    🔗 https://lnkd.in/dER7cTeF

    2. Time-to-Value and Efficiency Metrics (BCG)
    Summary: Boston Consulting Group suggests using time-to-value (TTV) and operational efficiency as key ROI indicators. In one case, a logistics firm used GenAI to optimize routing, reducing delivery times by 18% and fuel costs by 12%. BCG’s framework includes pre/post comparisons, automation impact, and employee productivity metrics to quantify GenAI’s contribution.
    🔗 https://lnkd.in/da2zcSfW

    3. Model Performance vs. Business KPIs (McKinsey)
    Summary: McKinsey advocates linking GenAI model performance directly to business KPIs. For instance, a financial services firm used GenAI for customer support automation and tracked resolution time, customer satisfaction scores, and call deflection rates. The framework includes continuous monitoring of model accuracy, relevance, and business impact.
    🔗 https://lnkd.in/dA6zEGuS

    🔑 Key Message Summary
    Effective GenAI ROI frameworks combine technical performance metrics with business impact indicators. Leading approaches include tracking cost savings, productivity gains, time-to-value, and alignment with strategic KPIs. Enterprises that define success upfront and monitor outcomes continuously are more likely to scale GenAI successfully.

    ---

    The direction taken seems well-intentioned. However, the measure of success is not quite what might lead to real, solid business outcomes! Individual productivity improvements are just that! They don't scale across the organization unless "vertically scaled" top-down through an entire process to deliver bottom-line improvements, which then need to be further "horizontally scaled" end-to-end across the entire value chain of the firm to deliver top-line value! My forthcoming book on Cognitive Chasm provides actionable guidance to practitioners on this.
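The "baseline before, measure after" pattern all three frameworks share reduces to a simple calculation. A minimal sketch (the formula shape and all figures are illustrative, not taken from Gartner, BCG, or McKinsey):

```python
# Simple ROI from baseline-vs-post measurements:
# (cost savings + new revenue - tool spend) / tool spend.
def roi(baseline_cost, post_cost, revenue_gain, tool_cost):
    savings = baseline_cost - post_cost
    return (savings + revenue_gain - tool_cost) / tool_cost

# Content team: $100k annual cost drops to $85k after GenAI rollout,
# conversion lift attributed at $30k, tool spend $20k.
print(f"{roi(100_000, 85_000, 30_000, 20_000):.0%}")  # → 125%
```

The hard part, as the post argues, is not this arithmetic but deciding which baseline metrics to capture before deployment so that the savings and gains are attributable at all.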

  • View profile for Ken Wong

    President, Solutions & Services Group, Lenovo.

    45,135 followers

    CIOs are leading a transformation focused on strategic, long-term value rather than just adopting the latest tech.

    🌍 Lenovo’s Global CIO Study shows 96% of CIOs plan to boost tech investments, focusing on AI and security. From my conversations, it’s clear they’re also thinking about sustainability and future-proofing in a rapidly evolving tech landscape.

    💡 However, 61% of CIOs face challenges in proving ROI from these investments, highlighting the need not only to innovate but to deliver measurable outcomes. Here are four strategies to tackle this challenge:

    1️⃣ Align Tech Investments with Business Goals
    Tie each technology decision directly to business outcomes. Whether it’s enhancing customer experience, increasing revenue, or improving operational efficiency, measurable goals make the case for ROI clearer.

    2️⃣ Build Cross-functional Alignment
    Involve key business leaders in the early stages of technology planning. Demonstrating how investments benefit various departments, from marketing to operations, builds stronger support for technology initiatives and ensures alignment with broader company objectives.

    3️⃣ Prioritize Long-term Value Creation
    While short-term wins are important, CIOs must invest in technology that continues to deliver value over time. AI, for instance, plays a pivotal role in future-proofing organizations in a rapidly changing digital landscape.

    4️⃣ Leverage Sustainability and Future-of-Work Strategies
    New growth areas, like sustainability and adapting to the future of work, are top of mind for CIOs. AI is central to addressing these trends, from optimizing energy use to enabling more productive environments - key factors in demonstrating ROI over the long term.

    For me, leading through this transformation isn’t just about adopting AI or new tools. It’s about building a roadmap that is thoughtful and strategic, building a solid foundation today for tomorrow’s growth. How are you navigating your business’s tech transformation to demonstrate ROI? I’d love to hear your insights on the challenges and opportunities. 🤝 #WeAreLenovo #TechTransformation #AI

  • View profile for Zain Hasan

    I build and teach AI | AI/ML @ Together AI | EngSci ℕΨ/PhD @ UofT | Previously: vector DBs, data scientist, lecturer & health tech founder | 🇺🇸🇨🇦🇵🇰

    18,864 followers

    You don't need a 2 trillion parameter model to tell you the capital of France is Paris. Be smart and route between a panel of models according to query difficulty and model specialty!

    A new paper proposes a framework to train a router that routes queries to the appropriate LLM to optimize the trade-off between cost and performance.

    Overview: Model inference cost varies significantly. Per one million output tokens: Llama-3-70b ($1) vs. GPT-4-0613 ($60), Haiku ($1.25) vs. Opus ($75).

    The RouteLLM paper proposes a router training framework based on human preference data and augmentation techniques, demonstrating over 2x cost savings on widely used benchmarks. They define the problem as having to choose between two classes of models:
    (1) strong models - produce high-quality responses but at a high cost (GPT-4o, Claude 3.5)
    (2) weak models - relatively lower quality and lower cost (Mixtral8x7B, Llama3-8b)

    A good router requires a deep understanding of the question’s complexity as well as the strengths and weaknesses of the available LLMs. The paper explores different routing approaches:
    - Similarity-weighted (SW) ranking
    - Matrix factorization
    - BERT query classifier
    - Causal LLM query classifier

    Neat ideas to build from:
    - Users can collect a small amount of in-domain data to improve performance for their specific use cases via dataset augmentation.
    - Expand the problem from routing between a strong and a weak LLM to a multiclass routing approach with specialist models (language-vision model, function-calling model, etc.).
    - Larger framework controlled by a router - imagine a system of 15-20 tuned small models and the router as the (n+1)th model responsible for picking the LLM that will handle a particular query at inference time.
    - MoA architectures: routing to different architectures of a Mixture of Agents would be a cool idea as well. Depending on the query, you decide how many proposers there should be, how many layers in the mixture, what the aggregate models should be, etc.
    - Route-based caching: if you get redundant queries that are slightly different, route the query plus the previous answer to a small model for light rewriting instead of regenerating the answer.
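The strong/weak routing idea can be sketched in a few lines. The difficulty scorer below is a crude stand-in for the learned classifiers the paper studies (BERT classifier, matrix factorization, SW ranking); the heuristic, names, costs, and threshold are all invented for illustration:

```python
# Route each query to a cheap or expensive model based on an estimated
# difficulty score; only hard queries pay the strong-model price.
COST_PER_MTOK = {"weak": 1.0, "strong": 60.0}  # $/1M output tokens (post's example)

def difficulty(query: str) -> float:
    """Stand-in for a learned difficulty classifier: crude length +
    keyword heuristic, returning a score in [0, 1]."""
    hard_words = {"prove", "derive", "optimize", "debug"}
    score = min(len(query) / 200, 1.0)
    if any(w in query.lower() for w in hard_words):
        score = max(score, 0.8)
    return score

def route(query: str, threshold: float = 0.5) -> str:
    return "strong" if difficulty(query) >= threshold else "weak"

print(route("What is the capital of France?"))                             # → weak
print(route("Derive the gradient of softmax cross-entropy and optimize it."))  # → strong
```

Sweeping the threshold trades cost against quality: a lower threshold sends more traffic to the strong model, which is exactly the cost/performance curve the paper optimizes.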

  • View profile for Colin S. Levy
    Colin S. Levy is an Influencer

    General Counsel at Malbek | Educator Translating Legal Tech And AI Into Practice | Adjunct Professor | Author

    48,429 followers

    Why do so many legal technology implementations fail to deliver their promised value? Too often, legal teams rush to adopt the latest tools without first understanding their actual pain points. Here are the critical steps that separate successful implementations from costly failures:

    📊 Start with Discovery, Not Solutions
    Map your current workflows meticulously. Track how long tasks take, where errors occur, and what frustrates your team most.

    🎯 Set Measurable Goals
    Replace vague aspirations like "improve efficiency" with concrete targets:
    - Reduce contract turnaround by 30%
    - Eliminate 50% of manual compliance errors
    - Increase client intake capacity by 25%
    These specific metrics give you clear success criteria and help demonstrate ROI to stakeholders.

    👥 Embrace Change Management
    Technology fails when people resist it. Appoint enthusiastic "technology champions" who can provide peer support and bridge the gap between IT and daily users. Their grassroots advocacy often proves more effective than top-down mandates.

    🔄 Pilot, Learn, Iterate
    Test solutions with a small group for 6-8 weeks before full rollout. One legal department, for example, reduced its NDA processing time to 1.5 hours and cut errors by 80% during its pilot. These wins built momentum for broader adoption.

    Remember: legal technology adoption is about solving real problems, not chasing innovation for its own sake. #legaltech #innovation #law #business #learning
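The measurable-goals step amounts to comparing pilot metrics against targets. A small sketch (the 30%/50% targets come from the post; the baseline and pilot numbers are invented):

```python
# Turn pilot targets into pass/fail checks against measured improvements.
def pct_change(before, after):
    """Fractional reduction from a baseline measurement."""
    return (before - after) / before

targets = {
    "contract_turnaround": 0.30,  # reduce turnaround by 30%
    "manual_errors": 0.50,        # eliminate 50% of manual errors
}
pilot = {
    "contract_turnaround": pct_change(10.0, 6.5),  # days: 10 -> 6.5
    "manual_errors": pct_change(40, 8),            # errors/month: 40 -> 8
}
met = {k: pilot[k] >= v for k, v in targets.items()}
print(met)  # → {'contract_turnaround': True, 'manual_errors': True}
```

Capturing the "before" numbers during discovery is what makes this check possible at all, which is why discovery comes first.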

  • View profile for J.R. Storment

    Executive Director of the FinOps Foundation (VP/GM at the Linux Foundation), Co-Author of FinOps book(s).

    23,492 followers

    Over the last 18 months, the FinOps Foundation has seen a dramatic shift in the scope of spending that #FinOps practices manage beyond public cloud.

    We first explored this anticipated shift in the second edition of Cloud FinOps (pg. 401), where we shared a vision for how we expected the scope of FinOps to expand: to a world where FinOps practices are integrating costs beyond public cloud – from SaaS to licensing, datacenter, and private cloud – for a more complete picture of cost to drive value-based decision-making across a broader scope of spending.

    In recent surveys, we are seeing upwards of 70% of practitioners now extending their practice beyond public cloud to other types of technology spend. To reflect this reality, the FinOps Foundation Technical Advisory Council has approved a new element in the FinOps Framework to capture the segments associated with the different types of technology cost and usage data FinOps Practitioners are managing: FinOps Scope. Read more in the new Insights article on the expanded scope of FinOps: https://lnkd.in/gPH3vQEn

    In some cases, especially for companies “born in the cloud,” FinOps teams are the only technology cost management team in the organization. In other cases, FinOps Practitioners are working alongside Allied Personas (ITAM/ITSM/ITFM/TBM/SAM). But in all cases, FinOps’ success in managing cloud spending has the business asking: “Can FinOps keep doing what you’re doing for cloud, AND also do it for X?”

    While other disciplines report on cost at a chargeback level, they do this for a monthly and quarterly roll-up of financial reporting at the general ledger level. FinOps, by contrast, leverages extremely granular cost and usage data for all stakeholders, from engineering to architecture, product, finance, and executives, enabling them to:
    - Make information available outside of traditional silos to empower Personas across the organization, beyond Leadership – not just the CFO and CIO.
    - Enable timely decision-making about technology investment choices in “fixed” and variable Scopes.
    - Enable collaboration between technology and business teams at the engineering and product level.
    - Enable Cost Aware Product Decisions by bringing cost considerations earlier into the product development lifecycle.
    - Optimize, modernize, and automate to create consistency and iteratively improve technology usage and cost.

    Applying FinOps Capabilities to additional Scopes of spending gives businesses more comprehensive visibility into their technology costs. The goal for organizations is to understand and optimize the cost of offering each individual product or service. The first step is to get complete visibility into the cost of a product or service by pulling together all types of costs associated with delivering it... Read more in the new Insights article on the expanded scope of FinOps: https://lnkd.in/gPH3vQEn
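The "pull all cost types together per product" step might look like this in miniature (products, scope labels, and figures are invented for illustration, not from the FinOps Framework):

```python
# Aggregate cost records from multiple scopes (cloud, SaaS, licensing,
# datacenter) into a total cost per product or service.
from collections import defaultdict

cost_records = [  # (product, scope, monthly_usd)
    ("checkout", "public_cloud", 42_000),
    ("checkout", "saas",          6_500),
    ("checkout", "licensing",     3_000),
    ("search",   "public_cloud", 18_000),
    ("search",   "datacenter",    9_000),
]

per_product = defaultdict(float)
for product, scope, usd in cost_records:
    per_product[product] += usd

print(dict(per_product))  # → {'checkout': 51500.0, 'search': 27000.0}
```

In practice the hard work is normalizing granular billing data from each scope into one schema; the roll-up itself is the easy part.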

  • View profile for Aishwarya Srinivasan
    Aishwarya Srinivasan is an Influencer
    613,501 followers

    If you’re an AI engineer trying to optimize your LLMs for inference, here’s a quick guide for you 👇

    Efficient inference isn’t just about faster hardware; it’s a multi-layered design problem. From how you compress prompts to how your memory is managed across GPUs, everything impacts latency, throughput, and cost. Here’s a structured taxonomy of inference-time optimizations for LLMs:

    1. Data-Level Optimization
    Reduce redundant tokens and unnecessary output computation.
    → Input Compression:
     - Prompt Pruning: remove irrelevant history or system tokens
     - Prompt Summarization: use model-generated summaries as input
     - Soft Prompt Compression: encode static context using embeddings
     - RAG: replace long prompts with retrieved documents plus compact queries
    → Output Organization:
     - Pre-structure output to reduce decoding time and minimize sampling steps

    2. Model-Level Optimization
    (a) Efficient Structure Design
    → Efficient FFN Design: use gated or sparsely-activated FFNs (e.g., SwiGLU)
    → Efficient Attention: FlashAttention, linear attention, or sliding window for long context
    → Transformer Alternates: e.g., Mamba, Reformer for memory-efficient decoding
    → Multi/Group-Query Attention: share keys/values across heads to reduce KV cache size
    → Low-Complexity Attention: replace full softmax with approximations (e.g., Linformer)
    (b) Model Compression
    → Quantization:
     - Post-Training: no retraining needed
     - Quantization-Aware Training: better accuracy, especially <8-bit
    → Sparsification: Weight Pruning, Sparse Attention
    → Structure Optimization: Neural Architecture Search, Structure Factorization
    → Knowledge Distillation:
     - White-box: student learns internal states
     - Black-box: student mimics output logits
    → Dynamic Inference: adaptive early exits or skipping blocks based on input complexity

    3. System-Level Optimization
    (a) Inference Engine
    → Graph & Operator Optimization: use ONNX, TensorRT, BetterTransformer for op fusion
    → Speculative Decoding: use a smaller model to draft tokens, validate with the full model
    → Memory Management: KV cache reuse, paging strategies (e.g., PagedAttention in vLLM)
    (b) Serving System
    → Batching: group requests with similar lengths for throughput gains
    → Scheduling: token-level preemption (e.g., TGI, vLLM schedulers)
    → Distributed Systems: use tensor, pipeline, or model parallelism to scale across GPUs

    My Two Cents 🫰
    → Always benchmark end-to-end latency, not just token decode speed
    → For production, 8-bit or 4-bit quantized models with MQA and PagedAttention give the best price/performance
    → If using long context (>64k), consider sliding attention plus RAG, not full dense memory
    → Use speculative decoding and batching for chat applications with high concurrency
    → LLM inference is a systems problem. Optimizing it requires thinking holistically, from tokens to tensors to threads.

    Image inspo: A Survey on Efficient Inference for Large Language Models

    ----
    Follow me (Aishwarya Srinivasan) for more AI insights!

  • View profile for Vin Vashishta
    Vin Vashishta is an Influencer

    AI Strategist | Monetizing Data & AI For The Global 2K Since 2012 | 3X Founder | Best-Selling Author

    207,981 followers

    Having a lot of data isn’t the same thing as having high-value data. If you’re having a hard time explaining that to executive leaders, try a different approach: teach them how to put a dollar value on the business’s data. Every curated dataset creates new opportunities for the business, and that’s the connection between data and profit.

    The simplest data valuation method is called ‘With & Without’. The business thinks that every dataset creates the same value, so I run an early experiment to disprove that assumption. I turn off access to datasets that stakeholders believe are high value and wait for the complaints to roll in. In most cases, no one notices. Three months later, I propose putting the dataset into cold storage. Business leaders push back, saying their teams would grind to a halt without access to those datasets. I tell them about the experiment. Now I can start a rational conversation about connecting data to use cases and putting a dollar value on each dataset.

    Data doesn’t create value for two reasons:
    1️⃣ It’s incomplete. The data required to support the use case isn’t being gathered holistically. Sometimes that’s an accessibility issue. Other times, the use case, workflow, and outcomes aren’t understood well enough to know what data is necessary.
    2️⃣ It lacks context. Data points aren’t enough to support use cases. Context about the process, product, person, intent, and outcome is required. Until data is gathered contextually, its value creation is limited.

    Connecting datasets with opportunities creates the justification for changing how the business gathers and leverages data. Putting a dollar value on contextual datasets quantifies the ROI of information architecture and engineering initiatives. That’s the shortest path to getting budget and buy-in. Quantify value in terms that business leaders care about and show them a clear connection with outcomes they believe are essential.
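Once use cases are mapped to the datasets they depend on, the 'With & Without' method reduces to simple arithmetic: a dataset's value is the value of what breaks without it. A sketch with invented names and figures:

```python
# Value a dataset by summing the annual value of every use case
# that depends on it -- i.e., what would be lost "without".
use_cases = {
    "churn_model":    {"annual_value": 400_000, "needs": {"crm", "billing"}},
    "fraud_alerts":   {"annual_value": 250_000, "needs": {"transactions"}},
    "exec_dashboard": {"annual_value":  50_000, "needs": {"billing"}},
}

def value_without(dataset):
    """Annual value lost if `dataset` were put into cold storage."""
    return sum(uc["annual_value"] for uc in use_cases.values()
               if dataset in uc["needs"])

print(value_without("billing"))      # → 450000
print(value_without("clickstream"))  # → 0 (nobody would notice)
```

A dataset valuing to zero here is the code version of the access-shutoff experiment: no use case depends on it, so no one complains.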

  • View profile for Deepak Goyal

    𝗢𝗻 𝗮 𝗠𝗶𝘀𝘀𝗶𝗼𝗻 𝘁𝗼 𝗺𝗮𝗸𝗲 𝟭𝟬𝟬+ 𝗔𝘇𝘂𝗿𝗲 𝗗𝗮𝘁𝗮 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿 𝗶𝗻 𝗻𝗲𝘅𝘁 𝟰𝟱 𝗗𝗮𝘆𝘀

    259,661 followers

    Last month, one of my corporate training clients asked me a question that made me pause: "Deepak, we're spending USD 50,000 monthly on Databricks. Is there ANY way to bring this down without migrating everything?"

    Here's the brutal truth about cloud data platforms that nobody talks about: you're paying for compute you don't always need.

    Let me explain: when you run a query on Databricks, it spins up a cluster, processes data, and charges you for every second of compute time. Even simple aggregations that could run on a smaller engine get routed to the same heavy Spark cluster. It's like using a truck to deliver a pizza. Overkill. Expensive.

    So what's the solution? After evaluating multiple approaches, I found something interesting: workload-aware query routing. The idea is simple: not every query needs the full power of Databricks. Some queries can run on cheaper engines (like DuckDB, Postgres, or even your existing data warehouse), and only the heavy transformations hit Databricks.

    I recently tested this with Zetaris, a federated query layer that sits on top of your existing data stack. Here's what happened in a 2-week pilot:
    - 40% reduction in Databricks compute costs
    - Zero migration; it worked with their existing lakehouse
    - Queries automatically routed to the most cost-efficient engine
    - No duplicate storage or data movement (zero-copy federation)
    The engineering team didn't have to rewrite pipelines. No rearchitecture. Just smarter query routing.

    Why am I sharing this? Because if you're running Databricks, Synapse, or Snowflake at scale, your compute bill is probably your second-biggest cloud expense after storage. And most companies don't realize they can optimize this WITHOUT a massive migration project.

    Three things you can try:
    1. Audit your query patterns; identify which workloads are compute-heavy vs. lightweight
    2. Explore federated query layers that route intelligently
    3. Run a cost benchmark; Zetaris has a 2-week pilot program that shows exactly where you're overspending

    And if you want to see the actual cost savings report from the pilot, Zetaris has a PDF case study showing the 40-60% reduction numbers; worth checking out.
    👉 Try Zetaris open-source on GitHub: https://bit.ly/4mGo0W4
    👉 Book a demo if you're running enterprise workloads: https://lnkd.in/dpbXEhAA
    👉 Follow Zetaris on LinkedIn for more cost optimization insights
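Workload-aware routing can be sketched as a planner rule (this is not Zetaris's actual logic; the threshold, the signals, and the engine names as decision labels are invented for illustration):

```python
# Route a query to a cheap local engine or the heavy Spark cluster based
# on a planner's estimate of scanned bytes and join complexity.
GB = 1024 ** 3

def choose_engine(est_scan_bytes, has_heavy_join):
    """Small scans with simple plans stay on the cheap engine;
    everything else goes to the cluster."""
    if est_scan_bytes < 10 * GB and not has_heavy_join:
        return "duckdb"      # cheap single-node engine
    return "databricks"      # heavy Spark cluster

print(choose_engine(2 * GB, False))   # → duckdb
print(choose_engine(500 * GB, True))  # → databricks
```

The savings come from the long tail of small aggregations and lookups that never needed a cluster in the first place, which is exactly what a query-pattern audit surfaces.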

  • View profile for Sven Lackinger

    CEO at Sastrify | Transparency & Cost Savings on Software | Making IT and Procurement Leaders happy.

    13,453 followers

    73% of SaaS vendors, including giants like Zoom and Microsoft, increased their prices last year, pushing software inflation to 8.7% — double the US headline inflation rate. This trend has software costs skyrocketing. But why the spike?
    1/ Major players like Google Workspace and Salesforce announced price hikes of up to 20%.
    2/ A shift in discount strategies has made costs even less predictable.
    3/ Over half of vendors complicate budget planning by keeping enterprise pricing hidden.

    So what can you do about it?
    1/ Start with visibility: You can't fix what you can't see. Start with an audit of your SaaS spend and usage to identify underutilised contracts or tool duplications. (You can do this in a few minutes by connecting your SSO to Sastrify.)
    2/ Negotiate wisely: Don't accept price increases without a fight. Use insights from your audit to renegotiate terms or consolidate apps.
    3/ Benchmark everything: Use SaaS pricing data to accurately benchmark your current contracts and offers against the market.
    4/ Cultivate cost-awareness: Encourage a culture where every software purchase is scrutinised for its cost-benefit and duplication potential.

    Modern businesses must be more vigilant and strategic in their software investments, and adopting a proactive stance can lead to significant savings and more sustainable growth. #SaaS #procurement #purchasing
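The visibility/audit step can be sketched as a utilization check over contracts (tool names, seat counts, prices, and the 50% threshold are all invented for illustration):

```python
# Flag SaaS contracts where paid seats far exceed active users,
# along with the monthly spend tied up in the unused seats.
contracts = [
    {"tool": "Zoom",   "seats": 500, "active": 470, "cost_per_seat": 15},
    {"tool": "Notion", "seats": 300, "active":  90, "cost_per_seat": 10},
    {"tool": "Miro",   "seats": 200, "active":  40, "cost_per_seat": 12},
]

def underutilized(contracts, threshold=0.5):
    """Contracts below `threshold` utilization, with monthly waste in USD."""
    flagged = []
    for c in contracts:
        util = c["active"] / c["seats"]
        if util < threshold:
            waste = (c["seats"] - c["active"]) * c["cost_per_seat"]
            flagged.append((c["tool"], round(util, 2), waste))
    return flagged

print(underutilized(contracts))  # → [('Notion', 0.3, 2100), ('Miro', 0.2, 1920)]
```

The flagged waste figures then feed directly into the negotiation and benchmarking steps as concrete leverage.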
