How AI Can Advance Without Larger Models


Summary

AI can advance without larger models by focusing on smarter training, specialized tasks, and efficient architecture rather than simply increasing size. Small Language Models (SLMs) use fewer parameters but excel in speed, privacy, and adaptability, allowing AI to run locally and handle specific jobs well without high costs or complexity.

  • Choose smaller models: Deploy specialized, compact models for routine tasks to reduce expenses and improve privacy, especially in mobile or edge settings.
  • Refine training methods: Improve learning by using high-quality, reasoning-rich data and adding verification steps during training to boost accuracy and reliability.
  • Tailor for workflow: Break down AI systems so that different tasks are assigned to models best suited for them, rather than relying on a single large model for everything.
Summarized by AI based on LinkedIn member posts
  • Andreas Horn

    Head of AIOps @ IBM || Speaker | Lecturer | Advisor

    245,046 followers

    IBM just introduced Granite-4.0 Nano (350M & 1B), a new family of compact language models designed for high performance at small scale. Both models demonstrate very strong performance in instruction-following and tool-calling capabilities, and can even run 100% locally in your browser via WebGPU acceleration. Built specifically for agentic workflows, Granite-4.0 Nano opens a new chapter for small, efficient models that perform reliably on the edge.

    Here are the key features:
    → Hybrid Mamba-2 / Transformer architecture
    → 70% less memory usage
    → 2× faster inference
    → Optimized for multi-session and long-context tasks
    → Built for edge deployment
    → Apache 2.0 license

    A bigger model isn’t always the better or the right paradigm. In real-world deployments, it’s just as important to optimize for latency, efficiency, and adaptability, because speed and cost often outweigh sheer size.

    Most AI agents handle repetitive, well-defined tasks such as parsing, routing, tool calls, and summarization. They don’t need an all-knowing large model but a fast, fine-tuned small model that executes precisely and efficiently, getting the job done as quickly as possible.

    It seems clear to me that Small Language Models (SLMs) are becoming a core part of future AI workflows. The race to run capable models smoothly on edge devices and in multi-agent systems is accelerating fast. As model quality continues to improve, as seen with Granite-4.0 Nano, SLMs are proving that efficiency, not size, will define the next phase of AI deployment. There’s a clear and growing market for them.

    Links if you want to dig in:
    Blog: https://lnkd.in/eFss5YFi
    Hugging Face: https://lnkd.in/eUdGVQAj
    Ollama: https://lnkd.in/em9ynmbC
    Docker: https://lnkd.in/g8Ntzhgp
    Unsloth: https://lnkd.in/gx6CEqjt

    P.S. I recently launched a newsletter where I write about AI + AI agents. It’s free, and already read by 25k+ people: https://lnkd.in/dbf74Y9E

  • Aishwarya Srinivasan
    633,628 followers

    For a long time, many companies built AI systems around a simple idea: choose the most powerful large language model available and use it across the entire workflow. One large model handled classification, summarization, routing, reasoning, and generation.

    What I am seeing now, especially going into 2026, is a clear architectural shift. Teams are moving away from the “one giant model does everything” approach. Instead, they are decomposing workflows and assigning different models to different layers of the system. Smaller, more specialized models are being used for well-defined tasks, while larger models are reserved for complex reasoning where their breadth actually matters.

    For those who are newer to this space, an SLM typically refers to a model in the 1B to 12B parameter range. These models are optimized for efficiency, lower latency, and narrower domains. They are not designed to replace frontier-scale models, but to handle specific tasks extremely well.

    There are two practical reasons why I believe 2026 will be a high-adoption year for SLMs:

    ✦ Cheaper, faster, and more customizable
    For tasks like classification, structured extraction, lightweight reasoning, or domain-specific summarization, a smaller model is often more than sufficient. It runs with lower latency, costs less to scale, and if it is open source, it can be fine-tuned and adapted to your internal data and workflows. That level of customization gives teams real control over performance and differentiation.

    ✦ On-device and edge intelligence
    As more AI moves closer to the user, on-device and edge inference become critical. Mobile assistants, IoT systems, and privacy-sensitive enterprise applications cannot always rely on sending every request to a large cloud model. Small models make local inference feasible, improving both responsiveness and privacy.

    Large models are still essential for open-ended reasoning and complex generation. But the most mature systems will not rely on a single model. They will be orchestrated systems, where each model is chosen based on what it is best at. Model size is no longer the strategy; architecture is.
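The orchestration idea in the post above can be sketched as a static routing table that maps each task type to a model tier. This is only an illustration: the model names, parameter counts, and task labels are hypothetical placeholders, not real endpoints or anything the post specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str        # hypothetical label, not a real model or endpoint
    params_b: float  # parameter count in billions (illustrative only)


# Hypothetical tiers: one small specialist, one large generalist.
SMALL = ModelSpec("small-specialist", 3.0)
LARGE = ModelSpec("large-generalist", 400.0)

# Well-defined tasks go to the small model; open-ended work goes to the large one.
ROUTES = {
    "classification": SMALL,
    "structured_extraction": SMALL,
    "domain_summarization": SMALL,
    "open_ended_reasoning": LARGE,
    "complex_generation": LARGE,
}


def pick_model(task_type: str) -> ModelSpec:
    """Return the model assigned to a task type.

    Unknown task types fall back to the large model, on the assumption
    that an unclassified request may need broader capability.
    """
    return ROUTES.get(task_type, LARGE)


if __name__ == "__main__":
    for task in ("classification", "open_ended_reasoning", "something_new"):
        print(task, "->", pick_model(task).name)
```

Keeping the table as plain data means the routing policy can be reviewed and changed without touching the calling code.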

  • Pau Labarta Bajo

    Building and teaching AI that works > Maths Olympian> Father of 1.. sorry 2 kids

    70,570 followers

    I used to think that bigger is better in LLM engineering. I was wrong. Let me explain ⬇️

    The "Bigger is better" mantra

    We (developers of the world) have been told so many times that “Bigger models are better” that we have internalised the mantra. No questions. Zero. Cero. Zéro. Nula. ноль. 零. ज़ीरो.

    We've been conditioned to believe "bigger models = better results." Every time we need to build an LLM-powered feature or product, we follow these steps:
    > go to OpenAI/DeepSeek/Anthropic/Gemini/whatever-LLM-provider-you-like
    > pick the latest/largest/shiniest Large Language Model they offer
    > set up the billing (if it is your first time using that API)
    > generate a new API key
    > install whatever Python SDK we need to talk to the API (e.g. uv add anthropic), and…

    BOOM! We start streaming tokens from a data center the size of Luxembourg (and paying good cash for it).

    And the thing is, this approach works well for demos and small projects, but does NOT work when you are building real-world apps and features that need to be used by real people, because
    > Your API bills explode with user growth
    > Privacy regulations block third-party model usage
    > Latency kills user experience
    > Offline scenarios become impossible

    The reality check

    Most AI problems don't need GPT-4 level power. It's like taking a plane to go grocery shopping. Overkill.

    The smarter approach? Small Language Models.

    I just proved this by building a chess app with a tiny 350M parameter model by Liquid AI that:
    ✅ Runs 100% on mobile devices
    ✅ Works offline anywhere
    ✅ Zero ongoing cloud costs
    ✅ Complete data privacy
    (You can find a link to the code in the comments section below this post)

    The new AI mantra

    Smaller, task-specific models are often better because they're cheaper, faster, private, and more accurate for your actual use case. While everyone's chasing the biggest models, smart developers are building with Small Language Models that solve real problems without breaking the bank. The edge is where (most of day-to-day) AI's future lives.

    What would you build if you could run LLMs natively on mobile? Please let me know in the comments 👇

    ----

    Hi there, I am Pau! Glad you got to the end of the post (who reads these days?!) I share tips and tricks about building AI products that work. If you want to get more of these posts and have a higher signal-to-noise ratio in your feed, give me a follow Pau Labarta Bajo!

  • Arockia Liborious
    39,520 followers

    AI's Next Breakthrough Won't Come From More Data. It will come from better learning loops.

    Frontier AI is slowly and steadily moving past the phase where progress came mainly from scaling harder. While this approach still seems to work, its magic is fading. There's only so much high-quality data available online and we're already feeling that limit.

    What's changing is how learning happens, not how big models get. The next gains will come from extracting more signal per unit of compute: focusing on quality, not just quantity.
    - Pretraining that favors reasoning-dense data over raw volume, i.e. we're no longer scaling input tokens, we're scaling how much learning signal each unit of compute produces.
    - Reinforcement learning as a second scaling axis: learn by doing and practicing, not just by reading.
    - Value functions that help models course-correct early instead of wasting compute, e.g. you don't need to execute the full code to realize the approach is wrong.
    - Verifiers, tests, and formal checks that turn correctness into a training signal. Generate many trajectories → verify → distill correctness back into the model.
    - Most importantly, we now know that careful post-training after the initial training is what makes models truly reliable.

    This is also where generalization matters more than benchmarks: learning from fewer examples, adapting continuously, and failing less strangely under real-world conditions. We can't just keep building bigger AI models. The real opportunity now is making them smarter learners.

    The Practical Shift: From relying on massive data to focusing on quality learning
    - Shift from "more data" to "better data". Prioritise reasoning traces, error-fix pairs, etc.
    - Bake verification into training. Use tests, compilers, specs, and policy checks as learning signals.
    - Add early-warning signals. Train value functions so models abandon bad paths early instead of wasting long runs.
    - Refine like a product. Treat the final "polishing" phase as a core product.
    - Measure what matters. Track generalization, sample efficiency, and robustness under real-world shifts, not just benchmarks.

    Scale made AI possible, but learning efficiency will decide what comes next. #AI
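The "generate many trajectories → verify → distill" loop from the post above can be sketched as a filter that keeps only candidates passing an automatic check. Everything here is a toy stand-in: `propose_solutions` fakes a model by sampling noisy answers to addition problems, and `verifier` plays the role of a test suite or compiler. The kept pairs are what a later fine-tuning step would train on; that step is not shown.

```python
import random
from typing import List, Tuple

Problem = Tuple[int, int]


def propose_solutions(problem: Problem, n: int, rng: random.Random) -> List[int]:
    """Hypothetical stand-in for a model: sample n noisy answers to a + b."""
    a, b = problem
    return [a + b + rng.choice([-1, 0, 0, 1]) for _ in range(n)]


def verifier(problem: Problem, answer: int) -> bool:
    """Automatic correctness check (the role a unit test or compiler would play)."""
    a, b = problem
    return answer == a + b


def collect_verified(
    problems: List[Problem], n: int, seed: int = 0
) -> List[Tuple[Problem, int]]:
    """Keep at most one verified (problem, answer) pair per problem."""
    rng = random.Random(seed)
    kept = []
    for problem in problems:
        for answer in propose_solutions(problem, n, rng):
            if verifier(problem, answer):
                kept.append((problem, answer))
                break  # one verified sample per problem is enough here
    return kept


if __name__ == "__main__":
    dataset = collect_verified([(1, 2), (10, 5), (7, 7)], n=8)
    print(dataset)  # only verified pairs survive; these would feed fine-tuning
```

The point of the design is that the verifier, not a human label, decides what enters the training set, which is why the approach fits domains with cheap automatic checks.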

  • Alexandru Voica

    Public affairs | AI, social media, interactive entertainment, online retail, semiconductors, consumer electronics.

    7,554 followers

    For years, the faith was simple: make the models bigger, and progress will arrive on schedule. The compute-rich made progress, while everyone else watched from the cheap seats. Today, K2 Think crashes that party. At 32 billion parameters, MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) and G42’s new open-source reasoning model posts math results that rival far larger flagship systems from OpenAI and DeepSeek. The recipe is simple: efficient reasoning by design. Long chain-of-thought supervised fine-tuning, reinforcement learning with verifiable rewards, agentic planning, and test-time scaling to push depth without bloating the base. On benchmarks, K2 Think matches the mathematical reasoning performance of proprietary frontier models across AIME ’24/’25, HMMT ’25, and OMNI-Math-HARD—again, at 32B. In fact, K2 Think is the top open-source model for complex math benchmarks matching or exceeding previously leading models that are orders of magnitude greater in size. If you’ve been telling yourself that only mega-models can navigate hard problems, consider this a polite correction. For me, the accessibility story is the real unlock. All of a sudden, anyone with a few hundred GPUs and high-quality data can fine-tune or extend this model across new domains, new tools, new constraints, without signing a cloud contract that looks like a sovereign debt offering. And with a Cerebras Systems wafer-scale path plus speculative decoding hitting 2,000 tokens per second, per request, the “fast and frugal” crowd just got some hardware tailwind. The market message is cleaner: the moat is narrowing. If performance parity is possible at 1/20th of the size, then capital and talent will migrate toward smarter training regimes, not just bigger clusters. Welcome, then, to the post-size era: smarter beats fatter, open beats opaque, and progress comes from ideas you can run (and rerun) without a trillion-parameter trust fund.

  • Kuldeep Singh Sidhu

    Senior Data Scientist @ Walmart | BITS Pilani

    16,488 followers

    Exciting New Research Alert: Small Language Models Are Proving Their Worth!

    A groundbreaking survey from Amazon researchers reveals that Small Language Models (SLMs) with just 1-8B parameters can match or even outperform their larger counterparts. Here's what makes this fascinating:

    Technical Innovations:
    - SLMs like Mistral 7B implement grouped-query attention (GQA) and sliding window attention with a rolling buffer cache to achieve performance equivalent to 38B parameter models
    - Phi-1, with just 1.3B parameters trained on 7B tokens, outperforms models like Codex-12B (100B tokens) and PaLM-Coder-540B through high-quality "textbook" data
    - TinyLlama (1.1B) leverages Rotary Positional Embedding, RMSNorm, and SwiGLU activation functions to match larger models on key benchmarks

    Architecture Breakthroughs:
    - Hybrid approaches like Hymba combine transformer attention with state space models in parallel layers
    - Qwen models use enhanced tokenization (152K vocabulary) with untied embedding and FP32 precision RoPE
    - Novel quantization and pruning techniques enable deployment on mobile devices

    Performance Highlights:
    - Gemini Nano (1.8B-3.25B parameters) shows exceptional capabilities in factual retrieval and reasoning
    - Orca 13B achieves 88% of ChatGPT's performance on reasoning tasks
    - Phi-4 surpasses GPT-4o-mini on mathematical reasoning

    The research demonstrates that with optimized architectures, high-quality training data, and innovative techniques, smaller models can deliver impressive performance while being more efficient and deployable. This is a game-changer for organizations looking to implement AI solutions with limited computational resources. The future of AI might not necessarily be about building bigger models, but smarter ones.
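To make the "rolling buffer cache" mentioned for Mistral 7B more concrete, here is a minimal sketch of the idea: with a sliding attention window of size W, the entry for position i can be stored in slot i mod W, so memory stays fixed however long the sequence grows. The class name and the use of plain Python objects in place of key/value tensors are simplifications for illustration, not the model's actual implementation.

```python
from typing import Any, List


class RollingBufferCache:
    """Fixed-size cache that keeps only the most recent `window` entries."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.slots: List[Any] = [None] * window
        self.count = 0  # total number of positions seen so far

    def append(self, item: Any) -> None:
        # Position i overwrites slot i mod window, evicting the oldest entry.
        self.slots[self.count % self.window] = item
        self.count += 1

    def visible(self) -> List[Any]:
        """Return cached entries from oldest to newest (at most `window`)."""
        if self.count <= self.window:
            return self.slots[: self.count]
        start = self.count % self.window  # slot holding the oldest entry
        return self.slots[start:] + self.slots[:start]


if __name__ == "__main__":
    cache = RollingBufferCache(window=3)
    for position in range(5):
        cache.append(f"kv{position}")
    print(cache.visible())  # the three most recent entries, oldest first
```

In a real model each slot would hold key and value tensors for one position; the indexing logic is the part this sketch is meant to show.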

  • Professor Shafi Ahmed

    Surgeon | Futurist | Innovator | Entrepreneur | Humanitarian | Intnl Keynote Speaker

    58,797 followers

    The paper “Small Language Models are the Future of Agentic AI” makes a provocative argument: the future of intelligent software agents will not be built on today’s vast and expensive large language models (LLMs), but instead on smaller, more efficient models. The authors, from NVIDIA and Georgia Institute of Technology, define small language models (SLMs) as systems compact enough to run on consumer-grade devices with low latency, typically with fewer than ten billion parameters as of 2025. Unlike their heavyweight LLM cousins, SLMs are lightweight, nimble, and inexpensive to operate. The central claim is that these smaller models are not just “good enough” for most real-world agentic applications; in many cases, they are actually better suited. The reasoning rests on the nature of agentic AI tasks. Most software agents do not spend their time solving grand philosophical puzzles. Instead, they perform repeated, narrowly defined tasks, such as summarising emails, parsing documents, running queries, or automating workflows. For such repetitive, well-scoped activities, an oversized LLM is wasteful, akin to hiring a Nobel laureate to perform simple bookkeeping. Small models, properly tuned, can deliver the same functionality at far lower cost and with faster response times. The paper does not dismiss the role of LLMs altogether. There are still situations, particularly those requiring broad conversational flexibility or complex reasoning, where a large model remains indispensable. But the authors suggest a heterogeneous architecture: a swarm of specialised SLMs handling most of the workload, with larger models reserved for the exceptional cases. This hybrid approach, they argue, combines the best of both worlds: efficiency and adaptability. They are candid about the obstacles to such a shift. The AI industry has invested heavily in the LLM ecosystem—technically, commercially, and culturally. 
Many companies are now locked into infrastructure and business models optimised for LLMs, making change difficult. To help overcome this, the paper proposes a general conversion framework, a kind of roadmap for developers who want to adapt their existing LLM-based agents into SLM-powered systems without losing functionality. The economic and operational implications are striking. A partial migration from LLMs to SLMs could dramatically reduce costs, improve latency, and make agentic AI viable on devices at the edge, such as laptops, phones, or embedded systems, rather than requiring centralised cloud resources. In other words, smaller models may democratise the field by making intelligent agents cheaper, faster, and more widely accessible. The authors end with an open call to the research community. They do not present their thesis as the last word, but rather as the beginning of a dialogue around the central idea that small, specialised language models could drive the next wave of agentic AI. https://lnkd.in/gecggtJc
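The heterogeneous setup described in the post above, with small models handling most requests and a larger model reserved for exceptions, can be sketched as a simple escalation rule. This is an assumption-laden illustration rather than the paper's conversion framework: the `small` and `large` callables, the confidence score, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: the small model returns (text, confidence in [0, 1]),
# the large model returns text only.
SmallModel = Callable[[str], Tuple[str, float]]
LargeModel = Callable[[str], str]


def answer_with_escalation(
    query: str,
    small: SmallModel,
    large: LargeModel,
    threshold: float = 0.8,  # illustrative cut-off, to be tuned per workload
) -> Tuple[str, str]:
    """Try the small model first; escalate only when it is unsure.

    Returns (answer, tier) so callers can track how often escalation happens.
    """
    text, confidence = small(query)
    if confidence >= threshold:
        return text, "small"
    return large(query), "large"


if __name__ == "__main__":
    # Toy stand-ins for real models.
    def toy_small(q: str) -> Tuple[str, float]:
        return ("short summary", 0.95) if "summarise" in q else ("unsure", 0.2)

    def toy_large(q: str) -> str:
        return "detailed answer"

    print(answer_with_escalation("summarise this email", toy_small, toy_large))
    print(answer_with_escalation("plan a novel research agenda", toy_small, toy_large))
```

Logging the returned tier gives a direct measure of how much of the workload the small model actually absorbs, which is the quantity a migration would want to track.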

  • Romano Roth

    Group Chief AI Officer @ Zühlke | Helping CEOs, CTOs & CIOs turn AI ambition into an operating model: feedback loops, governance, and execution across people, process, technology | Author | Lecturer | Speaker

    18,863 followers

    AI Models Are Getting Dumber. Because We’re Making Them Smarter the Wrong Way

    Large Language Models (LLMs) like ChatGPT still use the same expensive computation to recall simple facts (“What’s the capital of France?”) as they do to solve a physics proof. This is like recalculating 2+2 from scratch every time you need it. It’s not just inefficient, it’s structurally broken.

    Engram: Conditional Memory for LLMs

    Engram splits the model’s responsibilities like the human brain:
    - Neocortex: reasoning (Transformer backbone)
    - Hippocampus: memory lookup (Engram module)

    Instead of overloading neural layers with static knowledge, Engram lets LLMs look it up in O(1) from an external memory, just like we do.

    The Results? Surprising.

    Tested on a 27B-parameter model, Engram outperformed a same-sized MoE baseline on:
    - General reasoning (BBH: +5.0)
    - Long-context (Multi-Query NIAH: 84.2 → 97.0)
    - Code & math (HumanEval: +3.0)
    - Factual tasks (CMMLU: +4.0)

    All with <3% latency overhead, even with 100B parameters offloaded to host memory. No compute bottlenecks. Just smarter design.

    Takeaway: We don’t need bigger models. We need better brains. By offloading memorization to memory, Engram frees up the model’s attention for actual reasoning. It’s an architectural shift, not just an optimization.

    Watch this space. This is a glimpse into the next generation of sparse models: modular, memory-augmented, and efficient by design. The future of LLMs might not be in scaling up, but scaling smarter.

    #AI #LLM #DeepLearning #Transformers #AIarchitecture #Sparsity #MachineLearning #NLP #MoE #Engram #DeepSeekAI #AIResearch #MemoryAugmentedModels #ThoughtLeadership
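The contrast the post draws between recomputing a fact and looking it up can be illustrated with a toy keyed memory. This is not the Engram module itself: the class, the choice of the last two tokens as the key, and the string values are all assumptions made for the example. It only shows why a hash-table lookup costs roughly constant time regardless of how many facts are stored.

```python
from typing import Dict, List, Optional, Tuple


class NgramMemory:
    """Toy external memory keyed on the last n tokens of the context."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.table: Dict[Tuple[str, ...], str] = {}

    def _key(self, context: List[str]) -> Tuple[str, ...]:
        return tuple(context[-self.n:])

    def store(self, context: List[str], value: str) -> None:
        self.table[self._key(context)] = value

    def lookup(self, context: List[str]) -> Optional[str]:
        # Average-case O(1): one hash of the key, one table probe.
        return self.table.get(self._key(context))


if __name__ == "__main__":
    memory = NgramMemory(n=2)
    memory.store(["capital", "of", "France"], "Paris")

    hit = memory.lookup(["what", "is", "the", "capital", "of", "France"])
    miss = memory.lookup(["capital", "of", "Spain"])
    print(hit, miss)  # a miss is where the reasoning backbone would take over
```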

  • Laurence Moroney

    | Director of AI at arm | Award-winning AI Researcher | Best Selling Author | Strategy and Tactics | Fellow at the AI Fund | Advisor to many | Inspiring the world about AI | Contact me! |

    135,556 followers

    The future of AI isn't just about bigger models. It's about smarter, smaller, and more private ones. And a new paper from NVIDIA just threw a massive log on that fire. 🔥 For years, I've been championing the power of Small Language Models (SLMs). It’s a cornerstone of the work I led at Google, which resulted in the release of Gemma, and it’s a principle I’ve guided many companies on. The idea is simple but revolutionary: bring AI local. Why does this matter so much? 👉 Privacy by Design: When an AI model runs on your device, your data stays with you. No more sending sensitive information to the cloud. This is a game-changer for both personal and enterprise applications. 👉 Blazing Performance: Forget latency. On-device SLMs offer real-time responses, which are critical for creating seamless and responsive agentic AI systems. 👉 Effortless Fine-Tuning: SLMs can be rapidly and inexpensively adapted to specialized tasks. This agility means you can build highly effective, expert AI agents for specific needs instead of relying on a one-size-fits-all approach. NVIDIA's latest research, "Small Language Models are the Future of Agentic AI," validates this vision entirely. They argue that for the majority of tasks performed by AI agents—which are often repetitive and specialized—SLMs are not just sufficient, they are "inherently more suitable, and necessarily more economical." Link: https://lnkd.in/gVnuZHqG This isn't just a niche opinion anymore. With NVIDIA putting its weight behind this and even OpenAI releasing open-weight models like GPT-OSS, the trend is undeniable. The era of giant, centralized AI is making way for a more distributed, efficient, and private future. This is more than a technical shift; it's a strategic one. Companies that recognize this will have a massive competitive advantage. Want to understand how to leverage this for your business? ➡️ Follow me for more insights into the future of AI. 
➡️ DM me to discuss how my advisory services can help you navigate this transition and build a powerful, private AI strategy. And if you want to get hands-on, stay tuned for my upcoming courses on building agentic AI using Gemma for local, private, and powerful agents! #AI #AgenticAI #SLM #Gemma #FutureOfAI

  • Weili Xu

    Senior Research Engineer | Team Lead

    1,890 followers

    I read a paper from NVIDIA Research last month that made a strong case for shifting from giant large language models (LLMs) to leaner, more specialized small language models (SLMs). I couldn’t agree more. https://lnkd.in/gbBNd_Bm

    Here are my top three takeaways:
    1. Efficiency First – Models under 10B parameters consume fewer tokens, run faster, and cost significantly less to operate. Lower latency, reduced infrastructure demands, and greener AI.
    2. Specialized Power – While large models excel at general conversation, small models shine in narrowly scoped tasks. Fine-tuning for a specific job can often match or exceed the performance of much larger models.
    3. Better Fit for Agentic Systems – Most AI agents repeat structured, tool-based actions. SLMs are easier to fine-tune, deploy on-device, and integrate into modular multi-agent workflows, resulting in faster, cheaper, and more aligned systems.

    To test the theory, I built a specialized agent that generates a typical energy model based on building type and climate zone. I swapped between Qwen3:14B and Qwen3:4B on my local computer (M3, 18GB RAM). Running the same user query to generate results:
    Qwen3:14B | Input tokens: 3,052 | Output tokens: 2,070 | Duration: 164.24 s
    Qwen3:4B | Input tokens: 2,048 | Output tokens: 619 | Duration: 8.34 s

    That’s about 33% fewer input tokens, 70% fewer output tokens, and roughly 20× faster, while achieving the same result. Sometimes, the future of AI is not about going bigger, but about going smaller, smarter, and faster.

    #AI #ArtificialIntelligence #MachineLearning #LLM #SLM #SmallLanguageModels #LargeLanguageModels #AgenticAI #MultiAgentSystems #EdgeAI #OnDeviceAI #NaturalLanguageProcessing #EnergyModeling #BuildingPerformance #EfficiencyInAI #TokenOptimization #ModelOptimization #AITesting #AIResearch
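The token and timing comparison in the post above can be checked with a few lines of arithmetic. The figures are taken directly from the post; the dictionary layout and the helper name `reduction` are just conveniences for this sketch.

```python
# Measurements reported in the post for the same query on the same machine.
runs = {
    "Qwen3:14B": {"input": 3052, "output": 2070, "seconds": 164.24},
    "Qwen3:4B": {"input": 2048, "output": 619, "seconds": 8.34},
}


def reduction(before: float, after: float) -> float:
    """Fractional saving when going from `before` to `after`."""
    return 1 - after / before


big, small = runs["Qwen3:14B"], runs["Qwen3:4B"]

input_saving = reduction(big["input"], small["input"])
output_saving = reduction(big["output"], small["output"])
total_saving = reduction(
    big["input"] + big["output"], small["input"] + small["output"]
)
speedup = big["seconds"] / small["seconds"]

if __name__ == "__main__":
    print(f"Input tokens saved:  {input_saving:.1%}")
    print(f"Output tokens saved: {output_saving:.1%}")
    print(f"Total tokens saved:  {total_saving:.1%}")
    print(f"Speedup:             {speedup:.1f}x")
```

Separating input, output, and total tokens matters here because the saving differs a lot between them: roughly a third on input, over two thirds on output, and a little under half overall.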
