I used to think that bigger is better in LLM engineering. I was wrong. Let me explain ⬇️

𝗧𝗵𝗲 "𝗕𝗶𝗴𝗴𝗲𝗿 𝗶𝘀 𝗯𝗲𝘁𝘁𝗲𝗿" 𝗺𝗮𝗻𝘁𝗿𝗮

We (developers of the world) have been told so many times that “bigger models are better” that we have internalised the mantra. No questions. Zero. Cero. Zéro. Nula. ноль. 零. ज़ीरो.

Every time we need to build an LLM-powered feature or product, we follow these steps:

> go to OpenAI/DeepSeek/Anthropic/Gemini/whatever-LLM-provider-you-like
> pick the latest/largest/shiniest Large Language Model they offer
> set up the billing (if it is your first time using that API)
> generate a new API key
> install whatever Python SDK we need to talk to the API (e.g. uv add anthropic), and…

BOOM! We start streaming tokens from a data center the size of Luxembourg (and paying good cash for it).

And the thing is, this approach works well for demos and small projects, but it does NOT work when you are building real-world apps and features used by real people, because:

> your API bills explode with user growth
> privacy regulations block third-party model usage
> latency kills user experience
> offline scenarios become impossible

𝗧𝗵𝗲 𝗿𝗲𝗮𝗹𝗶𝘁𝘆 𝗰𝗵𝗲𝗰𝗸

Most AI problems don't need GPT-4-level power. It's like taking a plane to go grocery shopping. Overkill.

𝗧𝗵𝗲 𝘀𝗺𝗮𝗿𝘁𝗲𝗿 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵? Small Language Models.

I just proved this by building a chess app with a tiny 350M-parameter model by Liquid AI that:
✅ Runs 100% on mobile devices
✅ Works offline anywhere
✅ Zero ongoing cloud costs
✅ Complete data privacy

(You can find a link to the code in the comments section below this post.)

𝗧𝗵𝗲 𝗻𝗲𝘄 𝗔𝗜 𝗺𝗮𝗻𝘁𝗿𝗮

Smaller, task-specific models are often better because they're cheaper, faster, private, and more accurate for your actual use case. While everyone's chasing the biggest models, smart developers are building with Small Language Models that solve real problems without breaking the bank.
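The "API bills explode with user growth" point is easy to see with a back-of-envelope calculation. A minimal sketch, where every price and usage number is an illustrative assumption (not a real vendor rate), contrasting pay-per-token cloud inference with an SLM shipped inside the app:

```python
# Back-of-envelope: cloud API spend vs. on-device SLM as the user base grows.
# All numbers below are illustrative assumptions, not real vendor pricing.

def monthly_cloud_cost(users, requests_per_user=30, tokens_per_request=1_000,
                       price_per_million_tokens=10.0):
    """Cloud inference cost grows linearly with usage."""
    total_tokens = users * requests_per_user * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens

def monthly_on_device_cost(users):
    """Once the SLM ships inside the app, marginal inference cost is ~0."""
    return 0.0

for users in (1_000, 100_000, 1_000_000):
    print(users, monthly_cloud_cost(users), monthly_on_device_cost(users))
```

Under these assumptions the cloud bill scales from hundreds to hundreds of thousands of dollars a month, while the on-device line stays flat.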
The edge is where (most of day-to-day) AI's future lives.

What would you build if you could run LLMs natively on mobile? Please let me know in the comments 👇

----

Hi there, I am Pau! Glad you got to the end of the post (who reads these days?!). I share tips and tricks about building AI products that work. If you want more of these posts and a higher signal-to-noise ratio in your feed, 𝗴𝗶𝘃𝗲 𝗺𝗲 𝗮 𝗳𝗼𝗹𝗹𝗼𝘄 Pau Labarta Bajo!
How Smaller AI Models Support Startup Growth
Summary
Smaller AI models, often called Small Language Models (SLMs), are compact versions of artificial intelligence systems trained for specific tasks. These models support startup growth by offering accessible, affordable, and practical solutions that fit real business needs without requiring vast resources.
- Reduce expenses: Using smaller AI models helps startups minimize computing and cloud costs, making advanced technology available even with limited budgets.
- Boost privacy: Smaller models can run locally, keeping sensitive user and company data more secure by avoiding reliance on third-party providers.
- Increase adaptability: Task-focused AI models let startups quickly tailor solutions to new challenges, allowing them to respond faster to market changes.
-
A new study from Amazon Web Services (AWS) challenges conventional wisdom about AI model scaling. Researchers fine-tuned a 350M-parameter model that achieved a 77.55% success rate on complex tool-calling tasks, significantly outperforming larger models like ChatGPT (26%) and Claude (2.73%), which have 20-500 times more parameters. In other words, a model with 350 million parameters outperformed a 175-billion-parameter model by nearly three times.

The implications for enterprise AI adoption are significant. For the past two years, the narrative has been that bigger is always better, requiring massive compute budgets and infrastructure investments for capable AI agents. This research contradicts that notion.

The key difference lies in targeted fine-tuning on specific tasks rather than general-purpose training. The smaller model focused its capacity on learning tool-calling behaviors, achieving remarkable parameter efficiency where larger models often become less effective. Most organizations do not need AI that can perform every task; they need AI that excels in their specific workflows. The cost difference between operating a 350M model and a 175B model is transformational, making AI accessible to any organization with a clear use case rather than just tech giants.

In my interactions with leaders, I observe that organizations are not struggling with AI capability but with AI economics and governance. The future isn't solely about larger models; it's about smarter deployment of appropriately sized models for specific enterprise contexts. The future of enterprise AI focuses on making sophisticated capabilities accessible, affordable, and deployable at scale.

What specialized AI applications could transform your organization if cost and complexity weren't barriers?

#AI #EnterpriseAI #MachineLearning #AIGovernance #Innovation
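The two ratios behind the headline claims above are quick to verify from the reported numbers:

```python
# Sanity-check the ratios quoted in the AWS study summary above,
# using the success rates and parameter counts as reported.
slm_success, llm_success = 77.55, 26.0   # tool-calling success rates (%)
slm_params, llm_params = 350e6, 175e9    # 350M vs. 175B parameters

param_ratio = llm_params / slm_params    # how many times larger the LLM is
success_ratio = slm_success / llm_success

print(param_ratio)               # 500.0 -> the upper end of "20-500 times"
print(round(success_ratio, 2))   # 2.98 -> "nearly three times" the success rate
```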
-
Don’t scale up. Scale down. That’s how we shift from scale as size to scale as fit.

Alex Zaretsky, PhD, MBA, asked me a question at the centre of this transition: how do we move C-suites from “let’s deploy agents wherever possible” to “let’s think about the use case, data, architecture, and process first”? It starts with seeing AI not as a thin layer to spread, but as a system to fit, aligning ambition with architecture.

JPMorgan’s research shows why. Their team compared fine-tuned LLMs with small proprietary transformers trained from scratch on transaction data. The smaller models achieved similar accuracy, improved coverage by 14 percent, cut latency from 735 ms to 95 ms, and saved roughly $13 million a year. Each had about 1.7 million parameters, a thousandth the size of the LLMs they replaced. The gain came not from more capability, but from tighter alignment between task, data, and latency. Where the domain is structured and the goal clear, precision beats magnitude.

NVIDIA’s work on agentic systems reinforces the pattern. Most enterprise agents use only a narrow slice of what large models can do. They parse, plan, call APIs, and return structured outputs. The future lies in networks of small, specialised models that coordinate precisely, scaling horizontally through composition, not vertically through size. Scaling down doesn’t mean shrinking ambition; it means distributing intelligence where it fits best.

Pinterest proved that from another angle. Instead of scaling the model, they re-shaped the input. Their Visual Page Representation - a compact map of what a human actually sees - let simple XGBoost classifiers outperform GPT-4o at a fraction of the cost. They didn’t scale the model; they scaled the abstraction. Not every domain lends itself to that precision, but where data and purpose are stable, the results are clear.

AWS’s Executive’s Guide to Agentic AI codifies this philosophy.
It urges leadership teams - all of us learning together - to start small, anchor on outcomes, treat agents as workflow actors, and build governance into the rails from day one. It’s pragmatic progress, the kind that compounds over time.

McKinsey and MIT show what happens when that discipline is missing. Ninety-five percent of enterprise AI pilots still deliver no measurable ROI. The successful few begin with a defined workflow, a stable data interface, and a clear line between human and system responsibility. This isn’t theory; it’s a playbook already visible in finance, commerce, and infrastructure.

The direction of travel is undeniable. The next era of AI scale will come not from bigger models, but from better alignment: between use case and capability, data and representation, architecture and process. The goal is to build muscle memory into the system - feedback loops, guardrails, and governance that make precision repeatable until fit becomes instinct.

And that’s when scale truly begins.
-
In 2024–2025, the AI race was simple: bigger models meant better results. In 2026, that thinking is changing fast.

Enter Small Language Models (SLMs): lightweight, task-focused models that deliver faster responses, lower costs, stronger privacy, and more predictable production behavior. Instead of sending every request to massive cloud LLMs, enterprises now use smaller models for everyday tasks like classification, extraction, summarization, routing, and drafting, while reserving large models only for complex reasoning and creative workloads.

This shift is driven by real-world constraints. SLMs run locally on laptops, edge devices, or low-cost servers, making them ideal for latency-sensitive and privacy-critical applications. They’re optimized for speed, cost efficiency, on-device privacy, and task specialization - exactly what production systems need today.

What’s surprising in 2026 is how capable these models have become. Modern SLM families can summarize documents, answer questions accurately, generate meaningful content, and handle reasoning-style tasks - all while running locally. In simple terms: yesterday’s enterprise AI now fits on your laptop.

Architecturally, teams are moving to a small-first, big-when-needed approach. SLMs handle most operational workloads like extraction, classification, summarization, and routing. Larger models step in only for deep reasoning, long conversations, or creative synthesis. Around this, companies build local AI stacks with runtimes, vector databases for RAG, embeddings, tool calling, guardrails, and monitoring - turning SLMs into full internal AI platforms, not just models.

The takeaway is simple: 2024–2025 was about model size. 2026 is about efficiency. Small Language Models aren’t a trend. They’re becoming the default for production AI because modern systems care about usability, scalability, affordability, and security more than raw parameter counts.
If you’re building AI for real-world use, SLMs should already be on your architecture diagram. Save this for later and share it with your platform or AI team.
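The small-first, big-when-needed pattern described above can be sketched as a simple router. Here the model calls are stubs and the task labels are illustrative assumptions; the point is the routing rule, not the inference:

```python
# Minimal sketch of an SLM-first router: routine workloads go to a local
# small model, everything else escalates to a large model. Task labels
# and the stubbed model calls are illustrative assumptions.

ROUTINE_TASKS = {"classification", "extraction", "summarization", "routing"}

def call_slm(prompt: str) -> str:
    # Stub standing in for a local small-model call.
    return f"[slm] {prompt}"

def call_llm(prompt: str) -> str:
    # Stub standing in for a large cloud-model call.
    return f"[llm] {prompt}"

def route(task_type: str, prompt: str) -> str:
    """Send routine workloads to the SLM; escalate the rest to the LLM."""
    if task_type in ROUTINE_TASKS:
        return call_slm(prompt)
    return call_llm(prompt)

print(route("summarization", "Summarize this report."))        # stays local
print(route("creative_synthesis", "Draft a campaign concept."))  # escalates
```

In production the `task_type` would come from a cheap classifier or the calling workflow itself, and the stubs would be replaced by real inference clients.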
-
𝐋𝐚𝐫𝐠𝐞 𝐨𝐫 𝐒𝐦𝐚𝐥𝐥: 𝐖𝐡𝐢𝐜𝐡 𝐢𝐬 𝐁𝐞𝐭𝐭𝐞𝐫?

Get your minds out of the gutter - in this instance I am asking about the models powering AI.

For years, the AI industry has been obsessed with scale. Large Language Models (LLMs) like GPT-4 have dominated the conversation, boasting massive capabilities driven by billions of parameters. But here’s the question we should be asking: is bigger always better?

It’s time to think smarter, not just bigger. Enter Small Language Models (SLMs) and modular neural architectures - a strategy that focuses on building specialized, efficient, and collaborative AI systems rather than relying on one massive model to do it all. Here’s why this approach makes more sense:

1. ⚡ Efficiency Over Scale: LLMs are resource-intensive, demanding enormous computational power. SLMs are lightweight, faster, and more adaptable - ideal for edge devices, cloud systems, and even embedded applications.

2. 🎯 Specialization Beats Generalization: LLMs are generalists, but SLMs can be hyper-specialized. Imagine a system where one model handles cybersecurity, another manages predictive maintenance, and another focuses on language analysis. Each model excels in its niche, creating a collective intelligence more powerful than any single model.

3. 🔗 Scalable Modularity: Building AI systems with multiple SLMs is like assembling a high-performance team - each model contributes its unique expertise. This modular design is scalable, adaptable, and resilient, much like how different regions of the brain work together.

4. 💸 Cost-Effective Innovation: Training and maintaining LLMs comes with a massive price tag. SLMs reduce costs and make innovation accessible to startups, research teams, and solo developers. AI development becomes more democratic and agile.

5. 🔒 Enhanced Security & Privacy: SLMs can process sensitive data locally on edge devices, reducing reliance on centralized systems and lowering security risks. Smaller models can mean safer data.
This shift from building colossal models to designing purpose-driven, collaborative systems could be the key to more sustainable and intelligent AI. The future of AI isn’t about one model knowing everything. It’s about many specialized models working together to solve complex problems. Sometimes, smaller is smarter. And smarter is what we need. #slm #ai #future
-
The AI arms race is missing the real opportunity: while everyone talks about billion-parameter AI models, the real revolution might be happening with smaller, specialized models. Here's why:

𝟭. 𝗠𝗼𝗿𝗲 𝗮𝗰𝗰𝗲𝘀𝘀𝗶𝗯𝗹𝗲 - Smaller models can be run by more companies on more attainable hardware
𝟮. 𝗠𝗼𝗿𝗲 𝗰𝗼𝘀𝘁-𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 - Training and running smaller models costs a fraction of what massive models do
𝟯. 𝗠𝗼𝗿𝗲 𝗲𝘅𝗽𝗹𝗮𝗶𝗻𝗮𝗯𝗹𝗲 - The larger the model, the harder it is to understand how decisions are made
𝟰. 𝗠𝗼𝗿𝗲 𝘁𝗮𝗿𝗴𝗲𝘁𝗲𝗱 - Domain-specific models excel at industry-specific language and contexts

As GPU shortages continue and cloud costs increase, the companies gaining competitive advantage aren't chasing the biggest models. They're building smarter, targeted AI systems that deliver specific business value from their unstructured data. The AI revolution isn't just about size. It's about relevance.
-
🚨 The AI industry is pouring $ 𝗯𝗶𝗹𝗹𝗶𝗼𝗻𝘀 into massive LLMs for agents. NVIDIA says: 𝘆𝗼𝘂 𝗱𝗼𝗻’𝘁 𝗻𝗲𝗲𝗱 𝘁𝗵𝗲𝗺. Small Language Models (SLMs) are the real future.

⚡ Why it matters
✸ Most agents route every task - no matter how trivial - through GPT-4 or Claude → that’s expensive, slow, and overkill.
✸ NVIDIA’s new paper argues → SLMs, models small enough to run on consumer hardware, are faster, cheaper, and just as capable for most agentic tasks.

⚡ Practical recommendations
✸ Prioritize SLMs for cost-effective deployment → cut latency, energy use, and infra costs, especially for real-time or on-device inference.
✸ Design modular agentic systems → use SLMs for routine, narrow tasks; reserve LLMs only for complex reasoning.
✸ Leverage SLMs for rapid specialization → fine-tune small models quickly for evolving use cases, enabling faster iteration and easier adaptation.

⚡ How it works
Agentic workloads are rarely open-ended conversations. They’re structured, repetitive, and scoped:
✸ Summarize a doc
✸ Extract a field
✸ Call an API
✸ Generate boilerplate code
For these, you don’t need a trillion-parameter giant. A well-tuned 2–7B model often does the job just as well - or better.

⚡ Proof in numbers
✸ Phi-3 Small (7B) rivals 70B models on reasoning + code.
✸ DeepSeek-R1-Distill (7B) beats Claude 3.5 + GPT-4o on reasoning.
✸ SmolLM2 (≤1.7B) matches 70B models from 2 years ago.
✸ Nemotron-H (2–9B) delivers 30B-level tool use at a fraction of the compute.
✸ Toolformer (6.7B) outperforms GPT-3 (175B) by learning APIs.
Smaller ≠ weaker.

⚡ The efficiency edge
✸ 10–30× cheaper to run
✸ Lower energy use + faster responses
✸ Overnight fine-tuning with LoRA/QLoRA
✸ More reliable with strict formats (JSON, XML, Python)
✸ Deployable locally for cost savings + data control

⚡ Smarter architectures
Don’t build monolithic agents.
✸ Use SLMs by default
✸ Call LLMs only when absolutely necessary
This modular approach is cheaper, more controllable, and easier to debug.
⚡ Migration path (LLM → SLM)
✸ Log usage data
✸ Cluster recurring tasks
✸ Fine-tune SLMs
✸ Replace LLM calls
✸ Iterate + refine

⚡ Real-world results
✸ MetaGPT → 60% of LLM calls replaceable
✸ Cradle → 70%
✸ Open Operator → 40%

⚡ Why it hasn’t flipped yet
✸ Billions sunk into LLM infra
✸ Benchmarks biased toward generalist models
✸ Less hype for SLMs
But none of these are technical blockers.

⚡ The takeaway
✸ The future of agents isn’t bigger models.
✸ It’s smarter architectures built on SLMs.
✸ SLMs give you speed, control, and affordability.

≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣≣
⫸ꆛ Want to Build Real-World AI Agents? Join My 𝗛𝗮𝗻𝗱𝘀-𝗼𝗻 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁 𝟱-𝗶𝗻-𝟭 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 — Now with 𝗠𝗖𝗣!
➠ Build Real-World Agents for Healthcare, Finance, Smart Cities
➠ Master 5 Frameworks: MCP · LangGraph · PydanticAI · CrewAI · OpenAI Swarm
➠ Includes 9 Real-World Projects · All Code Included
👉 𝗘𝗻𝗿𝗼𝗹𝗹 𝗡𝗢𝗪 (𝟱𝟬%+ 𝗢𝗙𝗙): https://lnkd.in/eGuWr4CH
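The first two migration steps above (log usage data, cluster recurring tasks) can be sketched with nothing more than a counter over your LLM call log. The log field names and the 3-call threshold are illustrative assumptions:

```python
# Sketch of "log usage data" + "cluster recurring tasks": count how often
# each task type hits the LLM, then flag frequent ones as candidates for a
# dedicated fine-tuned SLM. Field names and threshold are assumptions.
from collections import Counter

usage_log = [
    {"task": "summarize_doc"}, {"task": "extract_field"},
    {"task": "summarize_doc"}, {"task": "call_api"},
    {"task": "summarize_doc"}, {"task": "extract_field"},
]

counts = Counter(entry["task"] for entry in usage_log)

# Tasks seen at least 3 times are candidates for SLM replacement.
candidates = [task for task, n in counts.items() if n >= 3]
print(candidates)  # ['summarize_doc']
```

Real logs would carry prompts and outputs too, so the recurring clusters double as fine-tuning data for the replacement SLMs.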
-
Small language models (SLMs) are redefining the way businesses approach generative AI. Unlike their larger counterparts, SLMs are built for precision and efficiency, excelling in specialized tasks where massive computational power isn’t necessary. From powering chatbots and summarizing documents to creating tailored marketing content, these compact models deliver focused results without the need for extensive hardware or high costs.

What makes SLMs stand out? Their accessibility and adaptability. Many are open-source, enabling developers and businesses of all sizes to implement advanced AI solutions on local systems. By combining lightweight design with innovative techniques like model distillation, SLMs achieve impressive performance in areas like coding assistance, embedded systems, and mobile apps. Plus, with local deployment, they offer enhanced privacy and security - a major advantage over cloud-reliant models.

As the demand for tailored AI solutions grows, SLMs like Microsoft’s Phi-3 and Meta’s Llama 3 are paving the way. These models demonstrate that smaller-scale AI can deliver impactful results, making them an essential tool for businesses seeking cost-effective, resource-efficient solutions. Explore the future of AI - where smaller really is smarter.
-
The frenzy around the new open-source reasoning #LLM, DeepSeek-R1, continued today, and it’s no wonder. With model costs expected to come in 90-95% lower than OpenAI o1, the news has reverberated across the industry, from infrastructure players to hyperscalers, and sent stocks dropping. Amid the swirl of opinions and conjecture, I put together a brief synopsis of the news - just the brass tacks - to try and simplify the implications and potential disruptions and why they matter to leaders.

1. 𝗦𝗸𝗶𝗽𝗽𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝘂𝗹𝗲𝘀: DeepSeek-R1-Zero ditched supervised fine-tuning and relied solely on reinforcement learning - resulting in groundbreaking reasoning capabilities but less polished text.
2. 𝗧𝗵𝗲 𝗣𝗼𝘄𝗲𝗿 𝗼𝗳 𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗗𝗮𝘁𝗮: Even a tiny set of curated examples significantly boosted the model's readability and consistency.
3. 𝗦𝗺𝗮𝗹𝗹 𝗕𝘂𝘁 𝗠𝗶𝗴𝗵𝘁𝘆 𝗠𝗼𝗱𝗲𝗹𝘀: Distilled smaller models (1.5B–70B parameters) outperformed much larger ones like GPT-4o, proving size isn’t everything.

Why does this matter to business leaders?
• 𝗚𝗮𝗺𝗲-𝗖𝗵𝗮𝗻𝗴𝗲𝗿 𝗳𝗼𝗿 𝗔𝗜 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁 𝗖𝗼𝘀𝘁𝘀: Skipping supervised fine-tuning and leveraging reinforcement learning could reduce costs while improving reasoning power in AI models.
• 𝗛𝗶𝗴𝗵-𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗗𝗮𝘁𝗮 𝗶𝘀 𝗮 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗰 𝗔𝗱𝘃𝗮𝗻𝘁𝗮𝗴𝗲: Investing in carefully curated data (even in small quantities) can lead to a competitive edge for AI systems.
• 𝗦𝗺𝗮𝗹𝗹𝗲𝗿 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 𝗠𝗼𝗱𝗲𝗹𝘀 𝗦𝗮𝘃𝗲 𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀: Smaller, distilled models that perform better than larger ones can drive efficiency, cutting infrastructure costs while maintaining high performance.

Let me know if you agree… And if you're curious, the DeepSeek-R1 paper is a must-read. https://lnkd.in/eYPidAzg

#AI #artificialintelligence #OpenAI #Hitachi
-
Where in your organization could a “smaller” model deliver a smarter solution?

⚡ Big Isn’t Always Better in AI

Right now, most companies default to Large Language Models (#LLMs) for every task. But here’s the catch:
🔹You don’t need a supercomputer to summarize a report.
🔹You don’t need a billion-parameter model to check a workflow.

Enter Small Language Models (#SLMs) - leaner, faster, and often good enough for many enterprise tasks.

Think of it this way:
🔹Use an LLM like a Swiss Army knife - for broad, open-ended reasoning.
🔹Use an SLM like a scalpel - for precise, repeatable, well-defined tasks.

💡 Why this matters: lower latency and costs (token and platform processing costs); energy efficiency; easier fine-tuning for specific teams; a modular, mix-and-match architecture that scales sustainably.

The shift from “LLM everywhere” → “SLM-first, LLM when needed” could be one of the smartest moves in #enterpriseAI strategy. It is analogous to custom datamarts and views in the Data Warehouse.

The question for leaders: are we using the right-sized model for the job?