Day 19/30 of SLMs/LLMs: Mixture-of-Experts, Efficient Transformers, and Sparse Models

As language models grow larger, two challenges dominate: cost and efficiency. Bigger models bring higher accuracy but also higher latency, energy use, and deployment complexity. The next phase of progress is about making models faster, lighter, and more intelligent per parameter.

A leading direction is the Mixture-of-Experts (MoE) architecture. Instead of activating every parameter for each input, MoE models route tokens through a few specialized "experts." Google's Switch Transformer and GLaM demonstrated that activating only 5 to 10 percent of weights can match dense-model accuracy at a fraction of the compute. Open models like Mixtral 8x7B extend this idea, using eight experts per layer but activating only two for each token. The result is performance comparable to a dense 70B model while using only roughly 13B active parameters per token.

Another active area of innovation is Efficient Transformers. Standard attention scales quadratically with sequence length, which limits how much context a model can process. Variants such as FlashAttention, Longformer, and Performer improve the memory efficiency and speed of attention, while architectures like Mamba replace attention with state-space layers. FlashAttention in particular accelerates attention by tiling the computation so intermediate results stay in fast on-chip SRAM rather than being repeatedly written to and read from slower GPU memory, yielding two to four times higher throughput on long sequences.

Sparse Models also contribute to efficiency by reducing the number of active parameters during training or inference. Structured sparsity, combined with quantization and pruning, allows models to run on smaller devices without a major loss in quality. Advances in sparsity-aware optimizers now make it possible to deploy billion-parameter models on standard hardware with near state-of-the-art accuracy.

These techniques share a single goal: scaling intelligence without scaling cost. The focus is shifting from building larger networks to building smarter ones.
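The top-2 routing at the heart of MoE can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not Mixtral's actual implementation: the router, expert shapes, and weights below are all toy assumptions.

```python
import numpy as np

def top2_moe_layer(x, gate_w, experts):
    """Sketch of top-2 Mixture-of-Experts routing.

    x       : (tokens, d_model) input activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        idx = top2[t]
        # softmax over just the two selected experts' logits
        w = np.exp(logits[t, idx] - logits[t, idx].max())
        w /= w.sum()
        # only 2 of n_experts actually run: this is the sparse activation
        out[t] = sum(wi * experts[i](x[t]) for wi, i in zip(w, idx))
    return out

# toy usage: 8 experts per layer, only 2 active per token
rng = np.random.default_rng(0)
d, n_exp = 16, 8
experts = [lambda v, W=rng.standard_normal((d, d)) * 0.1: v @ W
           for _ in range(n_exp)]
x = rng.standard_normal((4, d))
gate_w = rng.standard_normal((d, n_exp))
y = top2_moe_layer(x, gate_w, experts)
print(y.shape)  # (4, 16)
```

The key property the sketch shows: compute per token scales with the two selected experts, not with all eight, while the total parameter count (and thus model capacity) scales with all of them.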
A 7B model that uses retrieval, sparse activation, and efficient attention can outperform a much larger dense model in both speed and reliability.
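The sparsity and quantization techniques mentioned above can also be illustrated with a minimal sketch. Magnitude pruning and symmetric per-tensor int8 quantization are the textbook versions shown here; real deployments use structured sparsity patterns and calibration data, and the 90% sparsity level below is an illustrative assumption.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights (unstructured pruning sketch)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # threshold = k-th smallest absolute value across the whole tensor
    thresh = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization sketch: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
w = rng.standard_normal((256, 256)).astype(np.float32)
wp = magnitude_prune(w, 0.9)      # ~90% of weights become exact zeros
q, scale = quantize_int8(wp)      # remaining weights stored in 1 byte each
print((wp == 0).mean())           # fraction pruned, ~0.9
print(np.abs(wp - q * scale).max())  # small dequantization error
```

Together these cut storage dramatically: zeros compress away and survivors shrink from 4 bytes to 1, which is what makes billion-parameter models fit on modest hardware.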
Trends in Open-Source Language Models
Summary
Open-source language models are artificial intelligence systems whose design and training data are made freely available for anyone to use, modify, or improve. Recent trends show these models are becoming more efficient, cost-friendly, and competitive with proprietary alternatives, enabling wider access and rapid innovation across industries.
- Embrace smarter efficiency: Explore new techniques like Mixture-of-Experts and efficient transformers to run advanced AI on less hardware and energy, making powerful language models more accessible to smaller teams and organizations.
- Prioritize fit-for-purpose: Choose or customize open models that suit your actual business needs, as smaller, specialized models can outperform generic AI solutions at a fraction of the cost.
- Focus on user experience: Shift attention from raw model accuracy to building helpful, easy-to-use AI tools, since most users prize simplicity and usefulness over cutting-edge benchmark performance.
-
The AI landscape is undergoing a fundamental shift, and it's not the one you think. The competitive frontier isn't only about building the largest proprietary models. Two other major trends are emerging that haven't had enough discussion:

Open-source models have made tremendous strides, especially on cost relative to performance.

Compact, fit-for-purpose models can meaningfully outperform general-purpose LLMs on specific tasks, and do so at dramatically lower cost.

Our Chief Technology Officer and AI team share how we are using open-source AI models at Pinterest to achieve similar performance at less than 10% of the cost of leading proprietary AI models. They also share how Pinterest has built in-house, fit-for-purpose models that significantly outperform leading proprietary general-purpose models.

The race to build the largest, most powerful models is profound and meaningful. But if you want to see a thriving ecosystem of innovation in an AI-driven world, you should also want to see a thriving open-source AI community that creates democratization and transparency. It's a good thing for us all that open source is in the race.

For our part, we'll continue to share our findings in leveraging open-source AI so that more companies and builders can benefit from its democratizing effect. https://lnkd.in/gmT6UNXs
-
This is the best and most current view of the market for LLMs, laid out as a series of succinct data visualizations on what matters. Well worth flipping through. It is accompanied by a paper that dives deeper into market evolution. Key insights include:

📈 Open-source supply is exploding but trails in adoption despite far lower cost. By late 2025, the cumulative number of open-source models approached 400, significantly outpacing the ~200 closed-source models available. Despite this variety and being roughly 90% cheaper than comparable proprietary models, open-source options account for less than 30% of the market share on average.

📉 The price of intelligence has crashed. The cost to access high-level intelligence plummeted between 2023 and 2025, with prompt prices for top-tier models falling from $30.00 per million tokens (GPT-4) to under $0.10 for capable newer models. When adjusting for the quality of intelligence provided, the relative price of tokens dropped by a factor of nearly 100 over this two-year period.

🧠 Consumers trade peak intelligence for value. Although the "frontier" of intelligence continues to rise with models like GPT-5 and Gemini 3 Pro, the average intelligence actually consumed by users consistently trails the highest available capability. Users effectively trade off maximum smarts for lower costs.

🛠️ Different industries demand different levels of "smarts." Demand is highly segmented by use case, with users requiring significantly higher intelligence levels for programming (index 0.51) and science (0.47) compared to tasks like translation (0.32) or trivia (0.37). Programming and technology tasks command higher premiums than creative writing or SEO.

🏆 Closed-source models retain dominance despite competition. Despite the proliferation of free-entry open-source providers, closed-source models continued to command the vast majority of token market share throughout 2025.
However, the leaderboard is volatile; strictly closed-source incumbents face sudden disruption, such as when xAI surged from negligible usage to a significant market share within months in late 2025.

💻 Technical workflows exhibit extreme model turnover. In specialized applications like coding assistants, users abandon incumbent models rapidly; for instance, the "Cline" app saw usage shift from "Claude 4 Sonnet" in mid-2025 to being dominated by "Grok Code Fast 1" by October 2025. This sector demonstrates a "winner-takes-all" dynamic where a new, faster, or more specialized model can capture the majority of traffic in a remarkably short window.

⚓ Corporate demand is sticky and inelastic. Despite the rapid release of cheaper and faster models, business demand remains relatively inelastic in the short run, meaning firms do not immediately switch providers in response to price drops.
-
Have open-source LLMs finally caught up with private LLMs?

Historically, open-source performance lagged private models by 6-12 months. Llama 3.1, however, has narrowed the gap even further. 🚀

As foundation models evolve, the gap between open-source models like Llama 3.1 and closed-source models like GPT-4 is shrinking. Chamath Palihapitiya said it best: "Foundation models will have no economic value; they will be broadly available and entirely free." 🌐

Here's what I believe: the real moat in AI will shift away from proprietary models. Why? Open-source models like Mistral and Llama, trained on open data, are eroding the dominance of proprietary models. The focus is moving towards usability. 🔍

True defensibility in AI will come from:
- User experience
- Brand
- Rich feature sets

Most users prioritize ease of use and value over model accuracy and benchmarks. Companies are shifting to enable "outcomes" for end users, paving the way for "Services as a Software." 📈

That's why:
- Anthropic hired Instagram co-founder Mike Krieger as CPO to enhance user experience.
- Midjourney leads in image-model-as-a-service thanks to its image quality, not just raw model performance.
- Even Apple is connecting with other LLMs instead of building its own, integrating these features seamlessly into its products.

The landscape is changing. How will your strategy adapt? 💡

#AI #OpenSource #MachineLearning #UserExperience #TechInnovation #AITrends #FoundationModels #BusinessStrategy #DigitalTransformation #FutureOfTech
-
A lot has changed since my #LLM inference article last January; it's hard to believe a year has passed!

The AI industry has pivoted from focusing solely on scaling model sizes to enhancing reasoning abilities during inference. This shift is driven by the recognition that simply increasing model parameters yields diminishing returns, and that improving inference capabilities can lead to more efficient and intelligent AI systems. OpenAI's o1 and Google's Gemini 2.0 are examples of models that employ #InferenceTimeCompute. Some techniques include best-of-N sampling, which generates multiple outputs and selects the best one; iterative refinement, which lets the model improve its initial answers; and speculative decoding. Self-verification lets the model check its own output, while adaptive inference-time computation dynamically allocates extra #GPU resources to challenging prompts. These methods represent a significant step toward more reasoning-driven inference.

Another exciting trend is #AgenticWorkflows, where an AI agent (a software program running on an inference server) breaks the queried task into multiple small tasks without requiring complex user prompts (prompt engineering may reach end of life this year!). It then autonomously plans, executes, and monitors these tasks, possibly running inference on the model multiple times while maintaining context across runs.

#TestTimeTraining takes things further by adapting models on the fly, fine-tuning the model for new inputs to enhance its performance.

These advancements can complement each other: an AI system may use an agentic workflow to break down a task, apply inference-time compute to generate high-quality outputs at each step, and employ test-time training to adapt to unexpected challenges. The result? Systems that are faster, smarter, and more adaptable.

What does this mean for inference hardware and networking gear? Previously, most open-source models barely needed one GPU server, and inference was often done on front-end networks or by reusing the training networks. However, as the computational complexity of inference increases, more focus will shift to building scale-up systems with hundreds of tightly interconnected GPUs or accelerators for inference flows. While Nvidia GPUs continue to dominate, other accelerators, especially from hyperscalers, will likely gain traction.

Networking remains a critical piece of the puzzle. Can #Ethernet, with enhancements like compressed headers, link retries, and reduced latencies, rise to meet the demands of these scale-up systems? Or will we see a fragmented ecosystem of switches for non-Nvidia scale-up systems? My bet is on Ethernet; its ubiquity makes it a strong contender for the job.

Reflecting on the past year, it's clear that AI progress isn't just about making things bigger but smarter. The future looks even more exciting as we rethink models, hardware, and networking. Here's to what 2025 will bring!
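Of the inference-time techniques above, best-of-N sampling is simple enough to sketch end-to-end. The `toy_generate` and `toy_score` functions below are hypothetical stand-ins for a model's sampler and a verifier or reward model; the point is only the shape of the loop, not a real system.

```python
import random

def best_of_n(generate, score, prompt, n=4, seed=0):
    """Best-of-N sampling sketch: draw N candidate answers,
    keep the one the scorer likes best."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# toy stand-ins: the "model" proposes numbers, the "verifier"
# rewards closeness to a hidden target of 42
def toy_generate(prompt, rng):
    return rng.randint(0, 100)

def toy_score(answer):
    return -abs(answer - 42)

best = best_of_n(toy_generate, toy_score, "pick a number", n=16)
print(best)
```

Spending more inference compute here just means raising `n`: each extra sample costs one more model call but can only improve (never worsen) the selected answer, which is the basic trade behind inference-time compute scaling.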
-
Open Source AI Revolution: Groundbreaking Models in July ✨

Open Source AI is on fire! This month saw massive breakthroughs that are shaking up the tech world. Let's dive into some of the major developments reshaping the AI landscape:

- Meta Llama 3.1 (405B, 70B & 8B): The newest Llama models support 8 languages, have a 128K context, and come with a very permissive commercial license. The 405B model even outperforms GPT-4! Meta also shared a detailed tech report on upcoming multi-modal features (image, audio, video).
- Nvidia's Minitron (4B & 8B): These distilled models, with a 256K vocabulary and Apache 2.0 license, use iterative pruning and distillation for top performance.
- Meta Segment Anything Model 2 (SAM 2): Now available under Apache 2.0, SAM 2 improves real-time object segmentation in videos and images with better accuracy and efficiency, and supports zero-shot generalization.
- Salesforce's MINT-1T: An open-source dataset with one trillion text tokens and 3.4 billion images.
- Mistral Large 2 (123B): A dense, multilingual model (12 languages) with a 128K context. It is significantly more capable in code generation, mathematics, and reasoning, offers much stronger multilingual support and advanced function-calling capabilities, and outperforms Meta's Llama 3.1 70B.
- InternLM's StepProver (7B): Achieves state-of-the-art results on Lean, trained on large-scale formal data from GitHub.
- Open-Sora: A project aimed at democratizing high-quality video production. By making the model, tools, and details accessible, Open-Sora fosters innovation and inclusivity in content creation.

What excites you most about these developments? How do you think open-source AI will impact your industry?
-
3 emerging trends in the LLM space.

It has been almost 2 years since ChatGPT emerged and the world was introduced to LLMs. Since then, they have grown leaps and bounds. Initially, these models could only process text, but now they have multi-modal capabilities, meaning they can also process voice, images, and video. Then came the flood of "open-sourced" LLMs; as I write this, there are 3,000+ open-source LLMs out there, and we can use some of them for free, even for commercial purposes. Then LLMs gained reasoning abilities, started interacting with external software (like Google, a Python REPL, weather apps, etc.), and started self-reflecting. This gave birth to AI agents.

Now, what's next? Where is LLM research heading? What are AI researchers doing in the LLM field? There are 3 key emerging trends:

1. Domain-specific LLMs. Currently, LLMs are generic: they possess broad knowledge about many fields. The next logical step is to create LLMs that are experts, or "super specialized," in a particular field like finance, biology, or coding. Domain-specific LLMs could further increase accuracy and reduce hallucination.

2. Non-transformer LLMs. Most current LLMs are transformer-based; the GPT in ChatGPT stands for Generative Pre-trained Transformer. But can we deviate from the standard transformer architecture and solve its pain points to further improve accuracy? This too is an active area of research.

3. Smaller (yet more efficient) LLMs. LLMs require a lot of computational power and hence money. Scaling up parameters improves capability but further increases cost; to give you an idea, it costs somewhere around $5-10 million to train models like Llama 2 and GPT-3. Further increasing the cost is simply not sustainable and is beyond the reach of many small companies and researchers. That's why we need smaller LLMs with reduced parameters that are equally (or more) effective.

Which of these trends are you excited about? #LargeLanguageModels
-
If you think OpenAI and Microsoft 'own' the generative AI market, think again. While much public attention has gone to the big 'general-purpose AI' models, two trends are emerging in next-generation models:

1️⃣ Domain tailored: models are specialising in narrower horizontal domains (maths, coding, etc.), rather than scaling the earlier *bigger! bigger!* one-size-fits-all approach that took us from GPT-3.5 to 4. Which brings me to:

2️⃣ Less is more: smaller models that can run on retail hardware (phones, IoT devices, etc.) are likely to be more useful in domain-specific applications and 'edge AI' systems out in field and factory.

Last week saw the release of several impressive models from big tech and startups alike:
🟢 Apple just released two new open-source research models via Hugging Face: https://shorturl.at/ig3Ci
🟢 AI hardware startup Groq has put out two high-performing Llama tools: https://shorturl.at/t9phf
🟢 OpenAI has unveiled GPT-4o mini, a lightweight, cheaper-to-run version of its current flagship model: https://shorturl.at/TLBT2
🟢 Mistral AI + NVIDIA have released Mistral NeMo 12B, an enterprise language model: https://shorturl.at/ZKoKe

It's not slowing down any time soon.