DSPy Trends in Large Language Model Development


Summary

DSPy trends in large language model development highlight a shift from models that simply generate text responses to those designed for complex, multi-step reasoning and real-world task automation. DSPy refers to new data-centric, structured approaches in building language models that can better understand, plan, and operate within entire workflows, making them more useful for business, research, and daily applications.

  • Embrace agentic workflows: Explore how new language models are being built to not just answer questions, but also to handle multi-step tasks, use digital tools, and collaborate with other systems in real time.
  • Prioritize proprietary data: Ensure your AI solutions include your unique business data to differentiate their performance, as generic models are converging and losing their competitive edge.
  • Adopt structured representations: Consider models that process language at a concept or idea level rather than word-by-word, which leads to better understanding, more logical outputs, and greater reliability for complex tasks.
Summarized by AI based on LinkedIn member posts
  • View profile for Andrew Ng
    Andrew Ng is an Influencer

    DeepLearning.AI, AI Fund and AI Aspire

    2,509,433 followers

    Large language models (LLMs) are typically optimized to answer people’s questions. But there is a trend toward models also being optimized to fit into agentic workflows. This will give a huge boost to agentic performance!

    Following ChatGPT’s breakaway success at answering questions, a lot of LLM development focused on providing a good consumer experience. So LLMs were tuned to answer questions (“Why did Shakespeare write Macbeth?”) or follow human-provided instructions (“Explain why Shakespeare wrote Macbeth”). A large fraction of the datasets for instruction tuning guide models to provide more helpful responses to human-written questions and instructions of the sort one might ask a consumer-facing LLM like those offered by the web interfaces of ChatGPT, Claude, or Gemini.

    But agentic workloads call on different behaviors. Rather than directly generating responses for consumers, AI software may use a model in part of an iterative workflow to reflect on its own output, use tools, write plans, and collaborate in a multi-agent setting. Major model makers are increasingly optimizing models to be used in AI agents as well.

    Take tool use (or function calling). If an LLM is asked about the current weather, it won’t be able to derive the information needed from its training data. Instead, it might generate a request for an API call to get that information. Even before GPT-4 natively supported function calls, application developers were already using LLMs to generate function calls, but by writing more complex prompts (such as variations of ReAct prompts) that tell the LLM what functions are available and then have the LLM generate a string that a separate software routine parses (perhaps with regular expressions) to figure out if it wants to call a function. Generating such calls became much more reliable after GPT-4 and then many other models natively supported function calling.

    Today, LLMs can decide to call functions to search for information for retrieval augmented generation (RAG), execute code, send emails, place orders online, and much more. Recently, Anthropic released a version of its model that is capable of computer use, using mouse-clicks and keystrokes to operate a computer (usually a virtual machine). I’ve enjoyed playing with the demo. While other teams have been prompting LLMs to use computers to build a new generation of RPA (robotic process automation) applications, native support for computer use by a major LLM provider is a great step forward. This will help many developers! [Reached length limit; full text: https://lnkd.in/gHmiM3Tx ]
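
The prompt-and-parse pattern described in this post, where the model emits a string and a separate routine parses it to see whether a function call was requested, might look roughly like the sketch below. It is an illustration only, not any provider's API: the `CALL name({...})` output convention, the `get_weather` stub, and the `dispatch` helper are all hypothetical, and the model output is a hard-coded string standing in for a real LLM response.

```python
import json
import re

# Hypothetical tool; a stub standing in for a real weather API client.
def get_weather(city: str) -> str:
    return f"(stub) weather report for {city}"

# Hypothetical registry of functions the prompt would advertise to the model.
TOOLS = {"get_weather": get_weather}

# Assumed output convention the prompt asks the model to follow:
#   CALL <function_name>(<JSON object of arguments>)
CALL_PATTERN = re.compile(r"CALL\s+(\w+)\((\{.*\})\)", re.DOTALL)

def dispatch(model_output: str) -> str:
    """Run a tool if the model asked for one; otherwise return the text unchanged."""
    match = CALL_PATTERN.search(model_output)
    if match is None:
        return model_output  # plain answer, no function call requested
    name, raw_args = match.group(1), match.group(2)
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return "error: malformed arguments"
    return TOOLS[name](**args)

if __name__ == "__main__":
    # Hard-coded stand-in for what an LLM might generate.
    print(dispatch('CALL get_weather({"city": "Paris"})'))
    print(dispatch("I can answer that without a tool."))
```

The regular expression breaks as soon as the model drifts from the agreed format, which is the kind of fragility that native function calling reduces.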

  • View profile for Kuldeep Singh Sidhu

    Senior Data Scientist @ Walmart | BITS Pilani

    16,490 followers

    Exciting breakthrough in LLM Research: A comprehensive survey reveals that Large Language Models (LLMs) are proving to be highly effective embedding models, marking a significant shift from traditional encoder-only models like BERT to decoder-only architectures. The research, led by scholars from Beihang University, University of Technology Sydney, and other prestigious institutions, demonstrates two primary approaches for deriving embeddings from LLMs:

    >> Direct Prompting Strategy
    • Leverages LLMs' instruction-following capabilities to generate topic-specific embeddings
    • Utilizes contextual representations for enhanced semantic understanding
    • Implements prompt engineering techniques for optimal embedding generation

    >> Data-Centric Tuning Approach
    • Employs supervised contrastive learning with carefully curated datasets
    • Incorporates multi-task learning frameworks for improved generalization
    • Utilizes knowledge distillation from cross-encoder models for enhanced performance

    >> Advanced Implementation Details
    The research reveals sophisticated techniques including:
    • Bidirectional contextualization for enhanced semantic capture
    • Low-rank adaptation for efficient parameter tuning
    • Integration of both dense and sparse embedding approaches
    • Implementation of innovative pooling strategies for token aggregation

    >> Performance Insights
    The study demonstrates remarkable improvements over traditional models:
    • Superior performance in classification, clustering, and retrieval tasks
    • Enhanced capability in handling long-context dependencies
    • Improved cross-lingual representation capabilities
    • Better scalability with model size and training data

    This groundbreaking research opens new possibilities for applications in information retrieval, natural language processing, and recommendation systems.
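
To make "pooling strategies for token aggregation" concrete, here is a minimal sketch of two widely used ways to collapse per-token hidden states into one embedding: mean pooling and last-token pooling. The post does not say which strategies the survey favors, so treat this as a generic illustration. The function names and the toy hidden states are assumptions made for the example, not values from any model.

```python
import math
from typing import List

Vector = List[float]

def mean_pool(token_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the hidden states of the non-padding tokens (mask value 1)."""
    kept = [h for h, m in zip(token_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask excludes every token")
    dim = len(kept[0])
    return [sum(h[i] for h in kept) / len(kept) for i in range(dim)]

def last_token_pool(token_states: List[Vector], attention_mask: List[int]) -> Vector:
    """Use the hidden state of the final non-padding token."""
    for h, m in zip(reversed(token_states), reversed(attention_mask)):
        if m == 1:
            return list(h)
    raise ValueError("attention_mask excludes every token")

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, the usual way to compare two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

if __name__ == "__main__":
    # Toy 2-dimensional hidden states for three tokens; the last one is padding.
    states = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(mean_pool(states, mask))        # average of the first two tokens
    print(last_token_pool(states, mask))  # second token, the last real one
```

Last-token pooling suits decoder-only models because, under causal attention, only the final token has attended to the whole input. Mean pooling is the more common choice when attention is bidirectional.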

  • 𝗧𝗟;𝗗𝗥 NeurIPS 2025 marks the definitive shift from "Chat" to "Autonomy." The research signals a split reality for the enterprise: generic models are converging into a commoditized "Artificial Hivemind," leaving proprietary data as your only real moat. However, the upside is massive. New "Gated Attention" architectures are redefining inference efficiency, while breakthroughs in 1,000-layer Deep RL are finally unlocking agents capable of navigating complex, long-horizon enterprise workflows without getting stuck.

    NeurIPS is around the corner and I wanted to highlight some trends based on the best papers (https://lnkd.in/ejp6vEjD)

    𝟯 𝗣𝗮𝗽𝗲𝗿𝘀 (𝗮𝗻𝗱 𝘁𝗵𝗲𝗺𝗲𝘀) 𝗬𝗼𝘂 𝗡𝗲𝗲𝗱 𝘁𝗼 𝗞𝗻𝗼𝘄

    𝟭. 𝗧𝗵𝗲 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁𝗶𝗮𝘁𝗶𝗼𝗻 𝗖𝗿𝗶𝘀𝗶𝘀
    • 𝗣𝗮𝗽𝗲𝗿: 𝗔𝗿𝘁𝗶𝗳𝗶𝗰𝗶𝗮𝗹 𝗛𝗶𝘃𝗲𝗺𝗶𝗻𝗱: The Open-Ended Homogeneity of Language Models
    • 𝗧𝗵𝗲 𝗦𝗶𝗴𝗻𝗮𝗹: Models trained on synthetic data and each other’s outputs are suffering from "inter-model homogeneity." They are converging on the same "average" answers.
    • 𝗧𝗵𝗲 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗥𝗲𝗮𝗹𝗶𝘁𝘆: If you rely on a vanilla wrapper around GPT, Claude, or Gemini, your business logic is becoming a commodity.

    𝟮. 𝗧𝗵𝗲 𝗡𝗲𝘄 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗦𝘁𝗮𝗻𝗱𝗮𝗿𝗱
    • 𝗣𝗮𝗽𝗲𝗿: Gated Attention for Large Language Models (Qwen Team)
    • 𝗧𝗵𝗲 𝗦𝗶𝗴𝗻𝗮𝗹: By adding a simple "gate" to attention heads, we can stabilize training at massive scales and prevent "attention sinks."
    • 𝗧𝗵𝗲 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗥𝗲𝗮𝗹𝗶𝘁𝘆: This is the update for your self-hosted inference. Models using Gated Attention (like Qwen3-Next) can offer significantly better performance-per-dollar.

    𝟯. 𝗧𝗵𝗲 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗨𝗻𝗹𝗼𝗰𝗸
    • 𝗣𝗮𝗽𝗲𝗿: 1000 Layer Networks for Self-Supervised RL
    • 𝗧𝗵𝗲 𝗦𝗶𝗴𝗻𝗮𝗹: We used to think RL couldn't scale in depth like LLMs. This paper proves we can train 1,000-layer RL networks using self-supervised contrastive learning.
    • 𝗧𝗵𝗲 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗥𝗲𝗮𝗹𝗶𝘁𝘆: This enables L5 Autonomous Agents - agents that can navigate complex ERP/CRM workflows without getting stuck in loops.

    𝗔𝗰𝘁𝗶𝗼𝗻𝘀 𝗳𝗼𝗿 𝗖𝗧𝗢𝘀 𝗮𝗻𝗱 𝗖𝗔𝗜𝗢𝘀

    𝟭. 𝗣𝗶𝘃𝗼𝘁 𝘁𝗼 "𝗗𝗮𝘁𝗮 𝗜𝗻𝗷𝗲𝗰𝘁𝗶𝗼𝗻": Go beyond prompt engineering with context and data engineering. Focus even more on RAG and Fine-Tuning pipelines that inject your proprietary data to break the "Hivemind" average.
    𝟮. 𝗔𝗱𝗼𝗽𝘁𝗶𝗻𝗴 𝗚𝗮𝘁𝗲𝗱 𝗠𝗼𝗱𝗲𝗹𝘀: When evaluating open-weights models for 2026, mandate "Gated Attention" architectures to lower your long-term inference TCO.
    𝟯. 𝗣𝗶𝗹𝗼𝘁 𝗗𝗲𝗲𝗽 𝗥𝗟: Move your "Agent" pilots beyond simple tool use. Start testing self-supervised RL on internal workflows to build agents that learn from your experts' corrections.

  • View profile for Marty Weintraub

    Founder: AIMCLEAR® Brand Performance Marketing, AI Transformation, Speaker, Author, Photographer, Explorer

    10,276 followers

    The OpenRouter/a16z 100 TRILLION token study is an empirical look at how people use large language models at scale. The findings challenge surface-level assumptions and point to a very different competitive landscape than most suggest.

    The biggest move is from single-turn interactions to agentic inference. Models are increasingly used as reasoning engines operating inside multi-step workflows. Prompts are getting several times longer, tool calls are rising, and reasoning-optimized models now handle the majority of tokens. The center of gravity has moved from producing text to driving processes.

    Another major theme is the strength of open source models, especially those coming out of China. Open-weight models now account for a sizable share of all usage, and Chinese labs have gone from almost zero to global contenders. The open ecosystem is now a competitive field where new releases can capture real mindshare almost immediately.

    One of the most striking findings is how people are actually using models. Programming and roleplay dominate total tokens. Coding assistance has become a core workload and primary driver of long-context reasoning, while roleplay is a structured, high-engagement use case that persists across regions and model types. These two categories alone explain much of the real market behavior.

    No single model can be “best” at everything. Each provider shows a distinct usage fingerprint, from Claude’s heavy concentration in technical tasks to DeepSeek’s dominance in chat-driven and creative use.

    The study also highlights a geographic reality that often gets overlooked. Usage is increasingly global. Asia’s share has risen dramatically. China, Singapore, and Germany sit near the top of total tokens. English still leads, but multilingual usage is becoming a competitive frontier.

    The section on user retention may be the most important for understanding long-term defensibility. Early cohorts for a few models show unusually strong retention because they were the first to solve a high-value workload. Once a model fits that need, users build systems and habits around it and are reluctant to switch. The researchers call this the Glass Slipper effect. It reframes retention as evidence of a capability breakthrough, not just a business metric.

    Cost dynamics tell a similar story. Cheap models absorb massive volume. Premium models command strong demand where correctness and reliability matter. Price alone does not predict usage. Differentiation still matters, and the market is segmented rather than commoditized.

    Taken together, the study describes an ecosystem moving toward multi-model reality, global competition, and workloads defined by reasoning depth and integration, not conversational polish. The practical implication is that the next era will reward builders who understand agentic workflows, invest in real workload fit, and treat usage data as the primary signal for where the market is heading. https://lnkd.in/gMY9dYA5

  • View profile for Brij Kishore Pandey
    Brij Kishore Pandey is an Influencer

    AI Architect & AI Engineer | Building Agentic Systems & Scalable AI Solutions

    727,406 followers

    For the last couple of years, Large Language Models (LLMs) have dominated AI, driving advancements in text generation, search, and automation. But 2025 marks a shift, one that moves beyond token-based predictions to a deeper, more structured understanding of language.

    Meta’s Large Concept Models (LCMs), launched in December 2024, redefine AI’s ability to reason, generate, and interact by focusing on concepts rather than individual words.

    Unlike LLMs, which rely on token-by-token generation, LCMs operate at a higher abstraction level, processing entire sentences and ideas as unified concepts. This shift enables AI to grasp deeper meaning, maintain coherence over longer contexts, and produce more structured outputs.

    Attached is a fantastic graphic created by Manthan Patel.

    How LCMs Work:
    🔹 Conceptual Processing – Instead of breaking sentences into discrete words, LCMs encode entire ideas, allowing for higher-level reasoning and contextual depth.
    🔹 SONAR Embeddings – A breakthrough in representation learning, SONAR embeddings capture the essence of a sentence rather than just its words, making AI more context-aware and language-agnostic.
    🔹 Diffusion Techniques – Borrowing from the success of generative diffusion models, LCMs stabilize text generation, reducing hallucinations and improving reliability.
    🔹 Quantization Methods – By refining how AI processes variations in input, LCMs improve robustness and minimize errors from small perturbations in phrasing.
    🔹 Multimodal Integration – Unlike traditional LLMs that primarily process text, LCMs seamlessly integrate text, speech, and other data types, enabling more intuitive, cross-lingual AI interactions.

    Why LCMs Are a Paradigm Shift:
    ✔️ Deeper Understanding: LCMs go beyond word prediction to grasp the underlying intent and meaning behind a sentence.
    ✔️ More Structured Outputs: Instead of just generating fluent text, LCMs organize thoughts logically, making them more useful for technical documentation, legal analysis, and complex reports.
    ✔️ Improved Reasoning & Coherence: LLMs often lose track of long-range dependencies in text. LCMs, by processing entire ideas, maintain context better across long conversations and documents.
    ✔️ Cross-Domain Applications: From research and enterprise AI to multilingual customer interactions, LCMs unlock new possibilities where traditional LLMs struggle.

    LCMs vs. LLMs: The Key Differences
    🔹 LLMs predict text at the token level, often leading to word-by-word optimizations rather than holistic comprehension.
    🔹 LCMs process entire concepts, allowing for abstract reasoning and structured thought representation.
    🔹 LLMs may struggle with context loss in long texts, while LCMs excel in maintaining coherence across extended interactions.
    🔹 LCMs are more resistant to adversarial input variations, making them more reliable in critical applications like legal tech, enterprise AI, and scientific research.

  • View profile for Aishwarya Srinivasan
    Aishwarya Srinivasan is an Influencer
    633,655 followers

    For a long time, many companies built AI systems around a simple idea: choose the most powerful large language model available and use it across the entire workflow. One large model handling classification, summarization, routing, reasoning, and generation.

    What I am seeing now, especially going into 2026, is a clear architectural shift. Teams are moving away from the “one giant model does everything” approach. Instead, they are decomposing workflows and assigning different models to different layers of the system. Smaller, more specialized models are being used for well-defined tasks, while larger models are reserved for complex reasoning where their breadth actually matters.

    For those who are newer to this space, an SLM (small language model) typically refers to a model in the 1B to 12B parameter range. These models are optimized for efficiency, lower latency, and narrower domains. They are not designed to replace frontier-scale models, but to handle specific tasks extremely well.

    There are two practical reasons why I believe 2026 will be a high-adoption year for SLMs:

    ✦ Cheaper, faster, and more customizable
    For tasks like classification, structured extraction, lightweight reasoning, or domain-specific summarization, a smaller model is often more than sufficient. It runs with lower latency, costs less to scale, and if it is open source, it can be fine-tuned and adapted to your internal data and workflows. That level of customization gives teams real control over performance and differentiation.

    ✦ On-device and edge intelligence
    As more AI moves closer to the user, on-device and edge inference become critical. Mobile assistants, IoT systems, and privacy-sensitive enterprise applications cannot always rely on sending every request to a large cloud model. Small models make local inference feasible, improving both responsiveness and privacy.

    Large models are still essential for open-ended reasoning and complex generation. But the most mature systems will not rely on a single model. They will be orchestrated systems, where each model is chosen based on what it is best at. Model size is no longer the strategy; architecture is.
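
The orchestration idea in this post, where well-defined tasks go to a small model and open-ended reasoning goes to a large one, can be sketched as a simple routing table. Everything named here is hypothetical: the model identifiers `small-slm` and `large-frontier`, the task labels, and the stub callables that stand in for real inference calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class ModelSpec:
    name: str                 # hypothetical model identifier
    run: Callable[[str], str] # stand-in for a real inference call

def make_stub(name: str) -> Callable[[str], str]:
    """Return a fake model that just tags the prompt with its own name."""
    return lambda prompt: f"[{name}] {prompt}"

SMALL = ModelSpec("small-slm", make_stub("small-slm"))
LARGE = ModelSpec("large-frontier", make_stub("large-frontier"))

# Assumed task taxonomy: narrow, well-defined tasks map to the small model.
ROUTES: Dict[str, ModelSpec] = {
    "classification": SMALL,
    "extraction": SMALL,
    "summarization": SMALL,
    "open_ended_reasoning": LARGE,
}

def route(task_type: str) -> ModelSpec:
    """Pick a model for the task; unknown task types fall back to the large model."""
    return ROUTES.get(task_type, LARGE)

def handle(task_type: str, prompt: str) -> str:
    return route(task_type).run(prompt)

if __name__ == "__main__":
    print(handle("classification", "Is this ticket a billing issue?"))
    print(handle("open_ended_reasoning", "Draft a migration plan."))
```

Falling back to the large model for unrecognized tasks is one design choice among several. A production system might instead reject unknown task types or classify them first.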

  • View profile for Dr. Brindha Jeyaraman

    Founder & CEO, Aethryx | Fractional Leader in Enterprise AI Engineering, Ops & Governance | Doctorate in Temporal Knowledge Graphs | Architecting Production-Grade AI | Ex-Google, MAS, A*STAR | Top 50 Asia Women in Tech

    19,150 followers

    One of the persistent challenges in using large language models (LLMs) is getting them to follow instructions reliably, especially when the instructions are subtle or domain-specific. DeepMind’s latest research introduces Symbol Tuning, a simple yet powerful fine-tuning method that significantly improves an LLM’s ability to follow symbolic prompts (e.g., bullet points, XML, Markdown, or code-like instructions) in zero-shot and few-shot settings. https://lnkd.in/gzKDdHQ2

    Why this matters:
    🔹 Improves instruction following in GPT-class models
    🔹 Works with tiny amounts of data (just 100K tokens!)
    🔹 Boosts performance in math, code, and reasoning-heavy tasks
    🔹 Enhances models' ability to generalize across symbolic formats

    This has massive implications for building enterprise agents, RAG pipelines, and developer copilots that need high-precision, structured interaction with users or data. A great reminder: sometimes, small, well-targeted innovations create massive gains.

    #LLM #InContextLearning #SymbolTuning #PromptEngineering #DeepMind #GenAI #AIResearch #InstructionFollowing #EnterpriseAI #DeveloperTools

  • View profile for Mani Keerthi N

    Cybersecurity Strategist & Advisor || LinkedIn Learning Instructor

    17,694 followers

    On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

    From the research paper: In this paper, we extensively investigate data privacy concerns within LLMs, specifically examining potential privacy threats from two folds: Privacy leakage and privacy attacks, and the pivotal technologies for privacy protection during various stages of LLM privacy inference, including federated learning, differential privacy, knowledge unlearning, and hardware-assisted privacy protection.

    Some key aspects from the paper:

    1) Challenges: Given the intricate complexity involved in training LLMs, privacy protection research tends to dissect various phases of LLM development and deployment, including pre-training, prompt tuning, and inference.

    2) Future Directions: Protecting the privacy of LLMs throughout their creation process is paramount and requires a multifaceted approach.
    (i) Firstly, during data collection, minimizing the collection of sensitive information and obtaining informed consent from users are critical steps. Data should be anonymized or pseudonymized to mitigate re-identification risks.
    (ii) Secondly, in data preprocessing and model training, techniques such as federated learning, secure multiparty computation, and differential privacy can be employed to train LLMs on decentralized data sources while preserving individual privacy.
    (iii) Additionally, conducting privacy impact assessments and adversarial testing during model evaluation ensures potential privacy risks are identified and addressed before deployment.
    (iv) In the deployment phase, privacy-preserving APIs and access controls can limit access to LLMs, while transparency and accountability measures foster trust with users by providing insight into data handling practices.
    (v) Ongoing monitoring and maintenance, including continuous monitoring for privacy breaches and regular privacy audits, are essential to ensure compliance with privacy regulations and the effectiveness of privacy safeguards.

    By implementing these measures comprehensively throughout the LLM creation process, developers can mitigate privacy risks and build trust with users, thereby leveraging the capabilities of LLMs while safeguarding individual privacy.

    #privacy #llm #llmprivacy #mitigationstrategies #riskmanagement #artificialintelligence #ai #languagelearningmodels #security #risks
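
Differential privacy is named above only as a technique, so a small example may help show its core idea. The sketch below is the textbook Laplace mechanism applied to a counting query. It is not the private training procedure an LLM pipeline would use, and nothing here is taken from the survey itself. The function names, the toy `records` list, and the chosen `epsilon` are assumptions made for illustration.

```python
import random
from typing import Callable, Iterable

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_count(
    records: Iterable[int],
    predicate: Callable[[int], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Release a count under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    rng = random.Random(0)              # seeded only to make the demo repeatable
    ages = [18, 25, 31, 47, 52]         # toy data, not from any real dataset
    print(private_count(ages, lambda a: a >= 30, epsilon=0.5, rng=rng))
```

Smaller `epsilon` values add more noise and give stronger privacy. Picking the value is a policy decision as much as a technical one.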

  • View profile for Vaibhava Lakshmi Ravideshik

    Research Lead @ Massachusetts Institute of Technology - Kellis Lab | LinkedIn Learning Instructor | Author - “Charting the Cosmos: AI’s expedition beyond Earth” | TSI Astronaut Candidate

    20,555 followers

    Like a fortress growing taller but keeping the same cracks, large language models may be expanding without becoming safer. A collaborative study between the UK AI Security Institute, Anthropic, University of Oxford, and The Alan Turing Institute exposes this unsettling symmetry.

    The study demonstrates that data poisoning does not dilute with scale. Even as models and datasets grow by orders of magnitude, the absolute number of poisoned samples required to implant a backdoor remains roughly constant. In their experiments, 250 poisoned documents were sufficient to compromise models ranging from 600M to 13B parameters, despite the largest model being trained on nearly twenty times more clean data.

    This overturns the long-held belief that increasing data volume would naturally “average out” adversarial noise. Instead, larger models appear to be more sample-efficient learners, capable of internalizing both useful and malicious signals with equal precision.

    For those of us working on trust layers over model training - through Knowledge Graphs, ontology-driven provenance, and dynamic data vetting - this finding reinforces a critical point: robustness is not an emergent property of scale; it must be deliberately engineered.

    Key implications include:
    1) Scaling laws for capability may mirror scaling laws for vulnerability.
    2) Fine-tuning or alignment processes cannot reliably erase deeply embedded backdoors; they often only suppress them.
    3) Graph-based reasoning layers may become essential for tracing data lineage and identifying subtle poisoning patterns before training.

    In the pursuit of larger and more capable models, the real challenge is ensuring that every data point shaping them remains interpretable, auditable, and trusted. Scaling safety will demand more than data volume - it will require transparency, traceability, and semantic intelligence across the entire data pipeline.

    Full length article: https://lnkd.in/gmMNdFgF

    #AISafety #DataPoisoning #ModelRobustness #BackdoorAttacks #AdversarialAI #AICybersecurity #LLMSecurity #AITrust #AIIntegrity #ResponsibleAI #ScalingLaws #FoundationModels #LargeLanguageModels #ModelAlignment #AIAlignment #ModelScaling #AIResearch #MachineLearningResearch #KnowledgeGraphs #OntologyEngineering #DataLineage #DataProvenance #TrustworthyAI #ExplainableAI #InterpretableAI #SemanticAI #AIEthics #AIGovernance #SafeAI #AITransparency #AIForGood #TechPolicy #DigitalTrust #FutureOfAI #AI #MachineLearning #DeepLearning #GenerativeAI #TechInnovation #EmergingTech

  • View profile for Sebastian Barros

    Managing director | Ex-Google | Ex-Ericsson | Founder | Author | Doctorate Candidate | Follow my weekly newsletter

    63,932 followers

    Large Language Models Are Dead.

    The naive view that LLMs are just high-dimensional autoregressive token predictors is outdated. In modern AI architectures, they function as sequence-to-sequence orchestrators, dynamically routing inputs through retrieval-augmented generation (RAG), multimodal transformers, execution engines, and external APIs.

    Current LLM implementations aren’t pure self-contained autoregressive models; they are hybridized latent variable models that integrate non-parametric memory, symbolic reasoning components, and differentiable API calls. When you interact with a state-of-the-art LLM, you aren’t simply traversing a static token embedding space. You are engaging with an AI agent that performs external function invocation, executes code, queries vector databases, and resolves multi-hop reasoning chains.

    This shift mirrors what happened in traditional computing. Early CPUs handled everything, but modern systems offload tasks to GPUs, TPUs, FPGAs, and dedicated accelerators. AI is moving in the same direction. LLMs now act as the “kernel” of an AI operating system, managing function calls rather than solving everything end-to-end. OpenAI’s GPT-4 Turbo, Google’s Gemini, and Meta’s latest models are not just LLMs; they are multimodal intelligence stacks.

    This transition makes AI system design exponentially harder. Engineers must balance memory-constrained transformers, external retrieval systems, latency constraints in function calling, and multi-agent coordination. The real challenge is no longer just improving the scaling laws of transformers, but designing robust hierarchical AI architectures that combine parametric and non-parametric reasoning.

    The frontier of AI is shifting. The race is no longer about building bigger LLMs. It’s about designing AI architectures that orchestrate reasoning, memory, execution, and multimodal perception at scale. LLMs aren’t dead, but they are just one function of massive AI systems.
