Chain-of-Thought has been a fundamental technique driving LLM reasoning performance. Now "Chain of Continuous Thought" (Coconut) significantly improves reasoning by working in latent space rather than language space. This paper from Meta's AI research group lays out the logic and results:

💡 Continuous Reasoning Unlocks Efficiency: Large Language Models (LLMs) traditionally reason in "language space," where reasoning steps are expressed as explicit tokens, leading to inefficiencies. The Coconut (Chain of Continuous Thought) paradigm instead reasons in a continuous latent space by feeding the model's last hidden state back as the next input. This reduces reliance on explicit tokens and improves reasoning efficiency, especially for complex tasks requiring backtracking.

📊 Higher Accuracy in Complex Reasoning Tasks: Coconut achieves significant accuracy improvements on complex tasks requiring planning and logic. On ProsQA, a reasoning-intensive task, Coconut attains 97.0% accuracy, far exceeding Chain-of-Thought (CoT) at 77.5%. Similarly, on logical reasoning tasks like ProntoQA, it achieves near-perfect performance at 99.8% accuracy, outperforming or matching other baselines while demonstrating superior planning capabilities.

⚡ Greater Efficiency with Fewer Tokens: Coconut sharply reduces the number of generated tokens. On GSM8k (math reasoning), Coconut reaches 34.1% accuracy using just 8.2 tokens on average, versus CoT's 42.9% with 25 tokens: a roughly 3x token reduction at a modest accuracy cost on math, while matching or beating CoT on the logic and planning tasks above.

🌟 Parallel Reasoning Explores Multiple Alternative Steps: Coconut enables LLMs to explore multiple reasoning paths simultaneously by encoding alternative next steps in the continuous latent space. This parallel reasoning behavior mimics breadth-first search (BFS), allowing the model to avoid premature decisions and progressively narrow down the correct solution.

🔄 Multi-Stage Training Accelerates Learning: Coconut leverages a curriculum-based training strategy in which language reasoning steps are gradually replaced with latent thoughts. This phased approach eases learning, improving performance on math problems (GSM8k) and logical tasks, and outperforming baselines like No-CoT and iCoT.

🔍 Latent Reasoning Improves Planning and Focus: By reasoning in latent space, the model can defer commitment and weigh several continuations at once. Coconut shows reduced hallucinations and improved accuracy compared to CoT, prioritizing promising reasoning paths while pruning irrelevant ones.

New model architectures are consistently improving LLM performance and efficiency. Even without more training data or underlying model progress, we are seeing consistent advances. Link to paper in comments.
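The core mechanism, feeding the last hidden state back as the next input instead of decoding a token between reasoning steps, can be illustrated with a toy model. This is a minimal NumPy sketch standing in for a transformer; the dimensions and weights here are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, V = 16, 50                              # hidden size, toy vocab size
E = rng.normal(size=(V, D))                # token embedding table
W = rng.normal(size=(D, D)) / np.sqrt(D)   # stand-in for one transformer step
U = rng.normal(size=(V, D))                # LM head

def step(x, h):
    """One 'forward pass' of the toy model."""
    return np.tanh(W @ (x + h))

def coconut(prompt_ids, num_latent_steps=4):
    h = np.zeros(D)
    for t in prompt_ids:                   # language space: consume prompt tokens
        h = step(E[t], h)
    for _ in range(num_latent_steps):      # latent space: no token is decoded;
        h = step(h, h)                     # the hidden state is the next input
    return int((U @ h).argmax())           # back to language space: decode answer

ans = coconut([1, 2, 3])
```

In the actual method the latent steps run inside the transformer's forward pass, with special tokens marking where latent reasoning begins and ends.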
Program-Aided Reasoning for Large Language Models
Summary
Program-aided reasoning for large language models refers to techniques that help AI systems solve complex tasks by combining their language skills with structured reasoning steps or tools—like using code, retrieval, or external calculators—to guide their thinking. These approaches enable large language models (LLMs) to tackle intricate problems more accurately by breaking down questions, referencing relevant information, and applying logical processes beyond simple text prediction.
- Integrate structured steps: Encourage models to use both abstract reasoning and external tools, such as calculators or retrieval pipelines, to fill in gaps and improve decision-making.
- Pair knowledge with examples: Provide not only background facts but also step-by-step application examples, so the model can see how to apply information for clearer and more reliable solutions.
- Embrace feedback loops: Use training strategies where the model reflects, retries, and learns from its successes and failures, helping it adapt and solve novel problems over time.
-
Excited to share groundbreaking research on DeepRAG, a novel framework that rethinks how Large Language Models (LLMs) interact with external knowledge.

>> Key Innovation
DeepRAG models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic and adaptive retrieval by decomposing complex queries into atomic subqueries. The framework introduces two components:
- Retrieval Narrative: ensures a structured retrieval flow by generating subqueries informed by previously retrieved information.
- Atomic Decisions: dynamically determines, for each subquery, whether to retrieve external knowledge or rely on parametric knowledge.

>> Technical Implementation
The system employs a binary tree search to explore the impact of each atomic decision on reasoning outcomes. It synthesizes training data through imitation learning, capturing the "subquery generation - atomic decision - intermediate answer" pattern.

>> Performance Highlights
- 21.99% improvement in answer accuracy while improving retrieval efficiency
- Superior performance across multiple QA datasets, including HotpotQA, 2WikiMultihopQA, PopQA, and WebQuestions
- Strong capability on time-sensitive QA tasks

This work comes from researchers at the Chinese Academy of Sciences and Tencent, marking a significant advancement in making LLMs more efficient and accurate at knowledge retrieval.
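The subquery loop with atomic decisions reduces to simple control flow. A minimal sketch, where the interfaces and the stub components are invented for illustration (DeepRAG's actual components are learned):

```python
def deeprag_answer(question, llm, retriever, max_steps=5):
    """DeepRAG-style retrieval narrative: emit atomic subqueries one at a
    time; for each, make an atomic decision between retrieval and
    parametric knowledge."""
    context = []
    for _ in range(max_steps):
        sub = llm.next_subquery(question, context)   # informed by prior steps
        if sub is None:                              # termination decision
            break
        if llm.should_retrieve(sub, context):        # atomic decision
            answer = llm.answer(sub, retriever.search(sub))
        else:
            answer = llm.answer(sub)                 # parametric knowledge only
        context.append((sub, answer))
    return llm.final_answer(question, context)

class StubLLM:
    """Trivial stand-in that plans two fixed subqueries for any question."""
    def next_subquery(self, q, ctx):
        plan = ["Who directed the film?", "When was the director born?"]
        return plan[len(ctx)] if len(ctx) < len(plan) else None
    def should_retrieve(self, sub, ctx):
        return "born" in sub              # pretend only dates need retrieval
    def answer(self, sub, evidence=None):
        return evidence or "Christopher Nolan"
    def final_answer(self, q, ctx):
        return ctx[-1][1]

class StubRetriever:
    def search(self, sub):
        return "1970"

result = deeprag_answer("When was the director of Inception born?",
                        StubLLM(), StubRetriever())
```

The point of the structure is that retrieval is a per-subquery decision, not a blanket policy, which is what the MDP formulation optimizes.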
-
Enhancing RAG with Application-Aware Reasoning

Neat trick to improve RAG systems: give the model the relevant knowledge and show it how to apply that knowledge. Very simple and effective! In my experience, this approach also works well with AI agents. Here are my notes:

> Quick Overview
RAG+ is a modular framework that improves traditional RAG systems by explicitly incorporating application-level reasoning into the retrieval and generation pipeline. It bridges retrieval and generation with an application-aware stage. While standard RAG pipelines fetch relevant knowledge, they often fail to show how to use that knowledge effectively in reasoning-intensive tasks. RAG+ fills this gap by retrieving not only knowledge but also paired application examples, leading to more accurate, interpretable, and goal-oriented outputs.

> Dual Corpus Retrieval
RAG+ constructs two aligned corpora: one of factual knowledge and another of task-specific applications (e.g., step-by-step reasoning traces or worked examples). During inference, both are jointly retrieved, providing the LLM with explicit procedural guidance rather than relying solely on semantic similarity.

> Plug-and-play Design
The system is retrieval-agnostic and model-agnostic; no fine-tuning or architectural changes are required. This makes it easy to augment any RAG system with application-awareness.

> Significant Gains Across Domains
Evaluated on MathQA, MedQA, and legal sentencing prediction, RAG+ outperforms vanilla RAG variants by 2.5–7.5% on average, with peak gains of up to 10% for large models like Qwen2.5-72B in legal reasoning.

> Stronger with Scale and Reranking
Larger models benefit more from RAG+ augmentation, especially when combined with reranking by stronger LLMs. For example, reranking with Qwen2.5-72B boosted smaller models' performance by up to 7%.

> Application-Only Helps, but the Full Combo Is Best
Including only application examples (without knowledge) still improves performance, but the full combination (RAG+) consistently yields the best results, demonstrating the synergy of pairing knowledge with its usage. A great RAG approach for leveraging reasoning models.
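The dual-corpus idea fits in a few lines. A sketch with a toy word-overlap retriever; the corpora, function names, and prompt format below are illustrative stand-ins, not the paper's setup:

```python
def ragplus_prompt(query, knowledge, applications, retrieve, k=2):
    """Jointly retrieve facts and their worked applications, then build a
    prompt showing the model how each fact is used. applications[i] is the
    worked example aligned with knowledge[i]."""
    blocks = [f"Knowledge: {knowledge[i]}\nApplication: {applications[i]}"
              for i in retrieve(query, knowledge, k)]
    return "\n\n".join(blocks) + f"\n\nQuestion: {query}\nAnswer:"

def overlap_retrieve(query, corpus, k):
    """Toy retriever: rank corpus entries by word overlap with the query."""
    qw = set(query.lower().split())
    ranked = sorted(range(len(corpus)),
                    key=lambda i: -len(qw & set(corpus[i].lower().split())))
    return ranked[:k]

knowledge = ["The area of a circle is pi * r**2.",
             "Ohm's law states V = I * R."]
applications = ["Example: r = 3 -> area = pi * 9 ~= 28.27.",
                "Example: I = 2 A, R = 5 ohm -> V = 10 V."]
prompt = ragplus_prompt("What is the area of a circle with r = 4?",
                        knowledge, applications, overlap_retrieve, k=1)
```

Because the two corpora are index-aligned, a single retrieval pass yields both the fact and the procedural guidance for using it.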
-
Breakthrough in AI Reasoning: Chain-of-Abstraction Method Improves Accuracy and Speed

👉 Introduction
Exciting new research from EPFL and Meta is pushing the boundaries of how large language models (LLMs) can leverage external tools for more accurate and efficient multi-step reasoning. The paper, titled "Efficient Tool Use with Chain-of-Abstraction Reasoning", introduces a novel approach called Chain-of-Abstraction (CoA).

👉 How Chain-of-Abstraction Works
With CoA, LLMs are trained to first generate abstract reasoning chains with placeholders, then call domain-specific tools to fill in the specific knowledge. For example, given a math word problem like "In a 90-minute game, Mark played 20 minutes, then another 35 minutes. How long was he on the sideline?", the LLM generates an abstract chain like "Mark played for a total of [20+35=y1] minutes. So, he was on the sideline for [90-y1=y2] minutes." A calculator tool then fills in the placeholders y1 and y2 with the actual values. This decoupling of high-level reasoning from granular knowledge retrieval lets the LLM learn more robust and generalizable reasoning strategies.

👉 Improved Accuracy and Speed
The results are impressive: CoA consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets in domains like mathematical reasoning and Wikipedia question answering, with accuracy improvements of ~6% on average. Beyond the accuracy boost, CoA also unlocks significant speed and efficiency gains. By allowing the LLM to perform decoding and tool calls in parallel, CoA avoids the inference delays of prior methods that must wait for each tool response. On average, CoA delivers a ~1.4x speed-up.

👉 Potential Real-World Impact
Any domain that requires accurate, efficient multi-step reasoning could benefit, from finance to healthcare to education and beyond. CoA marks an important step forward in developing LLMs that can reliably interface with external knowledge to solve complex problems. Congratulations to the authors on this work! I highly recommend reading the full paper for a deeper dive. What potential applications of CoA are you most excited about? Let me know in the comments!
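The placeholder-filling step can be reproduced with a small arithmetic "tool". This is a simplified sketch of that stage, not the paper's code; the regex assumes the `[expr=var]` format from the example above:

```python
import re

def fill_abstract_chain(chain):
    """Fill CoA-style placeholders like [20+35=y1] in order, so later
    expressions can reference earlier variables (y1, y2, ...)."""
    values = {}
    def resolve(match):
        expr, var = match.group(1), match.group(2)
        for name, val in values.items():                # substitute earlier results
            expr = expr.replace(name, str(val))
        values[var] = eval(expr, {"__builtins__": {}})  # the 'calculator tool'
        return str(values[var])
    return re.sub(r"\[([^=\]]+)=(\w+)\]", resolve, chain)

chain = ("Mark played for a total of [20+35=y1] minutes. "
         "So, he was on the sideline for [90-y1=y2] minutes.")
filled = fill_abstract_chain(chain)
# -> "Mark played for a total of 55 minutes. So, he was on the sideline for 35 minutes."
```

Because the abstract chain is complete before any tool runs, all placeholders can in principle be dispatched to tools while the model keeps decoding, which is where the speed-up comes from.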
-
Reinforcement learning (RL) is becoming a core strategy for improving how language models reason. Historically, RL in LLMs was used at the final stage of training to align model behavior with human preferences. That helped models sound more helpful or polite, but it did not expand their ability to solve complex problems. RL is now being applied earlier and more deeply, not just to tune outputs, but to help models learn how to think, adapt, and generalize across different kinds of reasoning challenges. Here are 3 papers that stood out.

𝗣𝗿𝗼𝗥𝗟 (𝗡𝗩𝗜𝗗𝗜𝗔) applies RL over longer time horizons using strategies like entropy control, KL regularization, and reference policy resets. A 1.5B model trained with this setup outperforms much larger models on tasks like math, code, logic, and scientific reasoning. More interesting still, the model begins to solve problems it had not seen before, suggesting that RL, when structured and sustained, can unlock new reasoning capabilities that pretraining alone does not reach.

𝗥𝗲𝗳𝗹𝗲𝗰𝘁, 𝗥𝗲𝘁𝗿𝘆, 𝗥𝗲𝘄𝗮𝗿𝗱 𝗯𝘆 𝗪𝗿𝗶𝘁𝗲𝗿 𝗜𝗻𝗰. introduces a lightweight self-improvement loop. When the model fails a task, it generates a reflection, retries, and is rewarded only if the new attempt succeeds. Over time, the model learns to write better reflections and improves even on first-try accuracy. Because it relies only on a binary success signal and needs no human-labeled data, it provides a scalable way for models to self-correct.

𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗣𝗿𝗲-𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗳𝗿𝗼𝗺 𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁 𝗮𝗻𝗱 𝗣𝗲𝗸𝗶𝗻𝗴 𝗨𝗻𝗶𝘃𝗲𝗿𝘀𝗶𝘁𝘆 proposes a new way to pretrain models. Rather than treating next-token prediction as passive pattern completion, each token prediction becomes a decision with a reward tied to correctness. This turns next-token prediction into a large-scale RL process. It results in better generalization, especially on hard tokens, and shows that reasoning ability can be built into the model from the earliest stages of training without relying on curated prompts or handcrafted rewards.

Taken together, these papers show that RL is becoming a mechanism for growing the model's ability to reflect, generalize, and solve problems it was not explicitly trained on. What they also reveal is that the most meaningful improvements come from training at moments of uncertainty. Instead of compressing more knowledge into a frozen model, we are beginning to train systems that can learn how to improve mid-process and build reasoning as a capability.

This changes how we think about scaling. The next generation of progress may not come from larger models, but from models that are better at learning through feedback, self-reflection, and structured trial and error.
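The Reflect-Retry-Reward loop described above reduces to a few lines of control flow. A minimal sketch; the stub model is invented purely to make it runnable, and in the real method the binary rewards feed an RL update on the reflection text:

```python
def reflect_retry(task, model, check, max_retries=2):
    """On failure, generate a reflection and retry; the reflection earns a
    binary reward only if the retry succeeds."""
    attempt = model.solve(task)
    if check(task, attempt):
        return attempt, []                 # first-try success, no reward event
    rewards = []
    for _ in range(max_retries):
        reflection = model.reflect(task, attempt)
        attempt = model.solve(task, reflection)
        success = check(task, attempt)
        rewards.append((reflection, 1 if success else 0))  # RL training signal
        if success:
            break
    return attempt, rewards

class StubModel:
    """Gets the task wrong until it reflects."""
    def solve(self, task, reflection=None):
        return task * 2 if reflection else task + 1
    def reflect(self, task, attempt):
        return "Adding 1 was wrong; I should double the input."

answer, rewards = reflect_retry(3, StubModel(), check=lambda t, a: a == t * 2)
```

Note that the only supervision is `check` returning success or failure; no human-written reflections or labels are needed.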
-
If you’re an AI engineer trying to understand how reasoning actually works inside LLMs, this will help you connect the dots.

Most large language models can generate. But reasoning models can decide. Traditional LLMs followed a straight line: Input → Predict → Output. No self-checking, no branching, no exploration. Reasoning models introduced structure: a way for models to explore multiple paths, score their own reasoning, and refine their answers.

We started with Chain-of-Thought (CoT) reasoning, then extended to Tree-of-Thought (ToT) for branching, and now to graph-based reasoning, where models connect, merge, or revisit partial thoughts before concluding. This evolution changes how LLMs solve problems. Instead of guessing the next token, they learn to search the reasoning space, exploring alternatives, evaluating confidence, and adapting dynamically.

Different reasoning topologies serve different goals:
• Chains for simple sequential reasoning
• Trees for exploring multiple hypotheses
• Graphs for revising and merging partial solutions

Modern architectures (like OpenAI’s o-series reasoning models, Anthropic’s Claude reasoning stack, DeepSeek’s R series, and DeepMind’s AlphaReasoning experiments) use this idea under the hood. They don’t just generate answers; they navigate reasoning trajectories, using adaptive depth-first or breadth-first exploration depending on task uncertainty.

Why this matters:
• It reduces hallucinations by verifying intermediate steps
• It improves interpretability, since we can visualize reasoning paths
• It boosts reliability for complex tasks like planning, coding, or tool orchestration

The next phase of LLM development won’t be about more parameters; it’ll be about better reasoning architectures: topologies that can branch, score, and self-correct. I’ll be doing a deep dive on reasoning models soon on my Substack, exploring architectures, training approaches, and practical applications for engineers.
If you haven’t subscribed yet, make sure you do: https://lnkd.in/dpBNr6Jg ♻️ Share this with your network 🔔 Follow along for more data science & AI insights
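A tree topology with self-scoring is essentially beam search over partial thoughts. Here is a toy sketch; the `expand` and `score` lambdas stand in for model-proposed candidate thoughts and model self-evaluation:

```python
import heapq

def tree_of_thought(root, expand, score, beam_width=2, depth=3):
    """Keep several candidate 'thoughts' alive at every level and prune the
    weakest, instead of committing to a single chain."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for node in frontier for child in expand(node)]
        if not candidates:
            break
        frontier = heapq.nlargest(beam_width, candidates, key=score)
    return max(frontier, key=score)

# Toy task: reach 24 from 3 by repeatedly adding 3 or doubling.
best = tree_of_thought(
    root=3,
    expand=lambda n: [n + 3, n * 2],     # candidate next thoughts
    score=lambda n: -abs(24 - n),        # self-evaluation: closer to 24 is better
)
```

Chains are the `beam_width=1` special case; graph-based variants additionally merge or revisit nodes instead of treating each branch independently.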
-
One of the persistent challenges in using large language models (LLMs) is getting them to follow instructions reliably, especially when the instructions are subtle or domain-specific. Research from Google introduces Symbol Tuning, a simple yet powerful fine-tuning method: models are fine-tuned on in-context learning tasks in which natural-language labels (e.g., "positive"/"negative") are replaced with arbitrary symbols, forcing them to learn the input-label mapping from the examples themselves rather than from label semantics. This significantly improves in-context learning and instruction following in zero-shot and few-shot settings. https://lnkd.in/gzKDdHQ2

Why this matters:
🔹 Improves instruction following in GPT-class models
🔹 Works with tiny amounts of data (just 100K tokens!)
🔹 Boosts performance in math, code, and reasoning-heavy tasks
🔹 Enhances models' ability to generalize across symbolic formats

This has massive implications for building enterprise agents, RAG pipelines, and developer copilots that need high-precision, structured interaction with users or data. A great reminder: sometimes, small, well-targeted innovations create massive gains.

#LLM #InContextLearning #SymbolTuning #PromptEngineering #DeepMind #GenAI #AIResearch #InstructionFollowing #EnterpriseAI #DeveloperTools
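A sketch of what a symbol-tuned training example can look like: the natural-language labels in the in-context exemplars are remapped to arbitrary symbols, so the only way to label the test input is to learn the mapping from context. The format and symbol choices here are illustrative:

```python
def symbol_tuning_prompt(examples, symbol_map):
    """Format in-context examples with labels remapped to arbitrary symbols,
    so the task is only solvable via the input-label mapping."""
    return "\n\n".join(f"Input: {text}\nLabel: {symbol_map[label]}"
                       for text, label in examples)

icl_prompt = symbol_tuning_prompt(
    [("The movie was wonderful.", "positive"),
     ("I want my money back.", "negative"),
     ("A masterpiece from start to finish.", "positive")],
    symbol_map={"positive": "foo", "negative": "bar"},
)
```

Fine-tuning on many such remapped tasks is what teaches the model to attend to the exemplars instead of guessing from label names.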
-
New paper: Reasoning with financial regulatory texts via Large Language Models from Bledar F., Meriton Ibraimi, Aynaz Forouzandeh and Dr. Arbër Fazlija in the Journal of Behavioral and Experimental Finance Interpreting complex financial regulatory texts, such as the Basel III Accords, can be challenging even for human experts. In this paper, we explore the potential of Large Language Models (LLMs) to perform such tasks. Specifically, we evaluate reasoning strategies, namely Chain-of-Thought (CoT) and Tree-of-Thought (ToT), in their ability to assign accurate risk weights to test cases based on the Basel III Standardized Approach (SA) for Credit Risk. Moreover, we propose and test a guided learning-based few-shot variant of CoT and ToT using human expert input. By evaluating 6,501 test cases, comprised of diverse exposure scenarios, our results demonstrate that few-shot prompting with CoT as well as ToT significantly enhances the LLMs’ accuracy in inferring risk weights. For one-shot CoT, we observe gains of almost 13 percentage points in accuracy with GPT-4o, whereas Claude 3 Sonnet shows gains of more than 10 percentage points. Albeit smaller in magnitude, one-shot ToT improvements are around 9 percentage points. https://lnkd.in/eSB6t4dh
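A one-shot CoT prompt of the kind evaluated here can be templated very simply. The exemplar below is a simplified illustration written for this sketch, not text from the paper or the Basel Accords:

```python
def one_shot_cot_prompt(exemplar, test_exposure):
    """One-shot CoT template: one worked risk-weight example, then the test
    case, ending where the model continues its own reasoning."""
    return (
        "Task: assign the Basel III SA credit-risk weight.\n\n"
        f"Exposure: {exemplar['exposure']}\n"
        f"Reasoning: {exemplar['reasoning']}\n"
        f"Risk weight: {exemplar['risk_weight']}\n\n"
        f"Exposure: {test_exposure}\n"
        "Reasoning:"
    )

cot_prompt = one_shot_cot_prompt(
    {"exposure": "Claim on a sovereign rated AAA",
     "reasoning": "Sovereign exposures rated AAA to AA- receive a 0% risk "
                  "weight under the standardized approach.",
     "risk_weight": "0%"},
    "Claim on a corporate rated BBB",
)
```

The guided-learning variants in the paper extend this idea by using human-expert input to shape the exemplar's reasoning.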
-
Large Language Models (LLMs) are trained with a fixed set of parameters, creating a vast yet unchanging knowledge base after training. Their memory capacity, defined by architecture and context window size, is also fixed. Unlike Turing machines, which can theoretically have unlimited memory via an expandable tape, LLMs lack this flexibility. This limits them from computations needing unbounded memory, distinguishing them from Turing-complete systems. Instead, LLMs function as powerful pattern recognisers, approximating functions based on large but finite datasets.

The context window, however, provides a way to temporarily extend an LLM’s memory. This is why Retrieval-Augmented Generation (RAG) has become so popular: it dynamically feeds relevant information into the model, though it remains read-only.

As we explore more “agentic” uses of LLMs, though, we start considering the need for read-write memory. In this setup, the LLM functions as the “read/write head” of a Turing machine, reading from memory into its context window and writing key information back. The question is: if the LLM is the "read/write head," what serves as the "tape"?

A simple solution is plain text, as used in tools like ChatGPT. This works to a degree, but plain text alone is imprecise, lacks mechanisms for compression, generalisation, and interlinking of information, and may not integrate well with the structured "memory" already present in organisational documents and databases. A fully neural architecture with slowly evolving 'background' neurons and faster-changing, inference-time ones (capable of integrating new information incrementally) could indeed be the end goal. However, a more immediate and pragmatic solution for organisations today is to build a Semantic Layer. Anchored in an "Ontological Core" that defines the organisation’s key concepts, this Semantic Layer interfaces with LLMs, allowing them to read from an ontology that links back to the data in underlying systems.

With human oversight, LLMs can also write missing classes into the ontology and add missing facts back into these systems, creating a dynamic Virtual Knowledge Graph for the organisation. In this setup, the Virtual Knowledge Graph effectively becomes the Turing Machine’s “tape”: a dynamic, interconnected memory that the LLM can read from and write back to. By linking facts with URLs and using the ontology for generalisation and compression, this graph provides an ideal read/write memory structure for the LLM. Combined in a neural-symbolic loop, this LLM+Graph system could even be Turing complete.

This may sound like a far-future concept, but the future is arriving fast. I’m not saying it will be easy, but any organisation transitioning to Data Products can begin this journey today by simply adopting DPROD as their Data Product specification. Often, the first step in a journey turns out to be the most important one!

⭕ Ontologies + LLMs : https://lnkd.in/eACrNUbf
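The read/write loop can be sketched with a toy triple store as the "tape". All interfaces below are illustrative; a real semantic layer would sit behind an ontology with SPARQL-style access and human approval of writes:

```python
class VirtualKnowledgeGraph:
    """Toy triple store standing in for the ontology-backed 'tape'."""
    def __init__(self):
        self.triples = set()
    def read(self, subject):
        """Load a subject's facts into the LLM's context window."""
        return {t for t in self.triples if t[0] == subject}
    def write(self, subject, predicate, obj):
        """The LLM writes a fact back (with human oversight, in practice)."""
        self.triples.add((subject, predicate, obj))

def agent_step(llm, graph, question, subject):
    """One read-eval-write cycle: the LLM acts as the 'read/write head'."""
    context = graph.read(subject)
    answer, new_facts = llm(question, context)   # illustrative interface
    for fact in new_facts:
        graph.write(*fact)
    return answer

graph = VirtualKnowledgeGraph()
graph.write("ACME", "hasCEO", "J. Doe")
stub_llm = lambda q, ctx: ("J. Doe", [("ACME", "hasHQ", "Zurich")])
answer = agent_step(stub_llm, graph, "Who runs ACME?", "ACME")
```

Each cycle both answers the question and leaves the graph richer than before, which is the read-write property plain-text memory lacks.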
-
Do we really need nuclear power plants and supercomputers to build good AI systems? Neuro-inspired ideas are leading to new solutions. A group from Singapore has just shown how a tiny model (27 million parameters) trained on a small dataset (1,000 examples) handles hard problems better than the largest and most expensive models. This can even be done on a laptop.

Wang, G., Li, J., Sun, Y., Chen, X., Liu, C., Wu, Y., Lu, M., Song, S., Yadkori, Y. A. (2025). Hierarchical Reasoning Model. https://lnkd.in/d9c5j74U

Abstract: Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities.
These results underscore HRM’s potential as a transformative advancement toward universal computation and general-purpose reasoning systems. Summary: Hierarchical Reasoning Model (HRM): A new way for AI to think https://lnkd.in/dMsQ-6Wu https://lnkd.in/dy3irpum
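The two-timescale recurrence at the heart of HRM can be sketched in NumPy. This is a toy forward pass with random weights, purely illustrative of the slow/fast structure, not the trained architecture:

```python
import numpy as np

def hrm_forward(x, cycles=3, low_steps=4, d=8, seed=0):
    """Toy two-timescale recurrence: a slow high-level state updates once per
    cycle; a fast low-level state takes several steps per cycle, conditioned
    on the high-level state and the input."""
    rng = np.random.default_rng(seed)
    Wh = rng.normal(size=(d, d)) / np.sqrt(d)   # high-level (slow) weights
    Wl = rng.normal(size=(d, d)) / np.sqrt(d)   # low-level (fast) weights
    z_high = np.zeros(d)
    for _ in range(cycles):
        z_low = np.zeros(d)
        for _ in range(low_steps):               # rapid, detailed computation
            z_low = np.tanh(Wl @ (z_low + z_high + x))
        z_high = np.tanh(Wh @ (z_high + z_low))  # slow, abstract update
    return z_high

z = hrm_forward(np.ones(8))
```

Nesting the fast loop inside the slow one is what gives the model computational depth (cycles x low_steps effective steps) without a correspondingly deep feed-forward stack.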