Researchers from Oxford University just achieved a 14% relative performance boost in mathematical reasoning by making LLMs work together like specialists in a company.

In their new MALT (Multi-Agent LLM Training) paper, they introduced a novel approach where three specialized LLMs - a generator, verifier, and refinement model - collaborate to solve complex problems, similar to how a programmer, tester, and supervisor work together.

The breakthrough lies in their training method:
(1) Tree-based exploration - generating thousands of reasoning trajectories by having models interact
(2) Credit attribution - identifying which model is responsible for successes or failures
(3) Specialized training - using both correct and incorrect examples to train each model for its specific role

Using this approach on 8B parameter models, MALT achieved relative improvements of 14% on the MATH dataset, 9% on CommonsenseQA, and 7% on GSM8K. This represents a significant step toward more efficient and capable AI systems, showing that well-coordinated smaller models can match the performance of much larger ones.

Paper https://lnkd.in/g6ag9rP4

Join thousands of world-class researchers and engineers from Google, Stanford, OpenAI, and Meta staying ahead on AI http://aitidbits.ai
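The three training steps in the MALT post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: `generate`, `verify`, and `refine` are hypothetical deterministic stand-ins for the three models, the branching factor of 2 is arbitrary, and the credit rule (score each node by the mean reward of the leaves below it, then label it positive if that score exceeds 0.5) is an assumption made for this sketch.

```python
from statistics import mean

BRANCH = 2        # assumed branching factor per model call
THRESHOLD = 0.5   # assumed cutoff for labelling a node as a positive example


def generate(question, i):
    # Hypothetical generator: branch 0 answers correctly, branch 1 does not.
    return 4 if i == 0 else 5


def verify(question, answer, j):
    # Hypothetical verifier: branch 0 actually checks, branch 1 approves blindly.
    if j == 0:
        return "correct" if answer == 4 else "incorrect"
    return "correct"


def refine(question, answer, critique, k):
    # Hypothetical refiner: repairs the answer only when the critique flags it.
    return 4 if critique == "incorrect" else answer


def explore(question, truth):
    """Expand a generator -> verifier -> refiner tree and label every node."""
    data = {"generator": [], "verifier": [], "refiner": []}
    for i in range(BRANCH):
        g = generate(question, i)
        verifier_values = []
        for j in range(BRANCH):
            v = verify(question, g, j)
            rewards = []
            for k in range(BRANCH):
                r = refine(question, g, v, k)
                reward = 1.0 if r == truth else 0.0
                rewards.append(reward)
                data["refiner"].append(((g, v), r, reward > THRESHOLD))
            # Credit attribution: a critique is as good as the answers it leads to.
            v_value = mean(rewards)
            verifier_values.append(v_value)
            data["verifier"].append((g, v, v_value > THRESHOLD))
        # A generation is as good as the average outcome of its subtree.
        g_value = mean(verifier_values)
        data["generator"].append((question, g, g_value > THRESHOLD))
    return data


if __name__ == "__main__":
    samples = explore("2+2", truth=4)
    for role, rows in samples.items():
        print(role, rows)
```

Each role ends up with its own set of positive and negative examples, which is the kind of data the "specialized training" step would consume. In this toy run, the blind approval of the wrong answer `5` is labelled as a negative verifier example even though the same wrong answer, once flagged, leads to a correct refinement.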
Improving LLM Performance for Algorithm Discovery
Summary
Improving LLM performance for algorithm discovery means finding better ways for large language models (LLMs) to solve complex problems, like math or information retrieval, by making them reason more thoughtfully and work together more efficiently. These advances help LLMs think step-by-step, check their own work, and even create strategies before tackling a question, making them more reliable and creative problem-solvers.
- Encourage deliberate reasoning: Set up your LLMs to think through problems in multiple steps, allowing them to check and refine their answers much like a person would break down a tough question.
- Use teamwork strategies: Let different LLMs take on specialized roles—like generating solutions, double-checking answers, or offering hints—to mimic how experts collaborate and catch more mistakes.
- Incorporate self-verification: Have models revisit both their answers and original questions to confirm accuracy, helping to reduce errors and build confidence in their results.
-
Exciting Research Alert: Enhancing Dense Retrieval with Deliberate Thinking

I just came across a fascinating new paper titled "Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search" that introduces DEBATER (Deliberate Thinking based Dense Retriever), a novel approach to improve information retrieval using large language models. The research team from Northeastern University and Tsinghua University has developed a method that significantly outperforms existing dense retrieval systems by enabling LLMs to "think deliberately" before generating document representations.

>> Technical Details
DEBATER enhances LLM-based retrievers through two key mechanisms:
1. Chain-of-Deliberation (CoD): This approach delays the computation of document embeddings by performing several steps of reasoning. It incorporates a sequence of prompt tokens that stimulate the reasoning capability of LLMs, encouraging the model to think step-by-step before producing the final document embedding.
2. Self Distillation (SD): This mechanism distills knowledge from different thinking steps into the final document representation. It identifies the most informative thinking steps and integrates them into a unified text embedding.

The implementation uses cosine similarity to measure the similarity between queries and documents. During training, DEBATER calculates similarity scores between the query representation and the document representations at each thinking step, then selects the most useful thinking step from CoD.

>> Performance
What's particularly impressive is that DEBATER-4B outperforms larger 7B-scale LLM-based dense retrievers while using significantly fewer parameters. In experiments on the BEIR benchmark, DEBATER achieved more than a 2% improvement over baseline retrievers. The researchers found that an appropriate thinking depth (around 4-8 steps) effectively activates the reasoning capabilities of LLM-based retrievers. Interestingly, larger models benefit more from extended reasoning due to their stronger ability to integrate and retain detailed intermediate steps.

This work represents an important advancement in dense retrieval technology by leveraging the reasoning capabilities of LLMs to generate more effective document representations. If you're interested in information retrieval or LLMs, this paper is definitely worth checking out!
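The step-selection idea in the DEBATER post (score the query against the document representation produced at each thinking step, then pick the most useful step) can be illustrated with a small sketch. The vectors below are made-up toy embeddings, `best_thinking_step` is a hypothetical helper name, and treating "most useful" as "highest cosine similarity to the query" is an assumption. The paper's actual training loss and distillation procedure are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def best_thinking_step(query_vec, step_vecs):
    """Return (index of the best step, per-step similarity scores)."""
    scores = [cosine(query_vec, doc_vec) for doc_vec in step_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: one query and the document embedding after each of four thinking steps.
query = [1.0, 0.0]
steps = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [1.0, 2.0]]

if __name__ == "__main__":
    best, scores = best_thinking_step(query, steps)
    print("best step:", best)
    print("scores:", [round(s, 3) for s in scores])
```

In a self-distillation setup, the scores from the selected step could then serve as a target for the final embedding; that part is left out because the post does not describe it in enough detail to sketch faithfully.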
-
BEATS framework: improving LLMs' mathematical capabilities with back-verification and pruning tree search

This paper proposes a novel approach, BEATS, to enhance mathematical problem-solving abilities, with newly curated prompts for step-by-step reasoning, a back-verification technique for enhancing the reliability of results, and pruning tree search for controlled inference time while maintaining state-of-the-art performance.

𝗞𝗲𝘆 𝗰𝗼𝗻𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- developed three new curated prompts to solve math problems step by step, thereby avoiding ambiguities in the problem statements.
- implemented a pruning strategy for the tree by constraining the search steps: the question may be rewritten only once and the answer provided only once.
- proposed a novel back-verification method that re-submits both the answer and the problem to the model for a judgment of correctness, which improves search performance in LLMs compared to majority voting.

𝗠𝗲𝘁𝗵𝗼𝗱
i) Prompts: three actions designed for tree search
a) One Step Forward - asks the model to progress through the search tree by evaluating the next logical step based on the current context and information
b) Giving Final Answer - directs the model to provide a conclusive answer after considering all relevant information
c) Disambiguation - reformulates the initial query to enhance clarity and specificity, thereby facilitating a more effective search process

ii) Pruning Tree Search
- the root node represents the input question and leaf nodes correspond to deduced answers
- intermediate nodes represent reasoning states that connect the root to the leaves
- edges between these nodes indicate actions taken during the reasoning process
- Disambiguation actions are restricted to immediate successors of the root node to ensure that clarifications or specifications are handled early
- One Step Forward actions are limited to five occurrences within a path, preventing the inference path from becoming long or repetitive
- If a node’s content ends with the phrase "The answer is", the node is marked as a terminal state and added to the set of candidate answers

iii) Back-verification
- After constructing the tree, depth-first search (DFS) is applied to identify leaf nodes. From these, only those that contain the phrase "The answer is" are selected as candidate answers for back-verification
- Back-verification involves leveraging both the answer and the question to allow the LLM to confirm the correctness of the answer

𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻
- this method improves Qwen2-7b-Instruct’s score from 36.94 to 61.52 (outperforming GPT-4’s 42.5) on the MATH benchmark
- compared to zero-shot, even without the BackVerify step, it outperforms baselines, achieving 35.17% on MATH and 83.62% on GSM8K using LLaMA-8B as the base model

𝗕𝗹𝗼𝗴: https://lnkd.in/etdCng-m
𝗣𝗮𝗽𝗲𝗿: https://lnkd.in/e6KHsqzv
𝗖𝗼𝗱𝗲: https://lnkd.in/e9RG6pdg
#generativeai
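The pruning rules and the back-verification step described in the BEATS post can be sketched roughly as follows. This is a toy illustration, not the authors' code: `llm_act` and `llm_back_verify` are hypothetical stubs standing in for prompted model calls, each action yields a single child (a real search would sample several), terminal nodes are detected by checking whether the text contains "The answer is", and choosing the most frequent verified candidate is an assumption of this sketch.

```python
from collections import Counter

MAX_STEPS = 5                  # One Step Forward limit per path (from the post)
ANSWER_TAG = "The answer is"   # marker for terminal nodes (from the post)


def allowed_actions(path):
    """Actions permitted after the given list of actions from the root."""
    actions = []
    if not path:                           # disambiguation only directly under the root
        actions.append("disambiguate")
    if path.count("step") < MAX_STEPS:     # cap on reasoning steps per path
        actions.append("step")
    actions.append("answer")               # answering ends the path
    return actions


def llm_act(question, path, action):
    # Hypothetical stub: answers are right only after at least one reasoning step.
    if action == "disambiguate":
        return "Rewritten: " + question
    if action == "step":
        return f"Step {path.count('step') + 1}"
    value = 4 if path.count("step") >= 1 else 5
    return f"{ANSWER_TAG} {value}"


def search(question):
    """Depth-first expansion of the pruned tree; returns candidate answers."""
    candidates = []
    stack = [([], [])]                     # (actions so far, node texts so far)
    while stack:
        path, texts = stack.pop()
        if texts and ANSWER_TAG in texts[-1]:
            candidates.append(texts[-1])   # terminal node: keep as a candidate
            continue
        for action in allowed_actions(path):
            out = llm_act(question, path, action)
            stack.append((path + [action], texts + [out]))
    return candidates


def llm_back_verify(question, candidate):
    # Hypothetical judge for toy questions of the form "a + b".
    left, right = question.split("+")
    return candidate == f"{ANSWER_TAG} {int(left) + int(right)}"


def solve(question):
    """Keep only candidates the judge accepts, then return the most common one."""
    verified = [c for c in search(question) if llm_back_verify(question, c)]
    return Counter(verified).most_common(1)[0][0] if verified else None


if __name__ == "__main__":
    print(solve("2 + 2"))
```

Because disambiguation is only offered at the root and reasoning steps are capped, the tree stays small: with one child per action, the toy search above visits just 12 candidate answers.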
-
Is this the next level of LLM "thinking"?

We've been trying to make AI better at reasoning by forcing it to show more work - longer chains of thought, more attempts, more compute. But what if there was another way?

New research from CMU and Stanford introduces something fascinating: teaching LLMs to first generate their own "reasoning abstractions" - essentially, creating their own cheat sheets before solving problems. Think of it like a student who, before tackling a math problem, first writes down the key principles and potential pitfalls they should watch for.

The approach is surprisingly simple. They train two models that work together: one generates helpful hints and strategies (the abstraction generator), and another uses these hints to solve problems (the solution generator). The abstraction generator gets rewarded when its hints actually help solve problems, creating a virtuous cycle of improvement.

The results? 44% improvement over previous state-of-the-art methods on challenging math competitions.

One interesting thing to note: when given more compute budget, it's actually more effective to generate diverse strategies than to just try solving the problem more times. This suggests our models might have been stuck in local reasoning patterns, and abstractions help them explore genuinely different approaches. The same technique improved performance by 30% across legal reasoning, medical diagnosis, and other domains.

Are we watching LLMs develop something that looks increasingly like metacognition? The ability to think about how to think. That's a capability that could fundamentally change how we deploy these systems in education, research, and problem-solving.

↓ 𝐖𝐚𝐧𝐭 𝐭𝐨 𝐤𝐞𝐞𝐩 𝐮𝐩? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
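The reward idea in the post above (the abstraction generator is rewarded when its hints actually help the solver) can be made concrete with a small sketch. Everything here is hypothetical: `solver` is a deterministic stub rather than a trained model, the hint strings are invented, and defining the reward as "solve rate with the hint minus solve rate without it" is one plausible reading of the post, not the paper's stated formula.

```python
def solver(problem, hint, attempt):
    """Hypothetical solution generator for the toy task 'sum of 1..n'."""
    n = problem
    if hint == "use n*(n+1)/2":
        return n * (n + 1) // 2            # a useful hint always leads to the right answer
    if hint is None:
        # Without a hint, the stub is right on even-numbered attempts only.
        return n * (n + 1) // 2 if attempt % 2 == 0 else n * n
    return n * n                           # an unhelpful hint misleads the stub


def solve_rate(problem, hint, truth, n_samples):
    """Fraction of sampled attempts that reach the correct answer."""
    hits = sum(1 for i in range(n_samples) if solver(problem, hint, i) == truth)
    return hits / n_samples


def abstraction_reward(problem, hint, truth, n_samples=4):
    """Assumed reward: how much the hint raises the solver's success rate."""
    with_hint = solve_rate(problem, hint, truth, n_samples)
    without_hint = solve_rate(problem, None, truth, n_samples)
    return with_hint - without_hint


if __name__ == "__main__":
    for hint in ["use n*(n+1)/2", "add the endpoints"]:
        print(hint, abstraction_reward(10, hint, truth=55))
```

Under this definition a hint that merely restates what the solver already does well earns a reward near zero, while a misleading hint is penalized, which matches the "virtuous cycle" intuition in the post.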