New paper: Compose by Focus. We make robot visuomotor skills robustly composable by focusing perception on the task-relevant parts of a scene using 3D scene graphs, then learning a single policy over those structured inputs. Motivation: planners (VLM/TAMP) can break a long task into sub-goals, but visuomotor policies often crumble in cluttered, novel scenes. The issue isn't the planner; it's the policy's brittle visual processing under distribution shift. We argue skills must be focused, attending only to relevant objects and relations. Method in a nutshell: for each sub-goal, we build a dynamic sub-scene graph: Grounded SAM segments the relevant objects, a VLM infers their relations, a GNN encodes the graph, CLIP encodes the skill text, and a diffusion policy is conditioned on both the language and graph features; a VLM planner sequences the sub-goals. Why it matters: focusing on relevant objects and relations mitigates distribution shift, enabling reliable multi-skill composition in visually complex scenes. Scene graphs also form a natural interface between high-level planning and low-level control, easing long-horizon execution and reducing data demands. Project & paper: https://lnkd.in/eGv9JQ5n
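The conditioning pipeline described above (graph encoder + text embedding feeding a policy) can be sketched in a few lines. This is a toy numpy illustration of the idea, not the paper's architecture: the message-passing scheme, feature sizes, and the stand-in CLIP embedding are all illustrative assumptions.

```python
import numpy as np

def gnn_encode(node_feats, adj, w_self, w_msg, rounds=2):
    """Minimal message passing: each round, every node mixes its own
    state with an aggregate of its neighbors', then node states are
    mean-pooled into a single graph embedding."""
    h = node_feats
    for _ in range(rounds):
        h = np.tanh(h @ w_self + adj @ h @ w_msg)
    return h.mean(axis=0)

rng = np.random.default_rng(0)
d = 8
# toy sub-scene graph: 3 task-relevant objects, edges = inferred relations
nodes = rng.normal(size=(3, d))            # per-object visual features
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)   # e.g. "on top of" relations
w_self, w_msg = rng.normal(size=(d, d)), rng.normal(size=(d, d))

graph_emb = gnn_encode(nodes, adj, w_self, w_msg)
text_emb = rng.normal(size=d)              # stand-in for a CLIP skill embedding
cond = np.concatenate([graph_emb, text_emb])  # conditioning vector for the policy
```

The key design point is that only the objects relevant to the current sub-goal enter the graph, so `cond` is insensitive to background clutter.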
Compositional Learning Techniques for Robotics
Explore top LinkedIn content from expert professionals.
Summary
Compositional learning techniques for robotics involve teaching robots how to build, adapt, and combine individual skills so they can handle a variety of tasks and environments. This approach helps robots learn continuously, retain past knowledge, and generalize their abilities without needing to retrain from scratch.
- Build skill libraries: Encourage robots to accumulate and reuse learned skills, which allows them to tackle complex tasks by combining simpler actions.
- Focus on adaptability: Train robots to recognize relevant parts of their environment so they can adjust their actions and work reliably in new or cluttered settings.
- Promote continual learning: Use frameworks that allow robots to learn new tasks without forgetting previous ones, helping them stay up-to-date and versatile.
One of the core challenges in robotics is making robots generalize across tasks, environments, sensors, and data sources. Traditionally, robot learning has been siloed: one model per task, per dataset, per modality. This leads to expensive retraining and poor adaptability. A recent paper from MIT CSAIL introduces PoCo (Policy Composition from and for Heterogeneous Robot Learning), a diffusion-based framework that directly tackles this bottleneck. 🔹The idea: Instead of training a single "one-size-fits-all" model, PoCo allows us to train multiple policies (e.g., task-specific, domain-specific, behavior-constrained) separately, then compose them at inference time. The policies are built as diffusion models over action trajectories, which makes their combination mathematically flexible: their predictions can be combined at inference without any retraining. This could be a foundational step towards true generalist robots, capable of adapting to diverse and unpredictable environments. Full paper: https://lnkd.in/eim6Se4r
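Why diffusion models compose so cleanly: the score (noise estimate) of a product of distributions is a weighted sum of the individual scores. The toy 1-D sketch below illustrates that mechanism with hand-written "scores"; it is a minimal illustration of score composition, not PoCo's actual sampler, and the targets and weights are made up.

```python
import numpy as np

def compose_scores(scores, weights):
    """Product-of-experts style composition: the composed score is a
    weighted sum of the individual policies' scores."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * s for wi, s in zip(w, scores))

# toy 1-D "policies": each score pulls the action trajectory toward a
# different target (a stand-in for two learned denoisers)
target_task, target_safe = 1.0, -0.5
traj = np.zeros(4)
for _ in range(50):
    score_task = target_task - traj
    score_safe = target_safe - traj
    composed = compose_scores([score_task, score_safe], weights=[0.7, 0.3])
    traj = traj + 0.1 * composed   # one denoising-style update step
# traj settles near the weighted compromise 0.7*1.0 + 0.3*(-0.5) = 0.55
```

Because composition happens on the scores, each policy can stay frozen; only the mixing weights decide how much each expert shapes the sampled trajectory.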
-
🤖 How can we teach robots to continually learn new skills, without forgetting the old ones? 👉 CLARE ensures that as your robot gets smarter, it doesn't lose the skills it (and you as teleoperator 😅) already worked hard to master. Fine-tuning pre-trained vision-language-action models (#VLAs) on a new task has become the standard for robotic manipulation. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments without forgetting the knowledge they have already acquired. Existing continual learning methods for robotics require storing previous data (exemplars), struggle with long task sequences, or rely on oracle task identifiers for deployment. We present CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion. CLARE is a parameter-efficient, exemplar-free framework that allows robots to continuously adapt to new tasks and environments: 🚫 No Exemplars Needed: We don't need to store past data, which is often impossible due to privacy and storage constraints. 🧠 Autonomous Routing: Our autoencoder-based mechanism dynamically selects the right adapter for the current task—no task labels required during deployment. 📉 Efficient Dynamic Expansion: The model autonomously decides when to expand its capacity, increasing parameter counts by only ~2% per task. 🏆 SOTA Results: We achieve significantly higher continual learning performance on the LIBERO benchmark compared to baselines, including methods that replay past data. 
📄 Paper: https://lnkd.in/dskhxphh 🌐 Project Website: https://lnkd.in/dRDk63dP 💻 Code: https://lnkd.in/d--udZja 🤗 Hugging Face: https://lnkd.in/dswqWWUr This work has been a great collaboration with Yi Zhang, who is currently on the job market :) Angela Schoellig Technical University of Munich Learning Systems and Robotics Lab Munich Institute of Robotics and Machine Intelligence (MIRMI) at the Technical University of Munich Robotics Institute Germany #Robotics #AI #MachineLearning #ContinualLearning
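The "autonomous routing" idea above (pick an adapter without task labels) can be sketched as reconstruction-based selection: each adapter is paired with an autoencoder trained on its task's features, and at deployment the adapter whose autoencoder reconstructs the current feature best is chosen. This is a hedged toy sketch of that mechanism, not CLARE's implementation; the autoencoder shapes and the demo matrices are contrived for illustration.

```python
import numpy as np

def route(feature, autoencoders):
    """Select the adapter whose paired autoencoder reconstructs the
    current feature with the lowest L2 error (no task label needed)."""
    errors = []
    for enc, dec in autoencoders:
        recon = np.tanh(feature @ enc) @ dec
        errors.append(np.linalg.norm(feature - recon))
    return int(np.argmin(errors))

# toy demo: adapter 0's autoencoder matches the feature distribution
d = 4
feature = np.ones(d)
ae_match = (np.eye(d), np.eye(d) / np.tanh(1.0))  # reconstructs `feature` exactly
ae_other = (np.zeros((d, d)), np.zeros((d, d)))   # reconstructs nothing
chosen = route(feature, [ae_match, ae_other])      # -> 0
```

The appeal of this scheme is that it is exemplar-free: only the small autoencoders are stored per task, never the past data itself.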
-
Can a robot learn incrementally over time, building a library of skills that allows it to reuse, compose, and build upon prior knowledge to become more advanced? That is an area of research my lab (and many others!) has been working on for many years. But it is not trivial to figure out how to combine pieces of knowledge and skill sets into a single library, especially when that knowledge base may have very different objectives, state-action spaces, or model sizes. Yun-Jie Ho and Zih-Yun Chiu set out to tackle this problem, and I'm happy to say that their results have recently been published in IEEE RA-L! SurgIRL: Towards Life-Long Learning for Surgical Automation by Incremental Reinforcement Learning. https://lnkd.in/ePXKY2yR Instead of having to create one huge policy that can handle all possible skills, SurgIRL lets a robot *incrementally* build a growing library of prior skills and learn to reuse and compose them when facing new tasks. We demonstrated that this incremental learning approach can handle multiple simulated surgical tasks and then transfer policies into reality on a da Vinci Research Kit (dVRK). Some of these tasks are really hard -- like precision grasping of suture needles -- and some are really diverse, like camera tracking -- all composed under the same architecture. The benefit is that this framework retains each skill in its original form, allowing knowledge to be kept and updated naturally. Huge kudos to the team -- Yun-Jie Ho, Zih-Yun (Sarah) Chiu, Yuheng Zhi -- for driving this forward. I'm hoping for continued research into life-long learning in robot manipulation -- not entirely there yet, but a great step forward in my opinion!
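The library-reuse loop described in the post can be summarized in a few lines of code. This is a toy sketch of the general incremental-reuse pattern, not the SurgIRL algorithm: the success-rate threshold and the `solve`/`task_eval` interface are hypothetical design choices for illustration.

```python
class SkillLibrary:
    """Toy sketch of incremental skill reuse: prior policies are kept in
    their original form; a new task first tries the best-scoring prior
    skill, and only adds a new policy when none clears the threshold."""

    def __init__(self, threshold=0.8):
        self.skills = []              # list of (name, policy_fn) pairs
        self.threshold = threshold

    def solve(self, task_eval, new_policy_factory):
        """task_eval(policy) -> estimated success rate on the new task."""
        if self.skills:
            score, name, policy = max(
                ((task_eval(p), n, p) for n, p in self.skills),
                key=lambda s: s[0],
            )
            if score >= self.threshold:
                return name, policy   # reuse a prior skill unchanged
        name, policy = new_policy_factory()
        self.skills.append((name, policy))  # grow the library
        return name, policy
```

On the first task the library is empty, so a new skill is learned and stored; on a later task where a stored skill evaluates above the threshold, that skill is reused instead of training from scratch, which is what keeps the library compact as tasks accumulate.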
-
Every field has a few ideas that quietly change everything. For large language models, it was the Transformer. For modern robot learning, I would argue it is the ACT policy (Action Chunking with Transformers). The ACT policy fundamentally changed what was possible in robotics by opening the door to open-source, learnable policies that could actually run on low-cost hardware. That single shift moved robot learning out of a few elite labs and into the hands of much smaller research teams. What makes ACT truly seminal is that innovation happened on both sides of the stack. On the hardware side, costs dropped by nearly two orders of magnitude. That alone reshaped who could experiment, iterate, and deploy real robots. On the software side, there were three key breakthroughs: (1) A conditional Variational Autoencoder to model action distributions (2) A DETR-based transformer decoder for structured action prediction (3) Action Chunking, inspired by neuroscience, allowing robots to reason and act over meaningful temporal segments rather than single timesteps. Robots trained with ACT could fold clothes, insert batteries, close zip-lock bags, and handle a wide range of real-world manipulation tasks that were previously brittle or hand-engineered. Because of how foundational this policy is, I wrote a from-scratch article explaining ACT, without assuming prior expertise in robotics or deep learning. The goal was simple: lower the barrier for anyone who wants to seriously enter modern robot learning today. If you are curious about where robotics is headed and why learning-based policies are the future, this is a great place to start. Read the Substack article here: https://lnkd.in/dKxQV_hM
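Breakthrough (3), action chunking, is usually paired with temporal ensembling at execution time: the policy predicts a fresh chunk every step, and overlapping predictions for the current timestep are averaged with exponential down-weighting of older chunks. Below is a small numpy sketch of that ensembling rule under assumed shapes; the decay constant `k` and the toy chunks are illustrative, not values from the ACT paper.

```python
import numpy as np

def temporal_ensemble(chunks, t, k=0.1):
    """At timestep t, average every chunk's prediction for t,
    exponentially down-weighting chunks predicted further in the past
    (k is the decay hyperparameter)."""
    preds, weights = [], []
    for start, chunk in chunks:          # chunk: (horizon, action_dim)
        offset = t - start
        if 0 <= offset < len(chunk):
            preds.append(chunk[offset])
            weights.append(np.exp(-k * (t - start)))
    w = np.array(weights) / np.sum(weights)
    return (np.array(preds) * w[:, None]).sum(axis=0)

# two overlapping 3-step chunks predicting constant actions 0 and 1
chunks = [(0, np.zeros((3, 2))), (1, np.ones((3, 2)))]
action = temporal_ensemble(chunks, t=1)  # a blend of both chunks' predictions
```

Averaging over overlapping chunks smooths out single-step prediction noise, which is a large part of why chunked policies are less jittery than per-timestep ones.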
-
🦾 𝗔 𝗳𝗮𝘀𝘁𝗲𝗿, 𝗯𝗲𝘁𝘁𝗲𝗿 𝘄𝗮𝘆 𝘁𝗼 𝘁𝗿𝗮𝗶𝗻 𝗴𝗲𝗻𝗲𝗿𝗮𝗹-𝗽𝘂𝗿𝗽𝗼𝘀𝗲 𝗿𝗼𝗯𝗼𝘁𝘀 - Inspired by LLMs, researchers develop a training technique that enables robots to absorb diverse sensor data 𝗶𝗻𝘁𝗼 𝗮 𝘂𝗻𝗶𝗳𝗶𝗲𝗱 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲. Using principles from large language models (LLMs), MIT’s Heterogeneous Pretrained Transformers (HPT) model enables robots to absorb diverse sensor data, like vision and proprioception, into a unified training architecture, boosting performance over traditional methods by more than 20% in real-world and simulated environments. This architecture allows robots to build a broad foundation of "understanding" tasks, similar to how LLMs absorb linguistic data. AI’s language models, capable of generalizing across various inputs, have demonstrated that diverse data can create scalable, adaptable intelligence. Applying this concept in robotics, MIT’s HPT allows a robot to train faster by 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘇𝗶𝗻𝗴 from different data sources in a way that simulates human intuition. This cross-pollination of technologies vastly reduces the time and data needed for training, marking a pivotal moment in robotics where complex, real-world tasks become feasible 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗵𝗲 𝗰𝗼𝗻𝘀𝘁𝗿𝗮𝗶𝗻𝘁𝘀 𝗼𝗳 𝘁𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝘁𝗮𝘀𝗸-𝘀𝗽𝗲𝗰𝗶𝗳𝗶𝗰 𝗽𝗿𝗼𝗴𝗿𝗮𝗺𝗺𝗶𝗻𝗴. This cross-acceleration is more than theoretical - industries can now realistically deploy robots for a wide range of tasks without customized programming for each. Such adaptability could redefine roles in manufacturing, logistics, and healthcare, supporting leaner and more responsive operations. We’re now on the cusp of achieving general-purpose robotics - a vision where robots perform complex, multi-domain tasks. 👉 What’s extraordinary is how disciplines like robotics, AI, spatial computing, and cloud computing are profoundly accelerating each other - with remarkable achievements daily. 𝐹𝑜𝑙𝑙𝑜𝑤 𝑚𝑒 𝑓𝑜𝑟 𝑚𝑜𝑟𝑒 𝑖𝑛𝑠𝑖𝑔ℎ𝑡𝑠 𝑜𝑛 𝑡ℎ𝑒 𝑖𝑛𝑡𝑒𝑟𝑠𝑒𝑐𝑡𝑖𝑜𝑛𝑠 𝑜𝑓 #AI, #Robotics, 𝑎𝑛𝑑 𝑛𝑒𝑥𝑡-𝑔𝑒𝑛 #IndustrialAutomation. #IndustrialAI #MachineLearning #LLMs #IndustrialAutomation #Innovation
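The "unified training architecture" idea, where per-modality stems map heterogeneous sensor streams into one shared token space processed by a common trunk, can be sketched in numpy. This is a hedged toy illustration of the concept only, not HPT's actual design: the stem projections, dimensions, and single-step attention "trunk" are all stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared token width of the trunk

# hypothetical per-modality "stems": linear projections into shared tokens
# (scaled so token entries stay roughly unit-variance)
stems = {
    "vision":  rng.normal(size=(64, D)) / np.sqrt(64),
    "proprio": rng.normal(size=(12, D)) / np.sqrt(12),
}
vision = rng.normal(size=(10, 64))    # e.g. image patch features
proprio = rng.normal(size=(7, 12))    # e.g. a window of joint states

# tokenize each stream, then concatenate into one token sequence
tokens = np.concatenate([vision @ stems["vision"],
                         proprio @ stems["proprio"]])   # (17, D)

# stand-in for the shared trunk: a single self-attention mixing step
logits = tokens @ tokens.T / np.sqrt(D)
logits -= logits.max(axis=1, keepdims=True)             # softmax stability
attn = np.exp(logits)
attn /= attn.sum(axis=1, keepdims=True)
fused = attn @ tokens   # (17, D): both modalities mixed by one trunk
```

The design point is that only the small stems are modality-specific; the trunk sees a uniform token sequence, so adding a new sensor means adding a new stem, not retraining the whole model.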