MLM: Learning Multi-task Loco-Manipulation Whole-Body Control for Quadruped Robot with Arm

Arxiv: https://lnkd.in/e5cAEfbG
Video: https://lnkd.in/epEcFPhM

Can quadrupeds seamlessly blend locomotion and manipulation—walking, balancing, and handling objects—all in one unified framework? MLM introduces a whole-body, multi-task control approach that trains quadruped robots with arms to perform diverse loco-manipulation tasks in a single end-to-end policy, outperforming modular baselines and enabling fluid, human-like coordination.

🔁 At a Glance
💡 Goal: Learn generalizable loco-manipulation policies for quadruped robots with arms, unifying locomotion and manipulation within one control policy.
⚙️ Approach:
- Whole-body RL policy: jointly controls legs + arm in a unified action space.
- Multi-task training: shared representation across locomotion (walking, turning, balancing) and manipulation (pushing, picking, placing).
- Domain randomization & sim-to-real transfer for robustness.

📈 Impact (Key Metrics)
🧪 Simulation (Isaac Gym)
- Outperforms hierarchical and modular baselines by +20–30% success rate in complex loco-manip tasks.
- Learns smooth transitions between tasks (e.g., walking → pick-up → carry).
🤖 Real-World (Unitree Go1 + 4-DoF Arm)
- Successfully performs pick-and-carry and push-and-move tasks with high stability.
- Demonstrates zero-shot adaptation to new terrains and object placements.

🔬 Experiments
🧪 Datasets / Envs: Multi-task simulated environments (navigation, grasping, pushing)
🎯 Tasks: Walk + manipulate, push while balancing, pick-and-carry, obstacle negotiation
🦾 Robot: Unitree Go1 quadruped with 4-DoF arm
📐 Input: Proprioception + onboard RGB-D for task perception

🛠 How to Implement
1️⃣ Unified Policy Learning: train an end-to-end RL policy with legs and arm sharing one observation-action space (a minimal sketch follows after this post).
2️⃣ Multi-task Curriculum: gradually increase task complexity (navigation → manipulation → combined loco-manip).
3️⃣ Sim-to-Real Transfer: apply domain randomization on physics, textures, and sensor noise for robustness.

📦 Deployment Benefits
✅ Unified whole-body control—no hand-crafted locomotion/manipulation modules
✅ Scales across many loco-manip tasks with one policy
✅ Robust to terrain, obstacles, and object variability
✅ Brings quadrupeds closer to true generalist field robots

Takeaway
MLM marks a step towards general-purpose quadrupeds capable of both moving through the world and shaping it. By unifying loco-manipulation, it pushes quadrupeds beyond mobility into multi-task embodied intelligence.

Follow me to know more about AI, ML and Robotics!
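As a rough picture of what a unified observation-action space can look like in code, here is a minimal sketch (not the authors' implementation): a single network consumes proprioception plus a task one-hot and emits targets for all leg and arm joints together. The dimensions, architecture, and observation layout are illustrative assumptions only.

```python
# Minimal sketch of a unified whole-body, multi-task policy head.
# All sizes and the observation layout are assumptions for illustration.
import torch
import torch.nn as nn

NUM_LEG_JOINTS, NUM_ARM_JOINTS, NUM_TASKS = 12, 4, 5
PROPRIO_DIM = 2 * (NUM_LEG_JOINTS + NUM_ARM_JOINTS) + 6   # q, dq, base ang. vel. + gravity

class WholeBodyPolicy(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        obs_dim = PROPRIO_DIM + NUM_TASKS                  # shared multi-task observation
        act_dim = NUM_LEG_JOINTS + NUM_ARM_JOINTS          # unified legs + arm action space
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),                    # joint position targets
        )

    def forward(self, proprio, task_onehot):
        return self.net(torch.cat([proprio, task_onehot], dim=-1))

# One forward pass: the same network drives both locomotion and manipulation.
policy = WholeBodyPolicy()
obs = torch.zeros(1, PROPRIO_DIM)
task = torch.nn.functional.one_hot(torch.tensor([2]), NUM_TASKS).float()
joint_targets = policy(obs, task)                          # shape: (1, 16)
```

The point of the sketch is the shared action head: there is no separate locomotion controller or arm controller to hand off between, which is what lets an RL trainer learn coordinated leg-arm behavior directly.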
Multi-Robot System – Step 3: Autonomous Tomato Grasping with Reinforcement Learning on ArmPi Pro

Moving from the URDF model to real-world action, this step focused on a core capability: training our ArmPi Pro robot to autonomously grasp a tomato using Reinforcement Learning (RL). Bridging the gap between perception and control presented a fascinating challenge. Here's a breakdown of the implemented pipeline:

🔹 The Perception & Data Flow:
- Camera Driver Node: interfaces directly with the RGB-D camera hardware, publishing three critical data streams: the RGB image, the depth image (a matrix where each pixel value represents distance, not color), and CameraInfo (vital parameters such as the camera's intrinsic matrix and distortion coefficients).
- Point Cloud Processing: the raw depth image alone is insufficient for 3D navigation. A dedicated node converts it into a 3D point cloud, transforming each pixel into an (x, y, z) coordinate in the camera's frame (a minimal deprojection sketch follows after this post). This is then transformed into the robot's base frame for a unified spatial understanding.
- Tomato Detection & Pose Estimation: a vision node, running a pre-trained tomato detection model, subscribes to the RGB stream. It identifies the tomato, calculates its center of mass, and provides a bounding box. This 2D data is fused with the 3D point cloud to accurately estimate the tomato's 6D pose (3D position + 3D orientation) in the robot's workspace.

🔹 The Reinforcement Learning Core:
- State Vector: a combination of the estimated tomato pose, the robot's joint states (6 arm + 4 wheel angles), and a timestamp.
- Reward Function: carefully engineered to guide the learning process:
  - Distance derivative: rewards movement towards the tomato, not just proximity (using the derivative of distance to create positive/negative feedback).
  - Time penalty: incentivizes efficiency by penalizing prolonged execution times.
  - Grip success: a sparse reward for successful gripping (manually triggered during training).

🚀 Overcoming the Training Hurdle:
Training an RL agent from scratch on physical hardware is notoriously time-consuming and can take weeks. To achieve convergence within a feasible timeframe, we bypassed the extensive exploration phase.
Our Solution: Offline Pre-Training with Expert Demonstrations. I teleoperated the robot through smooth, successful grasping trajectories and recorded all the corresponding state-action pairs. This dataset was used to pre-train the RL agent, effectively "seeding" it with successful behavior. The result? Drastically faster convergence and stable policy learning.

✅ Conclusion:
This work integrates computer vision, robot kinematics, and Reinforcement Learning into an autonomous grasping system. It marks a significant milestone in our journey towards a collaborative multi-robot system for agricultural tasks.

#Robotics #ROS #ReinforcementLearning #ImitationLearning #ComputerVision #AI #EmbeddedSystems #ControlTheory #GreenTech #IoT #UM6P #ENSA
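The depth-to-point-cloud step above boils down to standard pinhole deprojection with the intrinsics published in CameraInfo. Here is a minimal sketch of that conversion; the example intrinsics and variable names are assumptions, not the actual ArmPi Pro calibration.

```python
# Deproject a pixel (u, v) with metric depth d into an (x, y, z) point in the
# camera frame, using the 3x3 intrinsic matrix K from CameraInfo.
import numpy as np

def deproject(u, v, depth_m, K):
    """Pixel coordinates + depth in meters -> 3D point in the camera frame."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example intrinsics (fx, fy, cx, cy) for a 640x480 RGB-D stream (illustrative).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

point_cam = deproject(u=350, v=260, depth_m=0.42, K=K)
# In the full pipeline this point is then transformed into the robot's base
# frame via the TF tree, e.g. point_base = T_base_camera @ [x, y, z, 1].
```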
Why Today’s Humanoids Won’t Learn Dexterity 🤖

What’s really hard? Vision + motion tracking + huge data sets have given us huge leaps in domains like image recognition and language. But applying the same “end-to-end learning from video” trick to robots manipulating the real world? That’s a very different ball game.

The human hand isn’t just a camera + joint movement. It’s loaded with tens of thousands of tactile sensors, force-feedback loops, muscle-spindle sensing, adaptive strategies and subtle planning. It’s embodied. When we strip away touch or force sensation, simple tasks become exponentially harder. Brooks shows that.

Why the current roadmap raises questions:
- Many humanoid efforts assume: show the robot video of someone doing the task → robot learns to mimic. But what about touch, force and feedback loops?
- Learning state → action ignores the fact that humans often follow plans and modulate actions via sensing rather than purely reacting.
- Scaling robots to full human size (walking, manipulating in human spaces) introduces physical risks, energy scaling issues, safety hurdles.

🔍 What could be done differently:
- Focus on rich sensing (touch/force/haptics) as much as vision. Without that, true dexterity will remain elusive.
- Shift from “mimic motion” to “learn structure and feedback loops” — i.e., plan + sense + adapt.
- Be realistic about form: instead of insisting on human-legged humanoids for every task, design robots that match tasks (wheels when appropriate, simplified grippers, optimized sensors) and still get the job done.

🎯 Takeaway for tech leaders & investors:
If you’re backing humanoids, ask tough questions:
- What’s the sensory feedback portfolio (touch/force/haptics)?
- What’s the learning endpoint? Are you training just imitation of motion, or building structure of touch + adaptation?
- How are safety and physical scaling being addressed?

The vision of a human-sized robot seamlessly stepping into human roles is compelling — but it might take decades, not years. And when it comes, it may look very different from the “two legs, two arms, five-fingered hand” prototype many imagine.

Let’s push for pragmatic progress, not hype. And embrace that the path to robot dexterity may require fundamentally new sensors, new actuation and new learning paradigms — not just bigger neural nets.

Source: https://lnkd.in/eVsH47cW
Why can reinforcement learning find solutions that humans can't?

Give a human a problem, say, managing warehouse logistics or playing Go, and they’ll rely on intuition and experience. Give it to a reinforcement learning (RL) agent, and it will rely on something simpler: trial, error, and feedback. And yet, RL often discovers solutions that no human ever would.

Humans are limited by pattern bias: we search within what makes sense to us. RL doesn’t care if a strategy makes sense—it only cares if it works.

Examples:
- Go (AlphaGo) — discovered “alien” strategies unseen in 2,500 years of play.
- Robotics — agents learn movements that look odd but minimize energy.
- Data centers (DeepMind) — RL cut cooling costs by 40% with policies engineers couldn’t explain.

RL explores vast action spaces and learns purely from feedback, uncovering non-intuitive solutions that no one would design manually (a tiny trial-and-error sketch follows below). It’s a humbling reminder that the best solutions might come from systems that learn and evolve through experience.

#reinforcement_learning
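As a concrete picture of "trial, error, and feedback", here is a tiny illustrative sketch (not from the post): an epsilon-greedy agent discovers the best of three options purely from reward signals, with no prior notion of which choice "makes sense". The payoff values and epsilon are arbitrary assumptions.

```python
# Epsilon-greedy bandit: value estimates are shaped by feedback alone.
import random

true_payoffs = [0.2, 0.5, 0.8]       # hidden from the agent
estimates = [0.0] * len(true_payoffs)
counts = [0] * len(true_payoffs)
epsilon = 0.1                        # fraction of purely exploratory tries

for step in range(5000):
    # Explore occasionally, otherwise exploit the current best estimate.
    if random.random() < epsilon:
        action = random.randrange(len(true_payoffs))
    else:
        action = max(range(len(true_payoffs)), key=lambda a: estimates[a])
    reward = 1.0 if random.random() < true_payoffs[action] else 0.0
    # Incremental average of observed rewards for the chosen action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print("learned value estimates:", [round(v, 2) for v in estimates])
```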
Gonna start a series where I extract ideas from papers and share them here. Episode 1:

HUMAN-PRIOR BOOTSTRAPPING
When you want a system to eventually outperform people, start by letting it imitate human demonstrations just long enough to become competent, then switch to self-improvement. Like a student driver who shadows an instructor for a few hours, then practices alone to develop faster reflexes and smoother routes. This principle applies to customer-service chatbots that begin with canned expert responses and later refine their own tone, or to CNC machines that first trace a machinist’s toolpaths and then optimize feed rates for speed and tool life.

CONSERVATIVE OFFLINE GATING
Before letting a learning system loose in the real world, use a cheap simulation or replay buffer to test each candidate update and only accept changes that pass a safety margin. Picture a race-car team that tweaks the engine map on a dyno and only bolts the new tune into the car if lap-time simulations improve by at least 5%. The same gatekeeper idea keeps recommender systems from suddenly promoting fringe content, or prevents a warehouse robot from adopting risky trajectories that only look good in hindsight. (A minimal gating sketch follows after this post.)

ONE-STEP DISTILLATION FOR SPEED
Compress a slow but capable multi-step process into a single fast step by training a lightweight “student” model to mimic the outputs of the slower “teacher.” Think of an experienced chef who, after years of tasting and adjusting, can now glance at a dish and know exactly how much salt it needs without iterative sampling. This trick turns a 10-step diffusion image generator into a one-shot GAN for mobile apps, or collapses a 20-iteration planning routine into a single neural-network call for drone obstacle avoidance.

VARIANCE-CLIPPED EXPLORATION
Keep learning stable by bounding how wildly the system is allowed to experiment; tighten the leash early to avoid disasters, loosen it later to escape local ruts. Like teaching a teenager to drive: start with empty parking lots and strict speed limits, then gradually allow highway merges as skill grows. The same dial works in stock-trading bots that cap position sizes during back-tests, or in HVAC controllers that limit temperature swings while learning occupant preferences.

ITERATIVE DATA EXPANSION LOOP
Alternate between improving the policy and collecting new data with the improved policy, creating a virtuous circle where better performance begets better training examples. Similar to how a photography club holds monthly contests: each round’s winning shots become next month’s study material, steadily raising everyone’s skill level. This loop accelerates language-model fine-tuning by using the latest model to generate higher-quality synthetic dialogues, or boosts crop-yield prediction by re-sampling fields with the newest sensor-equipped drones.
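To make the conservative-gating idea concrete, here is a minimal sketch assuming a tabular policy and a logged (state, action, reward) buffer. The names, the simple off-policy scoring shortcut, and the margin value are illustrative assumptions, not taken from any particular paper.

```python
# Accept a candidate policy only if its offline score clears a safety margin.

def offline_score(policy, buffer):
    """Cheap off-policy estimate: average logged reward on transitions
    where the policy would have picked the action that was actually logged."""
    matched = [r for (s, a, r) in buffer if policy.get(s) == a]
    return sum(matched) / len(matched) if matched else 0.0

def gated_update(policy, candidate, buffer, margin=0.05):
    """Keep the old policy unless the candidate beats it by at least `margin`."""
    if offline_score(candidate, buffer) >= offline_score(policy, buffer) + margin:
        return candidate
    return policy

# Example: logged buffer over states 0..2, two tabular policies.
buffer = [(s, a, 1.0 if a == s % 2 else 0.0)
          for s in range(3) for a in (0, 1)]
current = {0: 1, 1: 0, 2: 1}          # mediocre policy
proposal = {0: 0, 1: 1, 2: 0}         # matches the rewarding actions
print(gated_update(current, proposal, buffer))   # the proposal is accepted
```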
Revolutionize Classroom Feedback with AI Technology — Discover how AI technology can transform classroom feedback, enhancing learning experiences for both teachers and students. https://lnkd.in/dzxBwp9J
Did you know that every supervised learning problem can, in theory, be reframed as a reinforcement learning problem, but not the other way around?

In supervised learning, the model learns from labeled data: it’s told exactly what the right answer is. Example: given an image, predict whether it’s a cat or a dog.

In reinforcement learning (RL), the model, called an agent, learns by interacting with an environment 🌍.
- Each action it takes changes the environment’s state, and the agent receives a reward signal (rᵢ) based on how good that action was.
- Over time, it learns a policy that maximizes the total reward.

So theoretically, if we replaced the objective function in a supervised model with a reward system that gives +1 for a correct label and 0 for a wrong one, we’d have turned a supervised task into a reinforcement learning one (a toy sketch of this reframing follows below).

But here’s the catch 👉 Doing this is rarely necessary and often overengineering.

Why? Because supervised problems already provide direct feedback: you don’t need to simulate an environment or define a reward policy when you already have explicit labels.

Reinforcement learning shines when:
- The feedback is delayed or uncertain (like in games or robotics).
- The model’s actions change future outcomes.

Otherwise, adding an environment and reward mechanics to a simple classification task is like building a spaceship to cross the street 🚶♂️🚀

Have you ever tried designing your own reward function for a learning system? What was the hardest part?
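Here is a toy sketch of that reframing: the same linear softmax classifier trained two ways, once with ordinary cross-entropy on labels, and once as a one-step RL agent that samples a label and receives reward +1 or 0. The data, dimensions, and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # two linearly separable classes

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

W_sup = np.zeros((2, 2))    # supervised weights
W_rl = np.zeros((2, 2))     # "RL-reframed" weights
lr = 0.1

for epoch in range(50):
    for x, label in zip(X, y):
        # (a) Supervised: gradient of cross-entropy w.r.t. the logits is p - onehot.
        p = softmax(x @ W_sup)
        W_sup -= lr * np.outer(x, p - np.eye(2)[label])

        # (b) RL view: sample an "action" (a predicted label), observe reward +1/0,
        # and apply a REINFORCE update: reward * grad log pi(action | x).
        p = softmax(x @ W_rl)
        action = rng.choice(2, p=p)
        reward = 1.0 if action == label else 0.0
        W_rl += lr * reward * np.outer(x, np.eye(2)[action] - p)

for name, W in [("supervised", W_sup), ("RL-reframed", W_rl)]:
    acc = (softmax(X @ W).argmax(axis=1) == y).mean()
    print(f"{name:12s} accuracy: {acc:.2f}")
```

Both reach the same place on this toy problem, but the RL version needs sampling and many more noisy updates to learn what the labels already state directly, which is exactly the overengineering point above.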
Coherence

You are always going through the process of context learning in your life. Any form of knowledge involves context learning; it is not limited to robotics or engineering, it is always with us.

Coherence means maintaining logical continuity with what has already been learned. A new piece of information is not a separate or isolated fact; rather, it fits smoothly into the chain of knowledge or context already built.

Without coherence: knowledge feels disconnected and is harder to recall. A distinct piece of information on its own is of little use and even harder to remember.

With coherence: knowledge is connected with context and is easier to transfer. It becomes consistent, layered, and logical.

#ContextLearning #Coherence #KnowledgeBuilding #LearningProcess #Education #GrowthMindset #ContinuousLearning #LogicalThinking #KnowledgeIsPower #ShapedAIAgents #AI #FutureofTechnology
Reward Functions: Teaching Robots What “Success” Looks Like

Every robot learning through reinforcement learning has one question it must answer: what does success mean? That answer lives inside its reward function, a mathematical rule that scores every action the robot takes (a small shaped-reward sketch follows after this post).

> Positive reward? Do more of that.
> Negative reward? Avoid it next time.

This feedback loop becomes the foundation of how robots learn to act, whether balancing a cube, sorting objects, or driving autonomously.

But designing reward functions is far from easy. Why?
>> Dense vs. sparse feedback: too much guidance (dense rewards) can cause overfitting; too little (sparse rewards) slows learning.
>> Unintended behaviors: robots often “game the system.” A robot meant to slide a block might slide the whole table instead.
>> Real-world complexity: many goals (like “tighten a screw” or “pour smoothly”) can’t be captured by visual data alone.

To address this, researchers are exploring:
>> Human-in-the-loop systems, where humans provide evaluative feedback.
>> LLM-assisted reward design, using language models to interpret abstract goals.
>> Example-based learning, where robots infer rewards by watching human demonstrations.

Reward functions are more than math; they’re the ethics and intent behind robot learning. They define what a robot values, and in turn, how it behaves in the world.

At Cybernetic, this is where we see the next evolution, where reward shaping becomes not just about performance, but alignment between robotic optimization and human purpose.
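As a concrete illustration of mixing dense and sparse terms, here is a minimal shaped-reward sketch for a block-reaching task. The weights, the 2 cm success threshold, and the state fields are assumptions, not drawn from any particular robot or paper.

```python
import numpy as np

def reward(gripper_pos, block_pos, prev_distance, grasped):
    """Return (reward, new_distance) for one control step."""
    distance = float(np.linalg.norm(np.asarray(block_pos) - np.asarray(gripper_pos)))

    # Dense term: reward *progress* toward the block (change in distance),
    # so hovering near the block without moving closer earns nothing.
    progress = prev_distance - distance

    # Small per-step penalty keeps episodes short.
    time_penalty = -0.01

    # Sparse bonus only when the block is actually grasped near the gripper.
    success_bonus = 10.0 if grasped and distance < 0.02 else 0.0

    return 5.0 * progress + time_penalty + success_bonus, distance

r, d = reward(gripper_pos=[0.10, 0.0, 0.20], block_pos=[0.30, 0.0, 0.05],
              prev_distance=0.30, grasped=False)
```

Even a small function like this can be gamed (e.g. shoving the block closer instead of reaching for it), which is why the shaping terms, thresholds, and weights usually need iteration against observed behavior.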
Imagine a world where every learner receives a personalized education, tailored to their unique style and pace. Welcome to the future of eLearning, driven by AI! At BESTINE, we believe that Artificial Intelligence is not just a tool, but a game-changer for the eLearning landscape. By leveraging AI, we can enhance content delivery, making it more relevant and engaging for every learner. Think of AI as a personal tutor that understands individual learning preferences, pacing lessons according to each student's needs. Automation plays a crucial role too; it frees instructors from repetitive tasks, allowing them to focus on what truly matters—creating impactful learning experiences. From automatically grading quizzes to personalizing learning paths, the possibilities are endless. Are you ready to embrace the AI revolution in eLearning and redefine your approach to education? Let's innovate together! #eLearning #AI #Automation #EdTech #PersonalizedLearning