Robot Learning Techniques for Diverse Systems

Explore top LinkedIn content from expert professionals.

Summary

Robot learning techniques for diverse systems are methods that enable robots to learn new tasks, adapt to unfamiliar environments, and generalize their skills by using a mix of sensory data, human instruction, and prior experience. These approaches are designed to help robots flexibly operate across different physical settings and task requirements without needing explicit programming for each scenario.

  • Combine varied data: Mix vision, language, and robot demonstration datasets to help robots build an internal understanding of objects, environments, and actions.
  • Train for adaptability: Use training methods that allow robots to adjust to new situations, objects, and tasks by encouraging generalization rather than rote repetition.
  • Build skill libraries: Let robots incrementally collect and reuse previous skills so they can tackle complex or unfamiliar tasks by composing knowledge in flexible ways.
Summarized by AI based on LinkedIn member posts
  • Clem Delangue 🤗

    Co-founder & CEO at Hugging Face

    299,076 followers

    🦾 Great milestone for open-source robotics: pi0 & pi0.5 by Physical Intelligence are now on Hugging Face, fully ported to PyTorch in LeRobot and validated side-by-side with OpenPI for everyone to experiment with, fine-tune & deploy in their robots!

    π₀.₅ is a Vision-Language-Action model that represents a significant evolution from π₀, addressing a big challenge in robotics: open-world generalization. While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training. Generalization must occur at multiple levels:
    - Physical level: understanding how to pick up a spoon (by the handle) or a plate (by the edge), even with unseen objects in cluttered environments
    - Semantic level: understanding task semantics, such as where to put clothes and shoes (the laundry hamper, not the bed) and which tools are appropriate for cleaning spills
    - Environmental level: adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals

    The breakthrough innovation in π₀.₅ is co-training on heterogeneous data sources. The model learns from:
    - Multimodal web data: image captioning, visual question answering, object detection
    - Verbal instructions: humans coaching robots through complex tasks step by step
    - Subtask commands: high-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
    - Cross-embodiment robot data: data from various robot platforms with different capabilities
    - Multi-environment data: static robots deployed across many different homes
    - Mobile manipulation data: ~400 hours of mobile robot demonstrations

    This diverse training mixture creates a "curriculum" that enables generalization across physical, visual, and semantic levels simultaneously. Huge thanks to the Physical Intelligence team & contributors.

    Model: https://lnkd.in/eAEr7Yk6
    LeRobot: https://lnkd.in/ehzQ3Mqy
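The co-training recipe above can be pictured as weighted sampling over data sources. This is a minimal sketch under assumptions of my own: the source names mirror the post, but the weights and batch logic are illustrative, not π₀.₅'s actual training configuration.

```python
import random

# Illustrative mixture weights over the heterogeneous sources listed above
# (the real proportions are not published in the post).
MIXTURE = {
    "web_multimodal": 0.35,       # captioning, VQA, object detection
    "verbal_instructions": 0.10,  # step-by-step human coaching
    "subtask_commands": 0.15,     # high-level semantic behavior labels
    "cross_embodiment": 0.20,     # data from varied robot platforms
    "multi_environment": 0.10,    # static robots across many homes
    "mobile_manipulation": 0.10,  # ~400 h of mobile demos
}

def sample_batch(batch_size: int, rng: random.Random) -> list:
    """Draw each example's source by mixture weight, so every batch spans
    physical, semantic, and environmental data simultaneously."""
    sources = list(MIXTURE)
    weights = [MIXTURE[s] for s in sources]
    return rng.choices(sources, weights=weights, k=batch_size)

rng = random.Random(0)
batch = sample_batch(256, rng)
```

In a real pipeline each sampled source would yield a training example in its own format; the point of the sketch is that one sampler gives the model a steady "curriculum" across all levels at once.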

  • Andriy Burkov

    PhD in AI, author of 📖 The Hundred-Page Language Models Book and 📖 The Hundred-Page Machine Learning Book

    485,995 followers

    VLA models are systems that combine three capabilities in one framework: seeing the world through cameras, understanding natural-language instructions like "pick up the red apple," and generating the actual motor commands to make a robot do it. Before these unified models existed, robots had separate modules for vision, language, and movement that were stitched together with manual engineering, which made them brittle and unable to handle new situations.

    This review paper covers over 80 VLA models published in the past three years, organizing them into a taxonomy based on their architectures: some use a single end-to-end network, others separate high-level planning from low-level control, and some use diffusion models for smoother action sequences. The paper walks through how these models are trained on both internet data and robot demonstration datasets, then maps out where they are being applied. The later sections lay out the concrete technical problems that remain unsolved.

    Read online with an AI tutor: https://lnkd.in/eZdzYfdu
    PDF: https://lnkd.in/ezzncewE
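The architectural split the review describes can be made concrete with a toy sketch: a single end-to-end mapping versus a high-level planner feeding a low-level controller. Both stand-ins below are illustrative placeholders of my own, not models from the survey; only the interface contrast is the point.

```python
# Toy contrast of the two VLA architecture families: one network from
# (image, instruction) straight to motor commands, versus a planner that
# emits subgoals for a separate low-level controller.

def end_to_end(image: str, instruction: str) -> list:
    """Unified model: pixels + words map directly to one motor command."""
    return [f"motor_cmd(do {instruction!r} given {image!r})"]

def hierarchical(image: str, instruction: str) -> list:
    """Planner proposes subgoals; a controller turns each into motion."""
    subgoals = [f"reach target seen in {image!r}",
                f"execute {instruction!r}"]
    return [f"motor_cmd({g!r})" for g in subgoals]

a = end_to_end("cam_frame", "pick up the red apple")
b = hierarchical("cam_frame", "pick up the red apple")
```

The trade-off the taxonomy organizes: the unified form is simpler to train jointly, while the hierarchical form makes the planning step inspectable and swappable.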

  • Murtaza Dalal

    Robotics ML Engineer @ Tesla Optimus | CMU Robotics PhD

    2,114 followers

    Can my robot cook my food, tidy my messy table, rearrange my dresser, and do much more without ANY demos or real-world training data? Introducing ManipGen: a generalist agent for manipulation that can solve long-horizon robotics tasks entirely zero-shot, from text input!

    Key idea: many manipulation tasks of interest can be decomposed into two phases, contact-free reaching (aka motion planning!) and contact-rich local interaction. The latter is hard to learn, so we take a sim2real transfer approach. We define local policies, which operate in a local region around an object of interest. They are uniquely well suited to generalization and sim2real transfer because they are invariant to: 1) absolute pose, 2) skill order, 3) environment configuration.

    As an overview, our approach 1) acquires generalist behaviors for local skills at scale using RL, 2) distills these behaviors into visuomotor policies using multitask DAgger, and 3) deploys local policies in the real world using VLMs and motion planning.

    Phase 1: Train state-based, single-object policies to acquire skills such as picking, placing, opening, and closing. We train policies with PPO across thousands of objects, designing reward and observation spaces for efficient learning and effective sim2real transfer.

    Phase 2: We need visuomotor policies to deploy on robots! We distill single-object experts into multi-task policies using online imitation learning (aka DAgger) that observe local visual (wrist-cam) input, with edge and hole augmentation to match real-world depth noise.

    To deploy local policies in the real world, we decompose the task into components (GPT-4o), estimate where to go using Grounded SAM, and motion plan using Neural MP. For control, we use IndustRealLib from NVIDIA, an excellent library for sim2real transfer!

    ManipGen solves long-horizon tasks in the real world entirely zero-shot, generalizing across objects, poses, environments, and scene configurations! We outperform SOTA approaches such as SayCan, OpenVLA, LLMTrajGen, and VoxPoser across 50 tasks by 36%, 76%, 62%, and 60%! ManipGen exhibits exciting capabilities such as performing manipulation in tight spaces and in clutter, entirely zero-shot: from putting items on the shelf, to carefully extracting the red pepper from clutter, to putting large items in drawers.

    By training local policies at scale on thousands of objects, ManipGen generalizes to some pretty challenging out-of-distribution objects that look nothing like what was in training, such as pliers and clamps, as well as deformable objects such as the wire.

    This work was done at Carnegie Mellon University Robotics Institute, with co-lead Min Liu, as well as Deepak Pathak and Russ Salakhutdinov, and in collaboration with Walter Talbott, Chen Chen, Ph.D., and Jian Zhang from Apple. Paper, videos, and code (coming soon!) at https://lnkd.in/ekjWPXHM
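The distillation step in Phase 2 follows the DAgger pattern: roll out the student, label the visited states with the expert, retrain on the aggregated dataset. Here is a minimal 1-D sketch of that loop under toy assumptions of my own; the scalar "policy" and expert stand in for ManipGen's visuomotor networks and RL experts.

```python
import random

def expert(state: float) -> float:
    """Toy expert: always acts to cancel the current state."""
    return -state

def dagger(rounds: int = 5, horizon: int = 20, seed: int = 0) -> float:
    """DAgger-style online imitation on a 1-D system.

    The student policy is action = gain * state; each round it is rolled
    out, the expert labels every state the *student* visits, and the
    aggregated dataset retrains the student by least squares."""
    rng = random.Random(seed)
    dataset = []  # aggregated (state, expert_action) pairs
    gain = 0.0    # student starts knowing nothing

    for _ in range(rounds):
        state = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            dataset.append((state, expert(state)))  # expert labels
            state += gain * state                   # student acts
        # "Retrain": closed-form least-squares fit of gain on all data.
        num = sum(s * a for s, a in dataset)
        den = sum(s * s for s, _ in dataset) or 1.0
        gain = num / den
    return gain

gain = dagger()  # converges toward the expert's gain of -1
```

The key DAgger property the sketch preserves: training states come from the student's own rollouts, not the expert's, so the learned policy is corrected exactly where it actually goes wrong.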

  • Michael Yip

    Professor at UC San Diego | US National Academy of Inventors Member | Deep Tech Co-Founder | Surgical and Healthcare Robotics, Humanoid Robots, and Physical AI Strategic Advisor

    6,946 followers

    Can a robot learn incrementally over time, building a library of skills that allows it to reuse, compose, and build upon prior knowledge to become more advanced? That has been an area of research my lab (and many others!) has been pursuing for many years. But it is not trivial to figure out how to combine pieces of knowledge and skill sets into a single library, especially when that knowledge base may have very different objectives, state-action spaces, or model sizes.

    Yun-Jie Ho and Zih-Yun Chiu set out to tackle this problem, and I'm happy to say that their results have recently been published in IEEE RA-L! SurgIRL: Towards Life-Long Learning for Surgical Automation by Incremental Reinforcement Learning. https://lnkd.in/ePXKY2yR

    Instead of creating one huge policy that handles all possible skills, SurgIRL lets a robot incrementally build a growing library of prior skills and learn to reuse and compose them when facing new tasks. We demonstrated that this incremental learning approach can handle multiple simulated surgical tasks and then transfer policies into reality on a da Vinci Research Kit (dVRK). Some of these tasks are really hard, like precision grasping of suture needles, and some are really diverse, like camera tracking, all composed under the same architecture. The benefit is that this framework retains knowledge of skills in their original form, so the library can be extended and updated naturally.

    Huge kudos to the team, Yun-Jie Ho, Zih-Yun (Sarah) Chiu, and Yuheng Zhi, for driving this forward. I'm hoping for continued research into life-long learning in robot manipulation. We're not entirely there yet, but this is a great step forward in my opinion!
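The incremental pattern described above, try prior skills first and only acquire a new one when none fits, can be sketched in a few lines. This is a toy stand-in of my own: the task and skill names are illustrative, and "learning" is reduced to registering a new entry, not SurgIRL's actual reinforcement learning.

```python
class IncrementalLibrary:
    """A skill library that grows over time and reuses before relearning."""

    def __init__(self):
        self.skills = {}  # skill name -> set of tasks it is known to solve

    def solve(self, task: str) -> str:
        # Reuse: scan prior skills before learning anything new.
        for name, solved in self.skills.items():
            if task in solved:
                return name
        # Incremental growth: acquire a fresh skill for the novel task,
        # leaving all prior skills untouched in their original form.
        new_name = f"skill_{len(self.skills)}"
        self.skills[new_name] = {task}
        return new_name

lib = IncrementalLibrary()
first = lib.solve("grasp_needle")   # novel task: a new skill is added
second = lib.solve("grasp_needle")  # seen before: the same skill is reused
third = lib.solve("camera_track")   # different task: the library grows again
```

The design point mirrored here is that old skills are never rewritten when a new one is added, which is what makes life-long accumulation tractable.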

  • Akshet Patel 🤖

    Robotics Engineer | Creator

    51,192 followers

    1. Scan 2. Demo 3. Track 4. Render 5. Train models 6. Deploy

    What if robots could learn new tasks from just a smartphone scan and a single human demonstration, without needing physical robots or complex simulations?

    [⚡Join 2400+ Robotics enthusiasts - https://lnkd.in/dYxB9iCh]

    A paper by Justin Yu, Letian (Max) Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, and Ken Goldberg from the University of California, Berkeley and the Toyota Research Institute introduces a scalable approach for generating robot training data without dynamics simulation or robot hardware: "Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware"

    • Utilises a smartphone-captured object scan and a single human demonstration video as inputs
    • Reconstructs detailed 3D object geometry and tracks 6-DoF object motion using 3D Gaussian Splatting
    • Synthesises thousands of high-fidelity, robot-agnostic demonstrations through photorealistic rendering and inverse kinematics
    • Generates data compatible with vision-language-action models and imitation learning policies
    • Demonstrates that models trained on this data can match the performance of those trained on 150 human teleoperation demonstrations
    • Achieves a 27× increase in data generation throughput compared to traditional methods

    This approach enables scalable robot learning by decoupling data generation from physical robot constraints. It opens avenues for democratising robot training data collection, allowing broader participation using accessible tools. If robots can be trained effectively without physical hardware or simulations, how will this transform the future of robotics?

    Paper: https://lnkd.in/emjzKAyW
    Project Page: https://lnkd.in/evV6UkxF

    #RobotLearning #DataGeneration #ImitationLearning #RoboticsResearch #ICRA2025
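The data-multiplication idea behind the pipeline can be sketched simply: one tracked trajectory from a single human demo is replayed from many randomized start poses to synthesize a large dataset. The 2-D poses and uniform perturbation below are toy assumptions of my own; the paper's actual pipeline uses 3D Gaussian Splat rendering and inverse kinematics, not this stand-in.

```python
import random

def synthesize(demo, n_variants: int, seed: int = 0):
    """Multiply one demonstration into many by shifting the whole tracked
    trajectory with a random planar offset per variant (a stand-in for
    re-rendering the scene under randomized object placements)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        dx = rng.uniform(-0.1, 0.1)
        dy = rng.uniform(-0.1, 0.1)
        variants.append([(x + dx, y + dy) for (x, y) in demo])
    return variants

# One human demo: a short tracked object path (6-DoF in the paper,
# flattened to 2-D here for illustration).
demo = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]
data = synthesize(demo, n_variants=1000)
```

This is where the throughput gain comes from: the marginal cost of a variant is a render, not another human demonstration.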

  • Kal Mos

    Executive VP, Research & Predevelopment @ Siemens, ex-Google, ex-Amazon AGI, Startup Founder

    12,971 followers

    The new Nested Learning (NL) framework from Google's researchers is an impressive new architecture introducing multi-level memory, time-scale-separated optimization, and hierarchical control loops for next-generation LLMs and physical AI systems. NL's nested optimizer structure provides a clean mechanism for rapid local adaptation while preserving slow structural learning.

    This architecture could enhance the ability of polyfunctional robots to manage multi-modal actions, dynamic workloads, stochastic environments, and safety-critical autonomy across industries including manufacturing, logistics, energy systems, and advanced assembly lines. https://lnkd.in/gEQjthAn

    #NestedLearning #PolyfunctionalRobots #AutonomousRobots #Robotics #PhysicalAI #GenAI #LLM #IndustrialAI #IndustrialAutomation #AdaptiveControl #MultiLevelOptimization #HierarchicalLearning #ContinualLearning #TaskSwitching #SkillTransfer #SelfOptimizingMachines #RobotIntrospection #RealTimeAI #ControlSystems #EdgeAI #CyberPhysicalSystems #Mechatronics #AIEngineering #DigitalTwin #PredictiveControl #MPC #ReinforcementLearning #Optimization #IndustrialRobotics #AutomationEngineering #Siemens #RPD #FutureOfIndustry #AdvancedManufacturing #CognitiveRobotics #MultiModalAI #DynamicPlanning #SafeAutonomy #AIArchitecture
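Time-scale-separated optimization can be illustrated with a two-loop sketch: a fast inner variable adapts at every step, while a slow outer variable updates only every K steps. The toy objective and update rules below are my own assumptions for illustration, not the NL paper's implementation.

```python
def nested_learning(steps: int = 100, k: int = 10,
                    lr_fast: float = 0.5, lr_slow: float = 0.05):
    """Two-timescale optimization on a toy objective: drive the sum of a
    fast (local-adaptation) and a slow (structural) parameter to a target.
    The fast loop reacts every step; the slow loop only every k steps."""
    fast, slow = 0.0, 0.0
    target = 4.0
    for t in range(steps):
        err = (fast + slow) - target
        fast -= lr_fast * err           # rapid local adaptation
        if (t + 1) % k == 0:
            slow -= lr_slow * err       # slow structural learning
    return fast, slow

fast, slow = nested_learning()
residual = abs((fast + slow) - 4.0)     # near zero after convergence
```

The separation is the point: perturbing the target would be absorbed mostly by `fast`, while `slow` drifts only gradually, which is the "rapid local adaptation while preserving slow structural learning" behavior described above.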

  • Chris Paxton

    AI + Robotics Research Scientist

    8,621 followers

    Just collecting manipulation data isn't enough for robots: they need to be able to move around in the world, which has a whole different set of challenges from pure manipulation. And bringing navigation and manipulation together in a single framework is even more challenging.

    Enter HERMES, from Zhecheng Yuan and Tianming Wei. This is a four-stage process in which human videos are used to set up an RL sim-to-real training pipeline that overcomes differences between robot and human kinematics, used together with a navigation foundation model to move around in a variety of environments.

    To learn more, join us as Zhecheng Yuan and Tianming Wei tell us how they built their system to perform mobile dexterous manipulation from human videos in a variety of environments. Watch Episode #45 of RoboPapers today, hosted by Michael Cho and Chris Paxton!

    Abstract: Leveraging human motion data to impart robots with versatile manipulation skills has emerged as a promising paradigm in robotic manipulation. Nevertheless, translating multi-source human hand motions into feasible robot behaviors remains challenging, particularly for robots equipped with multi-fingered dexterous hands characterized by complex, high-dimensional action spaces. Moreover, existing approaches often struggle to produce policies capable of adapting to diverse environmental conditions. In this paper, we introduce HERMES, a human-to-robot learning framework for mobile bimanual dexterous manipulation. First, HERMES formulates a unified reinforcement learning approach capable of seamlessly transforming heterogeneous human hand motions from multiple sources into physically plausible robotic behaviors. Subsequently, to mitigate the sim2real gap, we devise an end-to-end, depth image-based sim2real transfer method for improved generalization to real-world scenarios. Furthermore, to enable autonomous operation in varied and unstructured environments, we augment the navigation foundation model with a closed-loop Perspective-n-Point (PnP) localization mechanism, ensuring precise alignment of visual goals and effectively bridging autonomous navigation and dexterous manipulation. Extensive experimental results demonstrate that HERMES consistently exhibits generalizable behaviors across diverse, in-the-wild scenarios, successfully performing numerous complex mobile bimanual dexterous manipulation tasks.

    Project Page: https://lnkd.in/e-aEbQzn
    ArXiv: https://lnkd.in/eemU6Pwa
    YouTube: https://lnkd.in/erzbkYjz
    Substack: https://lnkd.in/e3ea76Q8
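The closed-loop localization idea in the abstract, keep refining until the visual goal is precisely aligned, rests on reprojection error: compare where known 3D landmarks should project under the goal pose against where they are actually observed. The pure-Python sketch below shows only that loop's check; a real system would solve the full PnP pose (e.g. with OpenCV's solvePnP), and all names and camera parameters here are illustrative assumptions.

```python
def project(point3d, fx=100.0, fy=100.0, cx=0.0, cy=0.0):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point3d
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(landmarks3d, observed2d):
    """Mean pixel distance between predicted and observed projections."""
    total = 0.0
    for p3, (u_obs, v_obs) in zip(landmarks3d, observed2d):
        u, v = project(p3)
        total += ((u - u_obs) ** 2 + (v - v_obs) ** 2) ** 0.5
    return total / len(landmarks3d)

# Two known landmarks in the goal camera frame; here the robot has
# navigated perfectly, so observations match the predictions exactly.
landmarks = [(0.1, 0.0, 1.0), (-0.1, 0.05, 1.2)]
observed = [project(p) for p in landmarks]

err = reprojection_error(landmarks, observed)
aligned = err < 1.0  # within one pixel: hand off navigation to manipulation
```

In the closed loop, a large error would instead trigger another base adjustment before the dexterous manipulation policy takes over.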

    Ep#45: HERMES: Human-to-Robot Embodied Learning From Multi-Source Motion Data for Mobile Dexterous Manipulation

    robopapers.substack.com
