Improving Domestic Robot Multitasking Capabilities


Aaron Prather

Director, Robotics & Autonomous Systems Program at ASTM International

85,790 followers 1y
Researchers at UC San Diego developed 𝐖𝐢𝐥𝐝𝐋𝐌𝐚, a framework to enhance quadruped robots' loco-manipulation skills in real-world tasks like cleaning or retrieving objects. Using VR-collected demonstrations, Vision-Language Models (VLMs), and Large Language Models (LLMs), robots can break complex tasks into steps (e.g., "pick—navigate—place"). Attention mechanisms improve adaptability, allowing robots to handle chores like tidying or food delivery. While promising, the system's next goal is greater robustness in dynamic environments, aiming for affordable, accessible home assistant robots. 📝 Research Paper: https://lnkd.in/e8HtbUF9 📊 Project Page: https://wildlma.github.io/
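The step-wise decomposition described in the post (pick, navigate, place) can be pictured as a planner that turns an instruction into a sequence of named skills, which an executor then runs in order. The sketch below is illustrative only: `RobotState`, the `SKILLS` table, and the lookup-based `plan_task` are hypothetical stand-ins for an LLM planner and learned skill policies, not WildLMa's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RobotState:
    """Toy world state (hypothetical): robot location, gripper contents, placed objects."""
    location: str = "dock"
    holding: Optional[str] = None
    placed: dict = field(default_factory=dict)  # object name -> location


def navigate(state, target):
    state.location = target


def pick(state, obj):
    if state.holding is not None:
        raise RuntimeError("gripper already occupied")
    state.holding = obj


def place(state, obj):
    if state.holding != obj:
        raise RuntimeError(f"not holding {obj}")
    state.placed[obj] = state.location
    state.holding = None


# Skill library: each entry stands in for a learned low-level policy.
SKILLS = {"navigate": navigate, "pick": pick, "place": place}


def plan_task(instruction):
    """Stand-in for an LLM planner: returns a list of (skill, argument) steps."""
    plans = {
        "bring the bottle to the kitchen": [
            ("navigate", "table"),
            ("pick", "bottle"),
            ("navigate", "kitchen"),
            ("place", "bottle"),
        ],
    }
    return plans[instruction]


def execute(instruction, state):
    """Run each planned step through the matching skill, in order."""
    for skill, arg in plan_task(instruction):
        SKILLS[skill](state, arg)
    return state


final = execute("bring the bottle to the kitchen", RobotState())
print(final.placed)  # {'bottle': 'kitchen'}
```

Keeping the planner and the skill table separate means either side can be swapped (a different planner, a retrained skill) without touching the other.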

Ahsen Khaliq

ML @ Hugging Face

36,024 followers 2y
MOSAIC: A Modular System for Assistive and Interactive Cooking
We present MOSAIC, a modular architecture for home robots to perform complex collaborative tasks, such as cooking with everyday users. MOSAIC tightly collaborates with humans, interacts with users using natural language, coordinates multiple robots, and manages an open vocabulary of everyday objects. At its core, MOSAIC employs modularity: it leverages multiple large-scale pre-trained models for general tasks like language and image recognition, while using streamlined modules designed for task-specific control. We extensively evaluate MOSAIC on 60 end-to-end trials where two robots collaborate with a human user to cook a combination of 6 recipes. We also extensively test individual modules with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. We show that MOSAIC is able to efficiently collaborate with humans by running the overall system end-to-end with a real human user, completing 68.3% (41/60) collaborative cooking trials of 6 different recipes with a subtask completion rate of 91.6%. Finally, we discuss the limitations of the current system and exciting open challenges in this domain.

Yann Kronberg

CTO | AI Agents & Gen AI | Cybersecurity | Software Development | Your go-to newsletter for AI & tech: 🔔yannkronberg.substack.com🔔

30,022 followers 2mo
No more talk about AI taking away creative work. It finally takes away truly tedious tasks.
Figure just announced a major milestone in home robotics: its robot, powered by Helix 02, can now autonomously tidy a living room.
• Operates on the same general-purpose neural architecture designed to handle different household tasks.
• Performs in a harder real-world challenge: living rooms are messy, dynamic, and far less predictable than controlled environments.
• Autonomous multi-step task execution allows the robot to carry out complex tidying tasks on its own.
• Has better physical navigation, including squeezing through tight gaps between furniture, which is critical for real home use.
Practical humanoid intelligence may be arriving faster than many expected. What we see right now is not a robot for one-task demos. It's a robot that can operate reliably in the messy, unstructured reality of everyday life.
Are we closer than we think to a robot in every home? I guess so. Would you let one of these clean your house? Let me know in the comments!
♻️ Enjoyed this content? Share it with your network.
P.S. Check out 🔔yannkronberg.substack.com🔔, the newsletter for anyone who wants clear, practical insights on AI and modern tech. Made for founders, builders, and leaders.
#AINews #Robotics #RoboticsUpdates

Krishna Chytanya (KC) Ayyagari

Applied AI | Solution Architecture @ Google

11,596 followers 8mo
Google DeepMind has unveiled two new models, Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, designed to enable robots to solve complex, multi-step tasks by perceiving, planning, thinking, and acting. Gemini Robotics-ER 1.5 serves as the robot's "brain," handling high-level planning and decision-making, while Gemini Robotics 1.5 executes the physical actions based on natural language instructions. A key innovation is the "think before acting" capability of Gemini Robotics 1.5, which breaks down complex tasks into smaller steps and provides transparency into the robot's reasoning. Additionally, the model can transfer learned skills across different robot types without extensive retraining. Google DeepMind emphasizes the responsible development of these technologies, utilizing the ASIMOV benchmark to evaluate and enhance semantic safety. This advancement is a significant step toward creating more intelligent and helpful robots for everyday life. https://lnkd.in/g8EXJ2dt

Gemini Robotics 1.5: Using agentic capabilities

https://www.youtube.com/

Rangel Isaías Alvarado Walles

Robotics & AI Engineer | AI Engineer | Machine Learning | Deep Learning | Computer Vision | Agentic AI | Reinforcement Learning | Self-Driving Cars | IIoT | AIOps | MLOps | LLMOps | DevOps | Embodied AI

5,380 followers 2mo
AI/robotics researchers have a new reason to celebrate! 🚀 The paper 'Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning' introduces KG-M3PO, a groundbreaking framework that unifies perception, knowledge, and policy for multi-task robotic manipulation in partially observable environments.
🔁 At a Glance
💡 Goal: Enable robots to perform diverse manipulation tasks reliably by integrating a dynamic, structured world model.
⚙️ Approach:
- Model-based policy optimization: Controls robot actions via an online 3D scene graph grounding open-vocabulary detections.
- Scene graph updates dynamically, capturing spatial, containment, and affordance relations.
- GNN encoder trained end-to-end shapes relational features directly by control performance.
- Multimodal observations (visual, proprioceptive, language, graph) fused into a shared latent space.
- Policy conditioned on lightweight graph queries, improving decision-making.
📈 Impact (Key Results)
🧪 Success in occlusion, layout shifts, and distractor scenarios.
- Higher success rates and sample efficiency.
- Enhanced generalization to new objects and configurations.
🔄 Robust long-horizon behaviors demonstrate the importance of structured knowledge.
🤖 Applications include multi-task robots that understand and adapt in complex, partial environments.
🔬 Experiments
🧪 Benchmarks: Isaac Sim, with Franka and UR5 robots.
🎯 Tasks: Pick, place, open, retrieve, with fully and partially observable setups.
🦾 Setup: 1024 parallel environments, GPU-accelerated physics.
📐 Inputs: Images, scene graphs, multimodal sensory data.
🛠 How to Implement
1️⃣ Build scene graphs with BBQ from sensory data.
2️⃣ Encode graphs using GNNs trained with RL loss.
3️⃣ Fuse visual, proprioception, and graph features into state.
4️⃣ Train RL policy end-to-end, propagating gradients through GNN.
5️⃣ Deploy in simulation, and future real-world tests.
📦 Deployment Benefits
✅ Enhanced generalization and sample efficiency.
✅ Robust performance under occlusion and partial views.
✅ Scalable multi-task learning.
✅ Better long-term behavior with structured relational understanding.
📣 Takeaway
This work proves that integrating structured, continually updated world knowledge into RL policies significantly boosts robotic manipulation capabilities. It paves the way for more adaptable, intelligent robots that operate reliably in unstructured environments.
Follow me to know more about AI, ML, and Robotics!
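Steps 1 to 3 of the "How to Implement" list above (scene graph, graph encoding, feature fusion) can be sketched in a few lines. This is a toy under loud assumptions: the node features, the unweighted mean-aggregation `message_pass`, and the `graph_query` and `fuse` helpers are all hypothetical, there are no learned weights or RL loss here, and nothing below is taken from the KG-M3PO code.

```python
# Toy scene graph: nodes carry feature vectors, edges carry relation labels.
NODES = {
    "drawer": [1.0, 0.0],
    "cup": [0.0, 1.0],
    "table": [1.0, 1.0],
}
EDGES = [
    ("cup", "inside", "drawer"),  # containment relation
    ("drawer", "on", "table"),    # spatial relation
]


def message_pass(nodes, edges):
    """One round of mean aggregation: each node averages itself with its neighbours.

    Relation labels are ignored here; a trained GNN would use them.
    """
    gathered = {name: [feat] for name, feat in nodes.items()}
    for src, _relation, dst in edges:
        gathered[src].append(nodes[dst])
        gathered[dst].append(nodes[src])
    return {
        name: [sum(col) / len(feats) for col in zip(*feats)]
        for name, feats in gathered.items()
    }


def graph_query(embeddings, name):
    """Lightweight query: read out the embedding of one named object."""
    return embeddings[name]


def fuse(visual, proprio, graph_feat):
    """Concatenate the modalities into one state vector for a policy."""
    return list(visual) + list(proprio) + list(graph_feat)


embeddings = message_pass(NODES, EDGES)
state = fuse(
    visual=[0.2, 0.4],
    proprio=[0.1],
    graph_feat=graph_query(embeddings, "cup"),
)
print(state)
```

In a trained system the averaging step would be a parameterized GNN layer whose weights receive gradients from the policy loss, which is what step 4 of the list refers to.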
Samuel Kariuki Gitonga

AI Journalist & Digital Strategist | Founder | Content Creator | AI, Fintech & African Affairs | Youth Tech Trainer | Nairobi, Kenya

14,472 followers 1mo
🤖 THE AUTONOMY UPGRADE: UNITREE G1’S BRAIN TRANSPLANT
The massive reality check for 2026 robotics is that software, not hardware, is the true bottleneck. By installing a new neural brain, a startup has evolved the Unitree G1 from a remote-controlled tool into a fully autonomous agent that manages household chores without any human oversight. 🧠🦾
🕒 COGNITIVE UPGRADE: BEYOND TELEOPERATION
Technical execution has shifted from rigid scripts to vision-language-action models. This allows the G1 to perceive 3D environments and make real-time decisions. 📡🧱 Instead of manual guidance, the G1 identifies objects like watering cans or curtains. Using reinforcement learning, it calculates the exact force needed for delicate handles versus heavy fabrics. This leap from mimicry to reasoning enables the robot to navigate hurdles like pets or furniture by recalculating its path instantly. ⚙️📐
🌊 DOMESTIC DEPLOYMENT: THREE PILLARS OF UTILITY
Environmental mapping is the first pillar, ensuring the robot understands room geometry to avoid collisions. The second is object manipulation, where the brain translates goals into precise motor commands for multi-fingered hands. Finally, autonomous scheduling allows the G1 to prioritize tasks like plant care or cleaning based on time of day without being prompted. 🏠🧹
🔒 SYSTEM RELIABILITY: SAFETY-CRITICAL OPTIMIZATION
Independent operation requires a utility-first approach to safety. The startup implemented a "sanity check" layer to act as a digital watchdog. 🛡️🔐 This software ensures the robot avoids excessive pressure on glass or spills near electronics. This safety-critical optimization transforms a research prototype into a viable consumer product, providing consistent compliance and verification without a technician's constant oversight. ✅⚠️
🇰🇪 SILICON SAVANNAH PERSPECTIVE: LOCAL LOGIC
In our Kenyan IT track, we view the G1 as a blueprint for service-sector innovation. The brain logic is exactly what we teach our mentees to build for our own infrastructure. 🇰🇪💻 We aren't just consumers; we are fine-tuning models to handle unique regional hurdles. Mastering autonomous logic is the key to unlocking efficiency. As we integrate these brains, we move closer to a world where our technology works as hard as we do. 🚀🌍
#UnitreeG1 #Robotics2026 #HumanoidRobot #AILogic #TechInnovation #AutonomousSystems #FutureOfWork #SiliconSavannah #KOT
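The "sanity check" watchdog described in the post can be pictured as a filter that sits between the policy and the motors and either passes, clamps, or vetoes each command. The post gives no implementation details, so everything below is an assumption made for illustration: the `Command` fields, the force ceilings in `MAX_FORCE_N`, and the veto rule for pouring near electronics are all hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical force ceilings in newtons; real limits would come from testing.
MAX_FORCE_N = {"glass": 5.0, "fabric": 30.0, "metal": 40.0}


@dataclass(frozen=True)
class Command:
    action: str               # e.g. "grasp" or "pour"
    target_material: str
    force_n: float
    near_electronics: bool = False


def sanity_check(cmd):
    """Return (approved, command, reason), clamping or vetoing unsafe requests."""
    if cmd.action == "pour" and cmd.near_electronics:
        return False, cmd, "pouring near electronics is vetoed"
    limit = MAX_FORCE_N.get(cmd.target_material)
    if limit is None:
        # Unknown materials are rejected rather than guessed at.
        return False, cmd, f"unknown material: {cmd.target_material}"
    if cmd.force_n > limit:
        return True, replace(cmd, force_n=limit), f"force clamped to {limit} N"
    return True, cmd, "ok"


approved, safe_cmd, reason = sanity_check(Command("grasp", "glass", 12.0))
print(approved, safe_cmd.force_n, reason)
```

Rejecting unknown materials by default is a deliberate choice in this sketch: a watchdog that fails closed is easier to reason about than one that guesses.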

Pascal Biese

AI Lead at PwC </> Daily AI highlights for 80k+ experts 📲🤗

85,465 followers 10mo
China's "I, Robot" moment will probably be trained by none other than 𝘺𝘰𝘶. ByteDance just released GR-3, a vision-language-action model that brings us closer to truly versatile household robots.
The robotics field has long struggled with a fundamental challenge: how do you build robots that can generalize beyond their training data? Most current systems excel at specific tasks but fail when encountering new objects or instructions. This brittleness has kept robots confined to controlled environments rather than our messy, unpredictable homes.
GR-3 tackles this with a multi-faceted training recipe. First, it co-trains on both robot trajectories and massive web-scale vision-language data, allowing it to understand abstract concepts like "put the largest object in the box" - instructions it never saw during robot training. Second, it leverages human trajectory data collected via VR devices, enabling rapid adaptation to new scenarios with just 10 demonstrations per object. The key insight: by combining these diverse data sources with flow-matching for action prediction, the model learns robust representations that transfer across different contexts.
GR-3 outperforms the previous state-of-the-art on tasks ranging from generalizable pick-and-place to long-horizon table cleaning and dexterous cloth manipulation. It even shows "true" semantic understanding - distinguishing between similar objects and following complex spatial instructions that would trip up most current systems.
In order to push toward robots that can genuinely assist in daily life, approaches like GR-3 that prioritize generalization and efficient adaptation will be crucial. The question now isn't if we'll have capable home robots, but how quickly we can scale these capabilities to handle the full complexity of human environments. And, of course, if we, 𝘢𝘴 𝘢 𝘴𝘰𝘤𝘪𝘦𝘵𝘺, want that in the first place.
↓ 𝐖𝐚𝐧𝐭 𝐭𝐨 𝐤𝐞𝐞𝐩 𝐮𝐩? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
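Flow matching, which the post names as GR-3's action-prediction method, trains a network to predict a velocity field that carries noise toward an action; at inference time that field is integrated over a time variable running from 0 to 1. The toy below shows only that sampling loop. It swaps the learned network for the closed-form velocity that holds when there is a single known target on a straight-line path, and `velocity`, `sample_action`, and the target values are hypothetical names and numbers, not anything from GR-3.

```python
import random


def velocity(x, t, target):
    """Closed-form velocity toward one fixed target on a straight-line path.

    A trained model would replace this with a network conditioned on
    images and the language instruction.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(target, steps=10, seed=0):
    """Euler-integrate the velocity field from Gaussian noise at t=0 toward t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in velocity() is safe
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target_action = [0.3, -0.2, 0.5]  # hypothetical 3-DoF action
action = sample_action(target_action)
print([round(a, 6) for a in action])
```

With a single target the integration lands on it regardless of the starting noise; a learned field instead maps different noise samples to different plausible actions.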

