Robotics data is expensive and slow to collect. Plenty of video is available online, but it is not readily usable for robot learning because it lacks action labels. AMPLIFY tackles this problem by learning Actionless Motion Priors that unlock better sample efficiency, generalization, and scaling for robot learning.
Our key insight is to factor the problem into two stages:
- The "what": predict the visual dynamics required to accomplish a task.
- The "how": map predicted motions to low-level actions.
This decoupling enables remarkable generalization: our policy can perform tasks for which we have NO action data, only videos. On these tasks we outperform SOTA BC baselines by 27x 🤯
AMPLIFY is composed of three stages:
1. Motion Tokenization: We track dense keypoint grids through videos and compress their trajectories into discrete motion tokens.
2. Forward Dynamics: Given an image and a task description (e.g., "open the box"), we autoregressively predict a sequence of motion tokens representing how keypoints should move over the next second or so. This model can train on ANY text-labeled video data: robot demonstrations, human videos, YouTube videos.
3. Inverse Dynamics: We decode predicted motion tokens into robot actions. This module learns the robot-specific mapping from desired motions to actions. It can train on ANY robot interaction data, not just expert demonstrations (think off-task data, play data, or even random actions).
So, does it actually work?
Few-shot learning: Given just 2 action-annotated demos per task, AMPLIFY nearly doubles SOTA few-shot performance on LIBERO. This is possible because our Actionless Motion Priors provide a strong inductive bias that dramatically reduces the amount of robot data needed to train a policy.
Cross-embodiment learning: We train the forward dynamics model on both human and robot videos, but the inverse model sees only robot actions. Result: a 1.4× average improvement on real-world tasks. Our system successfully transfers motion information from human demonstrations to robot execution.
And now my favorite result: AMPLIFY enables zero-shot task generalization. We train on LIBERO-90 tasks and evaluate on tasks for which we've seen no actions, only pixels. While our best baseline achieves ~2% success, AMPLIFY reaches a 60% average success rate, outperforming SOTA behavior cloning baselines by 27x.
This is a new way to train VLAs for robotics, one that doesn't have to start with large-scale teleoperation. Instead of collecting millions of robot demonstrations, we just need to teach robots how to read the language of motion. Then every video becomes training data.
Led by Jeremy Collins & Loránd Cheng in collaboration with Kunal Aneja, Albert Wilcox, and Benjamin Joffe at the College of Computing at Georgia Tech.
Check out our paper and project page for more details:
📄 Paper: https://lnkd.in/eZif-mB7
🌐 Website: https://lnkd.in/ezXhzWGQ
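To make the Motion Tokenization stage more concrete, here is a toy sketch of the general idea of turning keypoint tracks into discrete tokens. It assumes per-step keypoint displacements are quantized against a fixed, hand-made codebook by nearest-neighbor lookup (plain vector quantization). The codebook, track format, and function names are hypothetical; AMPLIFY tracks dense grids with a point tracker and learns its own compression of trajectories, so this is not its tokenizer.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel position of one tracked keypoint


def displacements(track: Sequence[Point]) -> List[Point]:
    """Per-step (dx, dy) motion of one keypoint between consecutive frames."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def nearest_code(v: Point, codebook: Sequence[Point]) -> int:
    """Index of the codebook entry closest to v in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))


def tokenize_tracks(tracks: Sequence[Sequence[Point]], codebook: Sequence[Point]) -> List[List[int]]:
    """Turn each keypoint track into a sequence of discrete motion tokens."""
    return [[nearest_code(d, codebook) for d in displacements(t)] for t in tracks]


# Hypothetical hand-made codebook: stay, right, left, down, up (image y grows downward).
CODEBOOK: List[Point] = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

tracks = [
    [(10.0, 10.0), (11.0, 10.1), (12.0, 10.0), (12.0, 11.0)],  # right, right, down
    [(50.0, 50.0), (50.0, 50.0), (49.1, 50.0), (49.0, 49.0)],  # stay, left, up
]
tokens = tokenize_tracks(tracks, CODEBOOK)
print(tokens)  # [[1, 1, 3], [0, 2, 4]]
```

Small tracking jitter (the 0.1-pixel wobble above) collapses onto the same token, which is one reason a discrete motion vocabulary is easier for an autoregressive forward model to predict than raw coordinates.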
Data-Driven Approaches in Robotics Development
Summary
Data-driven approaches in robotics development use large amounts of information, such as videos, sensor readings, and human instructions, to train robots to understand and perform tasks in real-world environments. This method helps robots learn general skills and adapt to new situations without needing large numbers of hand-crafted examples or step-by-step instructions.
- Prioritize data diversity: Gather examples from a variety of sources, environments, and tasks to help robots build flexible skills and handle unexpected situations.
- Improve data quality: Check for clear labeling, correct annotations, and consistent measurements in your dataset so robots learn the right behaviors.
- Use interactive tools: Explore and analyze your robotics data visually to spot patterns, gaps, and errors before training your models.
-
Robotics data for physical AI is front and center this year. From GTC's heavy focus on data infrastructure to human data ecosystems like EgoVerse, the field is waking up to the bottleneck of scaling robotics data. And there's real divergence in how people think about quantity, quality, modality, and diversity.
Aurora Feng, Albert K., and I have been looking at this problem. We put together a robotics data infrastructure market map and an open vendor/collector list for the community.
We structured the market map around the robot data workflow end to end:
- Simulations & Evaluation: Generating synthetic + real data, benchmarking model performance.
- Curation & Labeling: Annotation, QA, slicing the data that actually matters.
- Ingestion & Sync: Time-aligning multimodal sensor streams into usable formats.
- Storage & Indexing: Making robotics data searchable and retrievable.
- Deployment & Ops: Fleet telemetry, incident detection, closing the loop back into the data stack.
Most companies span multiple layers. The interesting question is which layers are still wide open.
The LLM instinct is more data = better model. In robotics that logic weakens, and the scaling laws don't directly apply (yet). What actually matters when evaluating data:
- Outcome quality: Are success/failure trajectories labeled correctly? Failure trajectories can be quite useful if labeled correctly. Some vendors mix them in with successes to inflate volume and price.
- Distribution diversity: How varied are the demonstrations? Different initial states, grasp points, camera views. Narrow distributions produce brittle models.
- Annotation granularity: Task-level labels aren't enough. You need trajectory-level and step-level annotations.
It's not about dataset size. It's about whether the data teaches the right things.
Most robot data pipelines are still duct-taped together. Some gaps we see:
- Ingestion & sync is still painfully manual. Time-aligning video, depth, force, and actions across sensors is an engineering tax every team pays independently.
- Curation & QA tooling was built for AV, not manipulation or dexterous tasks.
- Evaluation infra barely exists outside big labs. Most teams can't tell if new data actually improves their models.
Ultimately, the most valuable data infra companies will be the platforms that match the right data modalities to the right model architectures and evolve alongside the models themselves.
Who are we missing? If you're building data infra or have insights on what data inputs matter most, we want to hear from you.
Vendor list + details in the first comment ↓
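The three evaluation criteria above (outcome quality, distribution diversity, annotation granularity) can be checked mechanically once trajectories carry the right labels. The sketch below assumes a hypothetical schema with a task-level label, a trajectory-level outcome that may be missing, coarse initial-state and camera-view tags, and step-level annotations; the field names and the `audit` helper are illustrative only and are not any vendor's format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StepAnnotation:
    t: float    # seconds from episode start
    label: str  # step-level description, e.g. "grasp handle"


@dataclass
class Trajectory:
    task: str                # task-level label
    success: Optional[bool]  # trajectory-level outcome; None means "never labeled"
    initial_state: str       # coarse bucket describing the starting configuration
    camera_view: str
    steps: List[StepAnnotation] = field(default_factory=list)


def audit(trajs: List[Trajectory]) -> Dict[str, int]:
    """Count outcome labels, missing step-level annotations, and simple diversity signals."""
    return {
        "success": sum(t.success is True for t in trajs),
        "failure": sum(t.success is False for t in trajs),
        "unlabeled_outcome": sum(t.success is None for t in trajs),
        "missing_step_labels": sum(not t.steps for t in trajs),
        "distinct_initial_states": len({t.initial_state for t in trajs}),
        "distinct_camera_views": len({t.camera_view for t in trajs}),
    }


trajs = [
    Trajectory("open drawer", True, "drawer_closed_left", "wrist",
               [StepAnnotation(0.0, "reach handle"), StepAnnotation(1.2, "grasp handle"),
                StepAnnotation(2.5, "pull")]),
    Trajectory("open drawer", False, "drawer_closed_left", "wrist",
               [StepAnnotation(0.0, "reach handle"), StepAnnotation(1.4, "grasp slips")]),
    Trajectory("open drawer", None, "drawer_closed_right", "front"),
]
print(audit(trajs))
```

Keeping `success` as `Optional[bool]` rather than defaulting to `True` is the design choice that matters here: an unlabeled failure silently counted as a success is exactly the volume-inflation problem described above.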
-
Building AI for the real world is a very different problem than building AI for text. I sat down with Steve Xie, Ph.D., Founder & CEO of Lightwheel, on The Ravit Show to break down what it actually takes to train systems that operate in the physical world.
Steve's journey from Peking University to Columbia University, and then into leadership roles at Cruise, NVIDIA, and NIO, gives him a unique lens into where today's AI systems struggle when they leave controlled environments and face the real world.
One of the biggest takeaways from this conversation is that the core bottleneck in AI is no longer models; it is data. While large language models benefited from massive, passive data sources, robotics has no equivalent. There is no scalable way to collect real-world interaction data, no reliable evaluation layer, and very little infrastructure to continuously improve systems once deployed.
This is where simulation becomes critical. In autonomous driving, simulation is helpful. In robotics, it is foundational. You cannot run thousands of parallel experiments in the real world, and you cannot reset physical environments at will. Simulation is what makes learning, testing, and iteration possible at scale.
But not everything that looks like simulation actually works. As Steve explains, true simulation needs to be physically accurate, reproducible, and capable of generating actionable feedback. Without that, it cannot train real systems.
What makes Lightwheel interesting is their approach to solving this problem. Instead of starting with data collection, they start with evaluation. They identify where models fail, generate targeted data to fix those failures, and create a continuous feedback loop. It is a shift from a passive data pipeline to an active data engine built for physical AI.
They are already working with teams like DeepMind, ByteDance, and Alibaba, building infrastructure that sits beneath both robotics companies and AI labs.
The bigger idea is simple.
You cannot scrape your way to physical intelligence. You have to generate, test, and refine data in closed loops. #data #ai #robot #nvidiagtc #lightwheel #api #training #behaviour #theravitshow
-
🦾 Great milestone for open-source robotics: pi0 & pi0.5 by Physical Intelligence are now on Hugging Face, fully ported to PyTorch in LeRobot and validated side-by-side with OpenPI, for everyone to experiment with, fine-tune & deploy on their robots!
π₀.₅ is a Vision-Language-Action model that represents a significant evolution from π₀, built to address a big challenge in robotics: open-world generalization. While robots can perform impressive tasks in controlled environments, π₀.₅ is designed to generalize to entirely new environments and situations that were never seen during training.
Generalization must occur at multiple levels:
- Physical Level: Understanding how to pick up a spoon (by the handle) or a plate (by the edge), even with unseen objects in cluttered environments
- Semantic Level: Understanding task semantics, such as where to put clothes and shoes (the laundry hamper, not the bed) and which tools are appropriate for cleaning spills
- Environmental Level: Adapting to "messy" real-world environments like homes, grocery stores, offices, and hospitals
The breakthrough innovation in π₀.₅ is co-training on heterogeneous data sources. The model learns from:
- Multimodal Web Data: Image captioning, visual question answering, object detection
- Verbal Instructions: Humans coaching robots through complex tasks step-by-step
- Subtask Commands: High-level semantic behavior labels (e.g., "pick up the pillow" for an unmade bed)
- Cross-Embodiment Robot Data: Data from various robot platforms with different capabilities
- Multi-Environment Data: Static robots deployed across many different homes
- Mobile Manipulation Data: ~400 hours of mobile robot demonstrations
This diverse training mixture creates a "curriculum" that enables generalization across physical, visual, and semantic levels simultaneously.
Huge thanks to the Physical Intelligence team & contributors
Model: https://lnkd.in/eAEr7Yk6
LeRobot: https://lnkd.in/ehzQ3Mqy
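Co-training on heterogeneous sources ultimately comes down to how each training batch is drawn. A common, simple way to do this is a weighted mixture: pick a source with probability proportional to its weight, then pick an example from that source. The sketch below assumes exactly that scheme; the source names, weights, and `make_cotraining_sampler` helper are hypothetical and do not reflect π₀.₅'s actual data mixture or training code.

```python
import random
from typing import Callable, Dict, List, Tuple


def make_cotraining_sampler(
    sources: Dict[str, List[str]], weights: Dict[str, float], seed: int = 0
) -> Callable[[int], List[Tuple[str, str]]]:
    """Build a batch sampler over several datasets.

    Each example is drawn by first choosing a source with probability proportional
    to its weight, then choosing an example uniformly at random within that source."""
    rng = random.Random(seed)  # seeded for reproducible batches
    names = list(sources)
    w = [weights[n] for n in names]

    def sample_batch(batch_size: int) -> List[Tuple[str, str]]:
        picked = rng.choices(names, weights=w, k=batch_size)
        return [(n, rng.choice(sources[n])) for n in picked]

    return sample_batch


# Hypothetical toy sources standing in for web data, verbal coaching, and robot demos.
sources = {
    "multimodal_web": ["caption_001", "vqa_042", "detect_007"],
    "verbal_instructions": ["coach_ep3", "coach_ep9"],
    "mobile_manipulation": ["demo_0001", "demo_0002", "demo_0003", "demo_0004"],
}
weights = {"multimodal_web": 0.3, "verbal_instructions": 0.1, "mobile_manipulation": 0.6}

sample_batch = make_cotraining_sampler(sources, weights, seed=42)
print(sample_batch(4))
```

Sampling by source weight rather than by raw dataset size keeps small but valuable sources (such as verbal instructions) from being drowned out by much larger ones.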
-
I was listening to one of my favorite podcasts last week, Unsupervised Learning by Redpoint Ventures. They had Karol Hausman and Danny Driess (Research Scientist) from Physical Intelligence.
Around the 33-minute mark of the podcast they mentioned the need for a tool or infrastructure to help them understand what is in their dataset, particularly given the massive amount of multimodal, time-series data that robotics generates. They outlined what they'd want in such a tool:
- Decide what data to collect
- Build machinery around understanding the collected data
- Understand the quality of the data collected so far
- Perform quality assurance at scale
- Execute language annotations correctly at scale
- Determine how much more data is needed for the model
- Identify the optimal strategy for data collection
- Provide a bird's-eye view understanding of the entire dataset
I was excited by that, cuz, well, I work at FiftyOne and we have a tool that does just that...
For understanding what's in your dataset, FiftyOne lets you visually explore massive datasets interactively. When they talked about needing a "bird's-eye view," that's literally what our embedding visualizations provide: you can see your entire dataset in embedding space, revealing clusters, gaps, and outliers.
The QA at scale problem? FiftyOne has built-in queries to find labeling mistakes and inconsistent patterns across millions of samples. And for data collection strategy, it shows where your dataset has gaps and where models struggle, so there's no more training for weeks to "get a signal."
So I went to Physical Intelligence's Hugging Face org and found their "aloha_pen_uncap" dataset. I parsed it into FiftyOne format to see how well our tool would work with their data. In the process, I implemented a data loader for LeRobot format datasets, which means the entire robotics community can now load their datasets in FiftyOne and get all these benefits.
The loader handles the multimodal nature of robotics data, parsing camera views, robot states, and actions.
What became clear when I loaded their dataset:
- You can visually browse task executions and see patterns in successful vs. failed attempts
- Embedding visualizations show clusters of similar robot behaviors
- Quality issues like poor lighting or occlusions become immediately apparent
It's all open source, and all you need to do to get started is `pip install fiftyone` to see what your data looks like in FiftyOne. The tool mentioned in the podcast already exists, and it's open source!
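The "outliers in embedding space" idea behind a bird's-eye view can be illustrated without any particular tool. The sketch below is a generic brute-force k-nearest-neighbor outlier score: samples whose nearest neighbors are far away are flagged for review. It is not FiftyOne's API; the 2-D embeddings and episode names are hypothetical stand-ins for real per-episode embeddings.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(embeddings: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Mean distance from each sample to its k nearest neighbors (brute force, O(n^2)).

    Samples far from everything else score high and are worth a manual look."""
    if not 0 < k < len(embeddings):
        raise ValueError("k must be between 1 and len(embeddings) - 1")
    scores = []
    for i, e in enumerate(embeddings):
        dists = sorted(math.dist(e, o) for j, o in enumerate(embeddings) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical 2-D embeddings of five robot episodes: four similar grasps and one odd one
# out (e.g. a badly lit or occluded clip).
episodes = ["ep0", "ep1", "ep2", "ep3", "ep4"]
emb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(emb, k=2)
ranked = sorted(zip(scores, episodes), reverse=True)
print("most isolated episode:", ranked[0][1])
```

For millions of samples you would swap the quadratic loop for an approximate nearest-neighbor index, but the scoring rule stays the same.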
-
From ROS to LeRobot: How Are Teams Handling VLA Data Pipelines?
Most real-world robotics systems are built on pub/sub architectures like #ROS. Sensors and estimators publish asynchronously and at different rates:
• Cameras at ~30 Hz
• Perception at ~10 Hz
• State, control, and actions all run on their own clocks
This decoupled design has powered robotics for decades.
Vision-Language-Action models like NVIDIA Robotics GR00T and Physical Intelligence pi0 work differently. For both training and inference, they require synchronized, tensor-based data with aligned observations, states, and actions on a shared timeline.
Hugging Face's #LeRobot has emerged as the community standard for representing this kind of training data. It is PyTorch-native, well documented, and increasingly supported across the ecosystem.
The hard part is the bridge from asynchronous ROS topics to synchronized LeRobot episodes, without introducing bias or artifacts.
At Roboto AI, we see a few common approaches in practice:
1) Raw ROSbag or MCAP, then offline conversion to LeRobot
✔ Maximum data fidelity and the ability to reprocess later
✘ Timestamp handling, resampling, interpolation, and episode definition all need real care
2) Online synchronization with direct LeRobot writing
✔ Training-ready data immediately
✘ Synchronization choices are locked in once data is recorded
3) Hybrid capture using raw bags plus a synchronized dataset
✔ Fast iteration with reproducibility
✘ Higher storage costs and more operational complexity
4) Custom, non-ROS pipelines
✔ Full control over data primitives
✘ You end up re-implementing large parts of the robotics stack
The most common failure mode we see is train-inference skew between offline preprocessing and live data flow. This problem exists across ML, but it becomes especially critical when observations map directly to robot actions.
Typical causes include:
• Different resampling or alignment logic
• Implicit lookahead during offline conversion
• Episode boundaries that do not match deployment
The result is strong offline metrics and disappointing real-world behavior.
Despite the push toward end-to-end learning, most production robots will continue to rely on ROS-style pub/sub systems for the foreseeable future. That makes reproducible and auditable data curation the key link between robotics stacks and VLA training.
At Roboto, we are actively building tooling to go from raw robotics data to ML-ready datasets. If you are working on VLA pipelines and have wrestled with this gap, I would love to compare notes.
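The "implicit lookahead during offline conversion" cause is easy to reproduce in a few lines. The sketch below resamples a ~10 Hz stream onto a 30 Hz timeline in two ways: a causal zero-order hold (what a live robot can compute) and linear interpolation (a tempting offline choice that reads the next message). The `(timestamp, value)` stream format and function names are hypothetical and are not Roboto's tooling or LeRobot's API; real pipelines also have to handle clock offsets, message latency, and episode boundaries, which this ignores.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Sequence, Tuple

Stream = Sequence[Tuple[float, float]]  # (timestamp in seconds, value), sorted by timestamp


def sample_causal(stream: Stream, t: float) -> Optional[float]:
    """Zero-order hold: the latest value published at or before t (None if nothing yet).
    Uses only past messages, which is all a live robot has at time t."""
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    return stream[i - 1][1] if i > 0 else None


def sample_interpolated(stream: Stream, t: float) -> Optional[float]:
    """Linear interpolation between the messages around t.
    Reads the message published AFTER t: implicit lookahead that live inference cannot do."""
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    if i == 0:
        return None
    if i == len(stream):
        return stream[-1][1]
    (t0, v0), (t1, v1) = stream[i - 1], stream[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(stream: Stream, start: float, end: float, hz: float,
          sampler: Callable[[Stream, float], Optional[float]]) -> List[Optional[float]]:
    """Resample one asynchronous stream onto a fixed-rate timeline shared by all streams."""
    n = int(round((end - start) * hz)) + 1
    return [sampler(stream, start + k / hz) for k in range(n)]


# A ~10 Hz perception estimate resampled onto a 30 Hz camera timeline.
perception = [(0.0, 0.0), (0.1, 1.0), (0.2, 2.0)]
offline = align(perception, 0.0, 0.2, 30.0, sample_interpolated)
online = align(perception, 0.0, 0.2, 30.0, sample_causal)
skew = max(abs(a - b) for a, b in zip(offline, online))
print(f"max train-inference skew: {skew:.3f}")
```

A model trained on `offline` sees smoothly updated perception that it will never receive at deployment. Using the same causal sampler in both the conversion script and the live path removes this particular source of skew, at the cost of slightly staler observations.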
-
7 lessons from AirSim: I ran the autonomous systems and robotics research effort at Microsoft for nearly a decade, and here are my biggest learnings. Complete blog: https://sca.fo/AAeoC
1. The "PyTorch moment" for robotics needs to come before the "ChatGPT moment". While there is anticipation around Foundation Models for robots, a scarcity of technical folks well versed in both deep ML and robotics, and a lack of resources for rapid iteration, present significant barriers. We need more experts working on robot and physical intelligence.
2. Most AI workloads on robots can primarily be solved by deep learning. Building robot intelligence requires simultaneously solving a multitude of AI problems, such as perception, state estimation, mapping, planning, and control. We are increasingly seeing successes of deep ML across the entire robotics stack.
3. Existing robotic tools are suboptimal for deep ML. Most of the tools originated before the advent of deep ML and cloud and were not designed to address AI. Legacy tools are hard to parallelize on GPU clusters. Infrastructure that is data first, parallelizable, and integrates cloud deeply throughout the robot's lifecycle is a must.
4. Robotic foundation mosaics + agentic architectures are more likely to deliver than monolithic robot foundation models. The ability to program robots efficiently is one of the most requested use cases and a research area in itself. It currently takes a technical team weeks to program robot behavior. It is clear that foundation mosaics and agentic architectures can deliver huge value now.
5. Cloud + connectivity trumps compute on edge. Yes, even for robotics! Most operator-based robot enterprises either discard or minimally catalog their data due to a lack of data management pipelines and connectivity. Robotics is truly a multitasking domain: a robot needs to solve multiple tasks at once. Connection to the cloud for data management, model refinement, and the ability to make several inference calls simultaneously would be a game changer.
6. Current approaches to robot AI safety are inadequate. Safety research for robotics is at an interesting crossroads. Neurosymbolic representation and analysis is likely an important technique that will enable the application of safety frameworks to robotics.
7. Open source can add to the overhead. As a strong advocate for open source, I have shared much of my work. While open source offers many benefits, there are a few challenges, especially for robotics, that are less frequently discussed: robotics is a fragmented and siloed field, and initially there will likely be more users than contributors. Within large orgs, the scope of open-source initiatives may also face limits.
AirSim pushed the boundaries of the technology and provided deep insight into R&D processes. The future of robotics will be built on the principle of being open. Stay tuned as we continue to build @Scafoai
-
A central challenge in #physical #AI is data scarcity: vision-language-action (#VLA) models are fundamentally limited by the availability of high-quality robotics demonstrations.
In our recent work, we introduce R&B-EnCoRe (https://lnkd.in/gSQJp6dV), a framework that enables models to self-bootstrap embodied #reasoning by leveraging synthetic visuo-textual data together with limited embodiment-specific experience. In essence, R&B-EnCoRe allows models to learn how to reason in an embodied setting.
Our approach treats reasoning as a latent variable and uses self-supervised refinement to learn reasoning strategies that are directly predictive of successful control, without human annotations, reward engineering, or external verifiers.
We validate the approach across a range of embodiments (including manipulation, navigation, and autonomous driving) and across model scales from 1B to 30B parameters, observing consistent improvements:
💪 +28% task success in real-world manipulation
🦿 +101% score in legged locomotion navigation
🚗 −21% collision rate in autonomous driving
Overall, this work highlights a promising direction: aligning internet-scale priors with embodiment-specific data to enable scalable, self-improving physical intelligence.
Kudos to an amazing team: Milan Ganai, Katie Luo, Jonas Frey, Clark Barrett
🌐 Website: https://lnkd.in/gs3abvbW
📄 Paper: https://lnkd.in/gSQJp6dV
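"Treating reasoning as a latent variable" has a standard probabilistic reading: for an observation x and an action a* known to succeed, candidate reasoning traces z can be reweighted by Bayes' rule, p(z | x, a*) ∝ p(z | x) · p(a* | x, z), so traces that better predict successful control gain weight. The sketch below shows only that textbook posterior computation under made-up probabilities; how R&B-EnCoRe actually generates, scores, and refines traces is not described here, and the traces, numbers, and function name are hypothetical.

```python
import math
from typing import Dict


def reasoning_posterior(prior: Dict[str, float], action_loglik: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over candidate reasoning traces z for one observation x:
    p(z | x, a*) is proportional to p(z | x) * p(a* | x, z), where a* is an action
    known to lead to success. Computed in log space for numerical stability."""
    logits = {z: math.log(prior[z]) + action_loglik[z] for z in prior}
    m = max(logits.values())
    unnorm = {z: math.exp(v - m) for z, v in logits.items()}
    total = sum(unnorm.values())
    return {z: u / total for z, u in unnorm.items()}


# Hypothetical numbers: how plausible each trace is a priori (p(z | x)) and how well
# it predicts the successful action (p(a* | x, z)).
prior = {
    "drawer closed -> reach handle -> pull": 0.3,
    "drawer closed -> push top": 0.5,
    "look for handle -> grasp -> pull": 0.2,
}
action_loglik = {
    "drawer closed -> reach handle -> pull": math.log(0.6),
    "drawer closed -> push top": math.log(0.05),
    "look for handle -> grasp -> pull": math.log(0.3),
}

post = reasoning_posterior(prior, action_loglik)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Although the prior favored "push top", the posterior shifts toward the trace that best explains the successful action. In a self-bootstrapping loop, such high-posterior traces are natural candidates for fine-tuning targets, with no human label on the reasoning itself.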
-
🚀 RoboCade: Gamifying Robot Data Collection is out on arXiv, and I'm thrilled to share this collaborative work with the community!
One of the biggest bottlenecks in robotics today is scaling human demonstration data for imitation learning. Traditional collection is costly, tedious, and limited to experts with access to hardware.
So we asked:
👉 Can we make robot data collection accessible, engaging, and scalable, even for non-experts?
That's where RoboCade comes in:
🎮 A gamified remote teleoperation platform that transforms robot demo collection into an interactive, game-like experience.
👥 Designed to engage general users with visual feedback, progress bars, badges, leaderboards, and more, while still generating useful data for downstream policy training.
Key results:
✔️ Remote players collected data that, when co-trained with traditional demos, boosted policy success on real tasks (+16–56%).
✔️ In user studies, beginners found RoboCade significantly more enjoyable and motivating than standard interfaces (+24%).
✔️ We also propose principles for gamified task design so the collected data actually helps with real manipulation challenges.
Why this matters:
🔹 Broadening participation in robotics research beyond labs and experts
🔹 Intrinsic motivation rather than paying for data labeling
🔹 A scalable crowd-sourced pipeline for future robot learning systems
Huge thanks to Suvir Mirchandani, Mia Tang, Jubayer Ibn Hamid, Michael Cho, and Dorsa Sadigh for the collaboration. 🔧🤝
Read the full paper on arXiv, and check out our demo videos at https://lnkd.in/gjyE6A5S
#Robotics #ImitationLearning #HumanAI #Crowdsourcing #Gamification #MachineLearning
-
How to start applying active learning methods in R&D?
The traditional way to apply ML is one task at a time: e.g., optimization of a single process step, segmentation of images, and so on. Often this is driven by a combination of opportunity and curiosity, which is always a good first step. However, adopting ML at the project level is more complex.
After being involved in multiple active learning projects in imaging, materials synthesis via lab robotics, and combinatorial libraries, I have formulated a set of questions that can help select which projects to pursue.
1. What is the objective of the project, i.e., what is the overall goal? Examples:
a. I want to make better materials for engines
b. I want to make a stable PV cell with 30% efficiency
c. I want to understand the physics of polarization switching in hafnia
2. What is your experimental workflow? Examples:
a. I explore the compositional space of a specific metallic system via combinatorial spread libraries
b. I use a pipetting robot to synthesize hybrid perovskite nanocrystals
c. I do Piezoresponse Force Microscopy experiments
3. What is the specific reward, meaning how do we know that a specific experiment is successful? Examples:
a. I do nanoindentation testing
b. I measure the time dependence of the photoluminescence
c. I analyze domain size as a function of bias and pulse time
Note that the connection between objective and reward can be more or less strong (and not amenable to ML!), but the experimental reward is measurable at the end of or during the experiment. ML can help increase the reward.
4. What is the dimensionality of your parameter space? Examples:
a. For a combinatorial library, it is the composition vector. However, I also want to anneal the samples, which adds annealing temperature, so the total space is c × T. Processing can be a trajectory, but we do not want to go there
b. It is compositions, choice of antisolvent, and annealing temperature
c. It is bias and time, but switching will depend on local microstructure as well
5. Is the reward available at each step of the experiment, or only at the end? If the former, we can use Bayesian optimization methods. If not, it is a reinforcement learning problem, and these typically require a lot of data.
6. Do you have proxy data? Examples:
a. For a combi library, we have compositions from EDS
b. No (but combi libraries themselves can be a proxy for films)
c. Yes: domain structure
7. Do we have some physical model, even one as simple as "for low bias there is no switching, for high bias there is, and the function is monotonic"? This makes the case for structured GP models.
8. Do we have computational theory, and is it cheap or expensive? If cheap theory is available, we can use multifidelity methods. If not, then it is a co-navigation problem.
9. How close do we expect theory to be to the experiment?
With these, we should be able to identify which active learning problems are likely to be solvable and which are not, given the budget and complexity of the problem.
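When the reward is measurable after every experiment (question 5), the Bayesian optimization loop is: fit a Gaussian process (GP) to the results so far, score candidate settings with an acquisition function, run the best-scoring experiment, and repeat. The sketch below is a minimal 1-D version, assuming a plain zero-mean GP with an RBF kernel and an upper-confidence-bound (UCB) acquisition. It does not encode physical priors (a structured GP, question 7) or cheap theory (multifidelity, question 8). The kernel length scale, `beta`, candidate grid, and `toy_reward` objective are all hypothetical, and it is written in pure Python only to stay self-contained; in practice you would use a GP library.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 0.2, var: float = 1.0) -> float:
    """Squared-exponential (RBF) kernel between two scalar settings."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L y = b."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L: List[List[float]], y: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X: Sequence[float], Y: Sequence[float], x: float,
                 noise: float = 1e-4) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a zero-mean GP at x, given data (X, Y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, Y))  # K^{-1} Y
    k_star = [rbf(x, a) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # L^{-1} k_star
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


def next_experiment(X, Y, candidates, beta: float = 2.0) -> float:
    """Upper confidence bound: choose the candidate maximizing mean + beta * std."""
    def ucb(x: float) -> float:
        m, s = gp_posterior(X, Y, x)
        return m + beta * s
    return max(candidates, key=ucb)


def toy_reward(x: float) -> float:
    # Hypothetical measured reward with its optimum at x = 0.7, unknown to the optimizer.
    return -(x - 0.7) ** 2


grid = [i / 20 for i in range(21)]  # candidate settings, e.g. a normalized annealing temperature
X = [0.0, 0.5, 1.0]                 # initial experiments
Y = [toy_reward(x) for x in X]
for _ in range(5):                  # five rounds of "measure, update, choose next"
    x_next = next_experiment(X, Y, grid)
    X.append(x_next)
    Y.append(toy_reward(x_next))
best = X[Y.index(max(Y))]
print(f"best setting after {len(X)} experiments: {best:.2f}")
```

`beta` sets the exploration/exploitation trade-off: larger values favor settings where the GP is uncertain, smaller values favor settings it already predicts to be good.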