World model research has three bottlenecks the authors name directly: fragile one-off codebases, slow video data loading, and no standardized generalization benchmarks. Every paper reimplements the same plumbing. Comparing two methods fairly can take weeks of infrastructure work before you learn anything.
𝘀𝘁𝗮𝗯𝗹𝗲-𝘄𝗼𝗿𝗹𝗱𝗺𝗼𝗱𝗲𝗹 is an open-source platform that standardizes the whole pipeline:
→ A Lance-based data layer with native conversion for MP4, HDF5, and LeRobot datasets
→ Clean reference implementations of DINO-WM, LeWorldModel, PLDM, TD-MPC2, plus CEM / MPPI / gradient-based planners for MPC
→ ~150 environments with controllable visual, geometric, and physical factors, producing one comparable zero-shot generalization number across dynamics, control, representation quality, and out-of-distribution (OOD) settings
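For readers new to the planners named above, here is a minimal sketch of the cross-entropy method (CEM) used for MPC. It is a toy illustration, not the platform's implementation: `cem_plan`, `rollout_cost`, `toy_dynamics`, and `toy_cost` are hypothetical names, and the 1-D point mass stands in for a learned world model.

```python
import random
import statistics


def rollout_cost(dynamics, cost, state, actions):
    """Roll an action sequence through the model and sum the per-step cost."""
    total = 0.0
    for action in actions:
        state = dynamics(state, action)
        total += cost(state, action)
    return total


def cem_plan(dynamics, cost, state, horizon=5, samples=64, elites=8, iters=10, seed=0):
    """Cross-entropy method: iteratively refit a Gaussian over action sequences."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)
        ]
        # Keep the lowest-cost candidates (the elites).
        candidates.sort(key=lambda acts: rollout_cost(dynamics, cost, state, acts))
        elite = candidates[:elites]
        # Refit the per-step mean and spread to the elites.
        mean = [statistics.fmean(step) for step in zip(*elite)]
        std = [statistics.pstdev(step) + 1e-3 for step in zip(*elite)]
    return mean


# Toy stand-in for a world model (assumption): a 1-D point mass driven toward 0.
def toy_dynamics(state, action):
    return state + action


def toy_cost(state, action):
    return state * state + 0.01 * action * action


plan = cem_plan(toy_dynamics, toy_cost, state=3.0)
first_action = plan[0]  # MPC executes only this action, then replans
```

In an MPC loop only the first action is applied before planning again from the new state, which is why planner speed matters as much as model quality.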
The data layer is built on Lance. World-model training is small-batch random access over a sequence buffer: each step fetches a handful of short frame windows from arbitrary positions. Lance serves that access pattern several times faster than HDF5 or traditional video streaming formats on local disk, and the gap widens sharply over the network. That makes training directly from object storage such as S3 (no local sync step) practical on ephemeral GPU boxes.
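To make that access pattern concrete, here is a minimal sketch of sampling random fixed-length windows from a buffer of episodes. It uses plain Python and no storage backend. `build_window_index` and `sample_batch` are hypothetical helpers, and the episode lengths are made up for illustration.

```python
import random
from itertools import accumulate


def build_window_index(episode_lengths, window):
    """List every (episode, start) pair whose window fits inside one episode."""
    return [
        (ep, start)
        for ep, length in enumerate(episode_lengths)
        for start in range(length - window + 1)
    ]


def sample_batch(index, offsets, window, batch_size, rng):
    """Pick random windows and return the global row ranges to fetch."""
    picks = rng.sample(index, batch_size)
    return [
        range(offsets[ep] + start, offsets[ep] + start + window)
        for ep, start in picks
    ]


# Illustrative buffer (assumption): three episodes stored back to back as rows.
episode_lengths = [10, 4, 7]
offsets = [0] + list(accumulate(episode_lengths))[:-1]  # first row of each episode
window = 4

index = build_window_index(episode_lengths, window)
batch = sample_batch(index, offsets, window, batch_size=3, rng=random.Random(0))
```

Each range in `batch` is a short, scattered read. A format that needs to decode or scan large chunks to serve such reads pays that cost on every training step, which is where random-access performance shows up.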
Shoutout to Ayush Chaurasia, Lucas Maes, Quentin Le Lidec, Randall Balestriero, & Yann LeCun for building this 🚀
Paper: https://lnkd.in/gni9WK_x
Code: https://lnkd.in/gGxjt5KU