LanceDB

Information Services

San Francisco, California 11,432 followers

Developer-friendly, open source database for multi-modal AI

About us

LanceDB is a developer-friendly, open source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application.

Website
http://lancedb.com
Industry
Information Services
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2022

Updates

  • LanceDB reposted this

    Yann LeCun

    stable-worldmodel: a general harness to test world-model-based planning.

    Lucas Maes (Ph.D. student, JEPA @ Mila)

    Would you like to join the research effort on JEPA and World Models easily? After a full year of hard work, we’re excited to finally release stable-worldmodel: an open-source, scalable platform built to accelerate JEPA & World Model research!
    GitHub: https://lnkd.in/eBhWxn8i
    X post: https://lnkd.in/eNvCYh8t
    Work done with the great support and contributions of LanceDB, Quentin Le Lidec, Luiz Facury, Nassim Massaudi, Ayush Chaurasia, Francesco Capuano, Richard Gao, Taj Gillin, Dan Haramati, Damien Scieur, Yann LeCun, Randall Balestriero

  • LanceDB reposted this

    World model research, like any other hot research topic, is pretty fragmented, with each paper implementing its own data collection, curation and loading pipelines, evals, baselines and training loops. stable-worldmodel is an OSS platform that standardises this process: it implements a high-performance data pipeline using LanceDB and architectures of popular world models with shared training code, which allows apples-to-apples comparison and makes iteration significantly faster. This was a fun collab with Lucas Maes and Quentin Le Lidec. Check out the repo: https://lnkd.in/gQ5GBy5f

    LanceDB

    World model research has three bottlenecks the authors name directly: fragile one-off codebases, slow video data loading, and no standardized generalization benchmarks. Every paper reimplements the same plumbing. Comparing two methods fairly can take weeks of infrastructure work before you learn anything.

    stable-worldmodel is an open-source platform that standardizes the whole pipeline:
    → A Lance-based data layer with native conversion for MP4, HDF5, and LeRobot datasets
    → Clean reference implementations of DINO-WM, LeWorldModel, PLDM, TD-MPC2, plus CEM / MPPI / gradient-based planners for MPC
    → ~150 environments with controllable visual, geometric, and physical factors, yielding one comparable zero-shot generalization number across dynamics, control, representation quality, and OOD

    The data layer is built on Lance. World-model training is small-batch random access over a sequence buffer. Lance streams that directly from object storage, several times faster than HDF5 or traditional video streaming formats on local disk, with the gap widening sharply over the network. That makes training directly from S3 (no local sync step) practical on ephemeral GPU boxes.

    Shoutout to Ayush Chaurasia, Lucas Maes, Quentin Le Lidec, Randall Balestriero, & Yann LeCun for building this 🚀
    Paper: https://lnkd.in/gni9WK_x
    Code: https://lnkd.in/gGxjt5KU
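To make "small-batch random access over a sequence buffer" concrete, here is a minimal Python sketch of the access pattern. It assumes nothing about the stable-worldmodel or Lance APIs: `sample_windows` and its parameters are hypothetical names for illustration, and the sketch only picks which frame windows a batch would read.

```python
import random
from typing import List, Tuple


def sample_windows(
    episode_lengths: List[int],
    window: int,
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[int, int]]:
    """Draw (episode, start_frame) pairs uniformly over all valid windows.

    Hypothetical helper: illustrates the access pattern only, not any
    library's sampler.
    """
    # Each episode contributes (length - window + 1) valid start offsets;
    # episodes shorter than the window contribute none.
    counts = [max(0, n - window + 1) for n in episode_lengths]
    total = sum(counts)
    if total == 0:
        raise ValueError("no episode is long enough for the requested window")

    batch = []
    for _ in range(batch_size):
        pick = rng.randrange(total)  # index into the flattened list of windows
        for episode, count in enumerate(counts):
            if pick < count:
                batch.append((episode, pick))
                break
            pick -= count
    return batch


lengths = [10, 3, 50]  # frames per episode (made-up numbers)
batch = sample_windows(lengths, window=8, batch_size=16, rng=random.Random(0))
```

Each pair maps to a short, contiguous run of rows scattered across the dataset. That is why a storage layer with cheap random reads matters more here than sequential scan throughput.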

  • One SELECT. Ranked retrieval. The raw JPEG bytes for each result, inline. Multimodal retrieval usually means three systems: an object store for bytes, a vector DB for embeddings, and a SQL warehouse for metadata. The Lance core extension in DuckDB collapses all three. Image bytes live as a BLOB column alongside embeddings and metadata; lance_vector_search() is a SQL table function, and the rest is standard SQL. Make your SQL workflows multimodal: https://lnkd.in/duC9Nh-R
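The "one SELECT" idea can be sketched without the Lance extension. The Python example below is a stand-in, not the DuckDB or Lance API: it uses the standard-library `sqlite3` module, a hypothetical `images` table, and a brute-force `cosine` function registered as a SQL function. It only shows the shape of a single query that ranks rows by similarity and returns the image bytes inline.

```python
import json
import math
import sqlite3


def cosine(a_json: str, b_json: str) -> float:
    """Cosine similarity between two JSON-encoded vectors (brute force)."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.create_function("cosine", 2, cosine)  # expose the Python function to SQL

# One table holds metadata, the embedding, and the raw bytes together.
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, caption TEXT, "
    "embedding TEXT, image BLOB)"
)
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [
        (1, "cat", json.dumps([1.0, 0.0]), b"\xff\xd8cat"),
        (2, "dog", json.dumps([0.0, 1.0]), b"\xff\xd8dog"),
        (3, "kitten", json.dumps([0.9, 0.1]), b"\xff\xd8kitten"),
    ],
)

# A single SELECT: ranked by similarity, with the image bytes in the result.
query_vec = json.dumps([1.0, 0.0])
results = conn.execute(
    "SELECT caption, image, cosine(embedding, ?) AS score "
    "FROM images ORDER BY score DESC LIMIT 2",
    (query_vec,),
).fetchall()
```

In the setup the post describes, an indexed table function would presumably replace the brute-force scan. The point carried over is only that bytes, embeddings, and metadata come back from one query.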

  • LanceDB is part of the Microsoft for Startups spotlight at Microsoft Build 2026. We're building the multimodal lakehouse — one table for raw bytes, embeddings, metadata, and features that serves exploration, feature engineering, curation, and training without crossing systems. The teams we work with (Netflix, Runway, Midjourney) stopped stitching together five systems and started iterating on models instead. We're on Azure and available on Microsoft Marketplace. If you're in San Francisco for Build, come find us – Amid Tabrizi & Jonathan M Hsieh will be there! Stop by our session "Designing AI Data Platforms on Azure Blob Beyond Retrieval" on Wed Jun 3 at 11:15am: https://lnkd.in/g96dMryg 🔗 Featured startups at Microsoft Build 2026: https://lnkd.in/gT7n_pMv

  • DuckDB 🤝 LanceDB: the results from the benchmarks show why AI-native data needs an AI-native data layer. The Lance-DuckDB extension lets teams query Lance datasets directly from DuckDB SQL: read/write datasets, attach namespaces, manage indexes, and run vector, full-text, and hybrid search.

    A few Lance highlights from a LAION image/text benchmark with CLIP embeddings, raw image bytes, captions, and metadata:
    • 69K+ multimodal rows materialized locally
    • 12 ms cold indexed vector search
    • 5 ms warm vector search
    • 17 ms cold hybrid search
    • 8 ms warm hybrid search
    • Native support for embeddings, text, images, blobs, metadata, updates, schema evolution, and versioned datasets

    DuckDB is a great SQL surface. Lance is the open AI-native storage layer where evolving multimodal datasets live. Read DuckDB’s full benchmark post: https://lnkd.in/gz9d3xmN

  • Three talks in one room TOMORROW NIGHT in Menlo Park: ingestion, retrieval, and metadata lineage, the parts of the AI stack that most teams are still stitching together manually. ChanChan M. will walk through Lance, the default storage layer for multimodal AI. One table for raw bytes, embeddings, and features, without the export pipelines and stale snapshots that come with a separate vector DB and data lake. Joining us: elvis kahoro from dltHub on the connective tissue of your AI data stack, and Gabe Lyons & Manuela Wei from DataHub on how trusted context makes Cortex agents more reliable in production. Doors open at 6pm, see you there!
    📅 Thu May 21, 6PM, SVAI @ Menlo Park
    🔗 Register: https://lnkd.in/gErcViP9

  • LanceDB reposted this

    dltHub

    The AI stack is evolving fast, but reliable data movement is still the foundation. As AI systems become more complex, connecting ingestion, retrieval, and metadata layers reliably is becoming a core engineering challenge, and that’s what we’ll be discussing live in Menlo Park. We're joining forces with LanceDB and DataHub for a Community Meetup on May 21 at Silicon Valley AI Hub, and the lineup is built for engineers in the trenches:
    - Lance as the default storage layer for multimodal AI, LanceDB
    - The connective tissue of your AI data stack, dltHub
    - Lineage you can trust: Supercharging Cortex with DataHub
    No fluff, just the engineers actually building this stuff, showing you how it works in production.
    📅 Thursday, May 21 · 6:00 PM - 8:00 PM PDT · Menlo Park, CA
    🍕 Talks, demos, networking, food & drinks
    🔗 Join us here: https://luma.com/80pocni3

  • With LeRobot v3's default layout, exploring a dataset before training means pulling more data than you need. lerobot-lancedb changes that. Open any Lance-formatted dataset from the Hub via hf:// and filter by episode_index, frame_index, or task metadata before touching a video blob. From there, materialize your slice, attach embeddings, add columns, and pass it to LeRobotLanceDataset, a drop-in for LeRobotDataset that leaves existing PyTorch code unchanged. This gives robotics teams fast random access, lazy multimodal blob reads, and a cleaner path from dataset curation to training. Filter, curate, then train, all from one table: https://lnkd.in/grSCkU6P
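The "filter before touching a video blob" workflow can be illustrated with a small, self-contained Python sketch. It does not use lerobot-lancedb or LeRobotLanceDataset: the `Frame` class, `make_frame` helper, and `reads` log are all hypothetical, and they only show how a lazy blob column lets metadata filters run without fetching any video bytes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Log of which blobs were actually fetched, so the laziness is observable.
reads: List[Tuple[int, int]] = []


@dataclass
class Frame:
    episode_index: int
    frame_index: int
    task: str
    load_video: Callable[[], bytes]  # invoked only when the bytes are needed


def make_frame(episode: int, index: int, task: str) -> Frame:
    """Build a row whose blob is fetched on demand (stand-in for a lazy read)."""

    def load() -> bytes:
        reads.append((episode, index))
        return f"video-{episode}-{index}".encode()

    return Frame(episode, index, task, load)


# Four made-up episodes of three frames each, alternating between two tasks.
frames = [
    make_frame(ep, idx, "pick" if ep % 2 == 0 else "place")
    for ep in range(4)
    for idx in range(3)
]

# Step 1: filter on cheap metadata only; no blob is read here.
selected = [f for f in frames if f.task == "pick" and f.frame_index == 0]
reads_after_filter = list(reads)

# Step 2: materialize just the selected slice.
blobs = [f.load_video() for f in selected]
```

Only the two selected frames trigger a read. The same ordering (filter, then materialize) is what the post describes at dataset scale.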

Funding

Total rounds
3
Last round
Series A, US$ 30.0M
