LanceDB

Information Services

San Francisco, California 11,432 followers

Developer-friendly, open source database for multi-modal AI

Discover all 48 employees

About us

LanceDB is a developer-friendly, open source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application.

Website: http://lancedb.com
Industry: Information Services
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2022

Locations

Primary

San Francisco, California, US

Employees at LanceDB

Updates

LanceDB reposted this
Yann LeCun
3d
stable-worldmodel: a general harness to test world-model-based planning.
Lucas Maes

Ph.D Student JEPA @ Mila
3d

Would you like to join the research effort on JEPA and World Models easily? After a full year of hard work, we’re excited to finally release stable-worldmodel: an open-source, scalable platform built to accelerate JEPA & World Model research!

GitHub: https://lnkd.in/eBhWxn8i
X Post: https://lnkd.in/eNvCYh8t

Work done with the great support and contributions of LanceDB, Quentin Le Lidec, Luiz Facury, Nassim Massaudi, Ayush Chaurasia, Francesco Capuano, Richard Gao, Taj Gillin, Dan Haramati, Damien Scieur, Yann LeCun, Randall Balestriero
LanceDB reposted this
Ayush Chaurasia
4d Edited
World model research, like any other hot research topic, is pretty fragmented, with each paper implementing its own data collection, curation and loading pipelines, evals, baselines and training loops. stable-worldmodel is an OSS platform that standardises this process. It implements a high-performance data pipeline using LanceDB and the architectures of popular world models with shared training code, which allows apples-to-apples comparison and makes iteration significantly faster. This was a fun collab with Lucas Maes and Quentin Le Lidec. Check out the repo: https://lnkd.in/gQ5GBy5f
LanceDB

11,432 followers
4d Edited

World model research has three bottlenecks the authors name directly: fragile one-off codebases, slow video data loading, and no standardized generalization benchmarks. Every paper reimplements the same plumbing. Comparing two methods fairly can take weeks of infrastructure work before you learn anything.

𝘀𝘁𝗮𝗯𝗹𝗲-𝘄𝗼𝗿𝗹𝗱𝗺𝗼𝗱𝗲𝗹 is an open-source platform that standardizes the whole pipeline:

→ A Lance-based data layer with native conversion for MP4, HDF5, and LeRobot datasets
→ Clean reference implementations of DINO-WM, LeWorldModel, PLDM, TD-MPC2, plus CEM / MPPI / gradient-based planners for MPC
→ ~150 environments with controllable visual, geometric, and physical factors, producing one comparable zero-shot generalization number across dynamics, control, representation quality, and OOD

The data layer is built on Lance. World-model training is small-batch random access over a sequence buffer. Lance streams that directly from object storage, several times faster than HDF5 or traditional video streaming formats on local disk, with the gap widening sharply over the network. That makes training directly from S3 (no local sync step) practical on ephemeral GPU boxes.

Shoutout to Ayush Chaurasia, Lucas Maes, Quentin Le Lidec, Randall Balestriero, & Yann LeCun for building this 🚀

Paper: https://lnkd.in/gni9WK_x
Code: https://lnkd.in/gGxjt5KU
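For readers unfamiliar with the access pattern the post describes ("small-batch random access over a sequence buffer"), here is a minimal, library-free sketch. It does not use the Lance or stable-worldmodel APIs. The function names `sample_windows` and `read_window`, and the in-memory list standing in for the stored dataset, are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sample_windows(
    episode_lengths: Sequence[int],
    window: int,
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[int, int]]:
    """Pick `batch_size` random (episode, start_frame) pairs.

    Each pair addresses a contiguous run of `window` frames, so one batch
    touches a handful of short, scattered ranges rather than one long scan.
    Episodes are chosen uniformly, then a start frame uniformly within it.
    """
    # Only episodes long enough to hold a full window are eligible.
    eligible = [i for i, n in enumerate(episode_lengths) if n >= window]
    if not eligible:
        raise ValueError("no episode is long enough for the requested window")
    batch = []
    for _ in range(batch_size):
        ep = rng.choice(eligible)
        # randint is inclusive on both ends, so the last valid start is n - window.
        start = rng.randint(0, episode_lengths[ep] - window)
        batch.append((ep, start))
    return batch


def read_window(buffer: Sequence[Sequence], ep: int, start: int, window: int) -> list:
    """Fetch one window from an in-memory stand-in for the stored dataset."""
    return list(buffer[ep][start:start + window])
```

Every element of a batch is an independent short read at an unpredictable offset, which is why storage formats that are efficient at random access matter for this workload, especially over a network.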
LanceDB

11,432 followers
5d
One SELECT. Ranked retrieval. The raw JPEG bytes for each result, inline. Multimodal retrieval usually means three systems: object store for bytes, vector DB for embeddings, SQL warehouse for metadata. The Lance core extension in DuckDB collapses all three. Image bytes live as a BLOB column alongside embeddings and metadata — lance_vector_search() is a SQL table function, the rest is standard SQL. Make your SQL workflows multimodal: https://lnkd.in/duC9Nh-R
LanceDB

11,432 followers
6d
LanceDB is part of the Microsoft for Startups spotlight at Microsoft Build 2026. We're building the multimodal lakehouse — one table for raw bytes, embeddings, metadata, and features that serves exploration, feature engineering, curation, and training without crossing systems. The teams we work with (Netflix, Runway, Midjourney) stopped stitching together five systems and started iterating on models instead. We're on Azure and available on Microsoft Marketplace. If you're in San Francisco for Build, come find us – Amid Tabrizi & Jonathan M Hsieh will be there! Stop by our session "Designing AI Data Platforms on Azure Blob Beyond Retrieval" on Wed Jun 3 at 11:15am: https://lnkd.in/g96dMryg 🔗 Featured startups at Microsoft Build 2026: https://lnkd.in/gT7n_pMv
LanceDB

11,432 followers
1w Edited
DuckDB 🤝 LanceDB: the results from the benchmarks show why AI-native data needs an AI-native data layer.

The Lance-DuckDB extension lets teams query Lance datasets directly from DuckDB SQL: read/write datasets, attach namespaces, manage indexes, and run vector, full-text, and hybrid search.

A few Lance highlights from a LAION image/text benchmark with CLIP embeddings, raw image bytes, captions, and metadata:

• 69K+ multimodal rows materialized locally
• 12 ms cold indexed vector search
• 5 ms warm vector search
• 17 ms cold hybrid search
• 8 ms warm hybrid search
• Native support for embeddings, text, images, blobs, metadata, updates, schema evolution, and versioned datasets

DuckDB is a great SQL surface. Lance is the open AI-native storage layer where evolving multimodal datasets live.

Read DuckDB’s full benchmark post: https://lnkd.in/gz9d3xmN
LanceDB

11,432 followers
1w Edited
Three talks in one room TOMORROW NIGHT in Menlo Park: ingestion, retrieval, and metadata lineage, the parts of the AI stack that most teams are still stitching together manually.

ChanChan M. will walk through Lance, the default storage layer for multimodal AI. One table for raw bytes, embeddings, and features, without the export pipelines and stale snapshots that come with a separate vector DB and data lake.

Joining us: elvis kahoro from dltHub on the connective tissue of your AI data stack, and Gabe Lyons & Manuela Wei from DataHub on how trusted context makes Cortex agents more reliable in production.

Doors open at 6pm, see you there!

📅 Thur May 21 6PM, SVAI @ Menlo Park
🔗 Register: https://lnkd.in/gErcViP9

The missing data layer for ML: dltHub x LanceDB x DataHub @ SVAI · Luma luma.com

LanceDB reposted this
dltHub

13,211 followers
2w
The AI stack is evolving fast, but reliable data movement is still the foundation. As AI systems become more complex, connecting ingestion, retrieval, and metadata layers reliably is becoming a core engineering challenge, and that’s what we’ll be discussing live in Menlo Park.

We're joining forces with LanceDB and DataHub for a Community Meetup on May 21 at Silicon Valley AI Hub, and the lineup is built for engineers in the trenches:

- Lance as the default storage layer for multimodal AI, LanceDB
- The connective tissue of your AI data stack, dltHub
- Lineage you can trust: Supercharging Cortex with DataHub

No fluff, just the engineers actually building this stuff, showing you how it works in production.

📅 Thursday, May 21 · 6:00 PM - 8:00 PM PDT · Menlo Park, CA
🍕 Talks, demos, networking, food & drinks
🔗 Join us here: https://luma.com/80pocni3

LanceDB

11,432 followers
1w
With LeRobot v3's default layout, exploring a dataset before training means pulling more data than you need. 𝗹𝗲𝗿𝗼𝗯𝗼𝘁-𝗹𝗮𝗻𝗰𝗲𝗱𝗯 changes that. Open any Lance-formatted dataset from the Hub via hf:// and filter by episode_index, frame_index, or task metadata before touching a video blob. From there, materialize your slice, attach embeddings, add columns, and pass it to LeRobotLanceDataset — a drop-in for LeRobotDataset, existing PyTorch code unchanged. This gives robotics teams fast random access, lazy multimodal blob reads, and a cleaner path from dataset curation to training. Filter, curate, then train — all from one table: https://lnkd.in/grSCkU6P

LeRobotDataset - LanceDB docs.lancedb.com

Funding

LanceDB 3 total rounds

Last Round

Series A Jul 24, 2025

US$ 30.0M

Investors

Theory Ventures + 6 Other investors

Employees at LanceDB

David Wang

Dave Unger

Peter Ebert

Catherine Chung

Similar pages

Eventual

DuckDB

Apache DataFusion

Polars

Pinecone

Qdrant

Kuzu

AtoB

Spice AI

Lightning AI
