SQLite is the most widely deployed database in the world, running on devices as ubiquitous as phones and TVs and as mission-critical as aircraft software. But how did it get there, and where is it going next? I love this paper because it gives both a historical perspective on SQLite and an honest look at its advantages and shortcomings.

SQLite originated 25 years ago as a small package of data management functions for Tcl, but rapidly evolved into a full-fledged transactional database. It became enormously popular for three big reasons:

- It’s embedded and self-contained. Uniquely among major databases, SQLite runs inside your application’s process instead of on a separate server. It’s distributed as a single C file that compiles to less than 750 KiB. This lets SQLite run in resource-constrained environments where a conventional database would be unworkable.
- It’s cross-platform. A SQLite database is stored in a single file that can be freely copied across almost any machine, regardless of architecture. In embedded systems, where custom architectures are everywhere, this is a huge advantage.
- It’s reliable. SQLite has a stunning 600 lines of test code for every line of SQLite code, covering all sorts of rare crashes and failure conditions. This is huge for mission-critical systems.

What’s interesting about SQLite is that while it was designed for transactional workloads, it’s increasingly used for analytical (OLAP) workloads simply because it’s lightweight and easy to add to a data science library. Smart database researchers saw this trend and built an embedded analytics-focused database, DuckDB, which has also become very popular. A big chunk of the paper is devoted to a performance comparison of the two, and the differences are striking:

- On transactional workloads containing mostly small reads and writes, SQLite is 10-60x faster than DuckDB.
- On analytical workloads consisting of large scans, aggregations, and joins, DuckDB is 30-50x faster than SQLite.
This shows how important it is to pick the right system for your workload! Interestingly, the authors propose some optimizations for SQLite based on ideas from DuckDB, like using Bloom filters to speed up big joins. These make SQLite up to 4x faster on some analytical queries, though it’s still an order of magnitude behind DuckDB. Hopefully cross-pollination between databases continues to improve them all in the future!
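To make the Bloom-filter idea concrete, here is a minimal Python sketch of the general technique (not SQLite's actual implementation): before probing the hash table of a join, each probe-side key is first checked against a small bit array built from the build side, so most non-matching rows are discarded after a couple of cheap hash checks instead of a full lookup. The `Bloom` class and the toy `dim`/`fact` tables are illustrative names of my own.

```python
import hashlib

class Bloom:
    """Tiny Bloom filter: k hash probes into an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _probes(self, key):
        # Derive k independent bit positions from one keyed hash.
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.m

    def add(self, key):
        for p in self._probes(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # No false negatives; rare false positives are acceptable.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(key))

# Build side: a small dimension table. Probe side: a big fact table.
dim = {f"d{i}": i for i in range(10)}
bloom = Bloom()
for key in dim:
    bloom.add(key)

fact = [("d3", 1), ("x999", 2), ("d7", 3), ("nope", 4)]
# The Bloom check discards most non-matching probe rows before the hash lookup.
joined = [(k, v, dim[k]) for k, v in fact
          if bloom.might_contain(k) and k in dim]
```

The payoff in a real engine is that the bit array is tiny enough to stay in cache, so a large fraction of the probe side never touches the (much bigger) hash table at all.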
Embedded Database Systems
Summary
Embedded database systems are lightweight databases that run within applications, rather than on separate servers, making them ideal for resource-constrained environments like mobile devices and IoT gadgets. These systems, such as SQLite and DuckDB, offer simplicity, speed, and reliability for managing data locally, whether for transactional operations or analytical tasks.
- Choose purposefully: Select your embedded database based on your workload needs; SQLite excels at fast, small transactions, while DuckDB is built for large analytical queries.
- Simplify integration: Avoid complex infrastructure by using embedded databases, which require minimal setup and can be deployed easily across devices and platforms.
- Prioritize reliability: Look for embedded databases with strong testing and cross-platform compatibility to minimize data corruption and ensure stable performance, even in mission-critical systems.
---
A lot of people ask me why I use SQLite as a VectorDB and how I’m doing it. This post’s for you.

I’ve been using SQLite to power my recent reasoning and vector-based self-learning systems, like ReasoningBank and collapsible memory. Each agent starts its own SQLite database in milliseconds, either fully in memory for speed or on disk for persistence using WAL mode. It’s perfect for smaller operations where you want high performance with almost no overhead. I store embeddings as compact f32 blobs with precomputed norms, allowing cosine similarity to run directly in SQL through small Rust-based functions. It’s simple, fast, and self-contained.

This setup lets agents manage their own context efficiently. They can recall past runs, measure reasoning shifts, and summarize older data into compressed memory graphs. Using SQLite pragmas for mmap and cache optimization, retrieval happens in microseconds. It behaves like a miniature vector store embedded inside each agent or swarm.

For most agentic systems, this approach is ideal. You don’t need a massive distributed vector platform unless you’re handling millions of embeddings or need global retrieval across agents. Heavy tools like Pinecone or Milvus have their place, but for local reasoning, reflection, and agent-specific learning, SQLite does the job elegantly. It’s fast, simple, and scales in all the ways that matter.
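The post registers its similarity functions in Rust; a minimal Python sketch of the same pattern, using the stdlib `sqlite3` module's `create_function`, looks like this. The table name, `pack_f32`, and `cosine_sim` are illustrative names of my own, and the two-dimensional vectors are toy data.

```python
import math
import sqlite3
import struct

def pack_f32(vec):
    """Pack a list of floats into a compact f32 blob, as the post describes."""
    return struct.pack(f"{len(vec)}f", *vec)

def cosine_sim(blob_a, blob_b):
    """Cosine similarity between two f32 blobs; registered as a SQL function."""
    a = struct.unpack(f"{len(blob_a) // 4}f", blob_a)
    b = struct.unpack(f"{len(blob_b) // 4}f", blob_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

db = sqlite3.connect(":memory:")  # in-memory for speed; use a file + WAL for persistence
db.create_function("cosine_sim", 2, cosine_sim, deterministic=True)
db.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, vec BLOB)")
db.execute("INSERT INTO embeddings (vec) VALUES (?), (?)",
           (pack_f32([1.0, 0.0]), pack_f32([0.7, 0.7])))

query = pack_f32([1.0, 0.0])
rows = db.execute(
    "SELECT id, cosine_sim(vec, ?) AS sim FROM embeddings ORDER BY sim DESC",
    (query,)).fetchall()
```

A real setup would also store the precomputed norm alongside each blob so the function only has to do the dot product, which is where the microsecond-scale retrieval comes from.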
---
Everyone’s chasing scalability like they’re about to hit Facebook-level traffic next week. You’re not. But you are over-engineering. SQLite is often dismissed as "just an embedded database," but for AI-driven apps, prototypes, and even production workloads, it’s a powerhouse:

✅ Blazing fast – Zero network latency, reads straight from disk.
✅ Simple & lightweight – No server, no ops, no drama.
✅ Handles more than you think – Can manage terabytes with proper tuning.

When building AI tools, especially LLM-powered apps, most queries are read-heavy—perfect for SQLite. Instead of prematurely setting up PostgreSQL or DynamoDB, ask yourself:

💡 Are you building for users, or imaginary hyperscale?

Unless you’re operating at Amazon or Google scale (spoiler: you’re not), SQLite is probably all you need. Embrace practical scalability. Overkill infra kills velocity.
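In that read-heavy spirit, here is a sketch of what "proper tuning" might look like: a connection configured with SQLite's WAL journal, a larger page cache, and memory-mapped I/O. The pragma values are illustrative starting points, not universal settings, and `open_read_tuned` is a helper name of my own.

```python
import os
import sqlite3
import tempfile

def open_read_tuned(path):
    """Open a SQLite connection tuned for a read-heavy workload."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")     # readers never block the writer
    db.execute("PRAGMA synchronous=NORMAL")   # fewer fsyncs; safe with WAL
    db.execute("PRAGMA cache_size=-64000")    # negative = KiB, so ~64 MB page cache
    db.execute("PRAGMA mmap_size=268435456")  # 256 MiB of memory-mapped I/O
    return db

path = os.path.join(tempfile.mkdtemp(), "app.db")  # hypothetical database file
db = open_read_tuned(path)
db.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO docs (body) VALUES ('hello')")
db.commit()
row = db.execute("SELECT body FROM docs LIMIT 1").fetchone()
```

WAL mode in particular matters for the LLM-app pattern above: many concurrent readers can proceed while a single writer appends, with no server in sight.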
---
DuckDB Paper. Everyone loves DuckDB - the simplicity, robust tech, effectiveness! DuckDB was not an attempt to reinvent the database from scratch. It’s the result of standing on decades of research and open-source innovation. Instead of reinventing the wheel, DuckDB brings together proven techniques from academic literature and production systems to create a modern, embedded analytical database. Here are some of the key technologies and ideas that inspired its architecture.

✅ Execution Engine: Inspired by MonetDB/X100’s vectorized processing model for efficient CPU utilization.
✅ Optimizer: Uses join ordering and subquery flattening techniques from TUM’s research.
✅ Concurrency Control: Implements HyPer-style Serializable MVCC for high-performance OLAP/OLTP concurrency.
✅ Secondary Indexes: Based on Adaptive Radix Trees (ART) for fast, memory-efficient indexing.
✅ Window Functions: Implements Segment Tree Aggregation as described in TUM’s analytical SQL research.
✅ Inequality Joins: Uses the IEJoin algorithm for fast processing of non-equi joins.
✅ Floating-Point Compression: Supports Chimp, Patas, and ALP algorithms for efficient numeric storage.
✅ SQL Parser: Derived from PostgreSQL’s parser via libpg_query, adapted for in-process use.
✅ Shell: Reuses the lightweight and familiar SQLite shell for command-line interaction.

While the general notion is that a new technology comes with “newer, shiny” stuff, it is important to realize that we rely on solid fundamentals! The use cases and applications change, and how we make a technology work around those use cases is key. The #Lakehouse architecture, for example, is pretty much an unbundling of a typical database system. We wanted to build a database system for data lakes. If you are interested in the details, check out the paper. #dataengineering #softwareengineering