How rqlite is tested

rqlite is a lightweight, open-source, distributed relational database written in Go and built on SQLite and Raft. With origins dating back to 2014, its design has always prioritized reliability and quality. The robustness of rqlite is also a testament to its disciplined testing strategy: after more than 10 years of development and deployment, users have reported fewer than 10 panics in production.

Testing a distributed system like rqlite is no small feat. It requires careful consideration of various layers: from individual components to the entire system in operation. Let’s explore how rqlite is tested, following its philosophy of maintaining quality without unnecessary complexity.

The Testing Pyramid: An Effective Approach

rqlite’s testing follows the well-known testing pyramid, which has unit tests as its foundation, integration tests above them, and a minimal set of end-to-end (E2E) tests at the top. This strategy reflects decades of software development experience and keeps test suites efficient, targeted, and easy to debug. In my experience, this approach works.

Unit Testing: The Core of Quality

At the base of the pyramid lies unit testing, covering isolated components. Unit testing dominates rqlite’s test suite because it offers the best balance of speed and precision. Given that rqlite’s database layer is built around SQLite and a “shared nothing” architecture, most database-related functionality can be reliably tested with unit tests.

Testing is also a huge part of the design process. If a component cannot be unit-tested easily, it often signals issues with its design. A little dependency injection during testing is a good thing, but too much indicates an over-reliance on other components. Meeting the goal of easy unit testing means clean interfaces, helping components remain focused on a single task.
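To make the point about clean interfaces concrete, here is a minimal sketch in Go. It is not taken from the rqlite source: the names Executor, Writer, and fakeExecutor are hypothetical, chosen only to show how a narrow interface lets a component be unit-tested with a small fake instead of a real database.

```go
package main

import (
	"errors"
	"fmt"
)

// Executor is a hypothetical narrow interface: the only thing the
// component under test needs from the database layer.
type Executor interface {
	Execute(stmt string) error
}

// ErrEmptyStatement is returned when a caller passes an empty statement.
var ErrEmptyStatement = errors.New("empty statement")

// Writer is a hypothetical component that validates a statement before
// handing it to whatever Executor it was given.
type Writer struct {
	exec Executor
}

// Write rejects empty statements and forwards everything else.
func (w *Writer) Write(stmt string) error {
	if stmt == "" {
		return ErrEmptyStatement
	}
	return w.exec.Execute(stmt)
}

// fakeExecutor records statements instead of touching a real database.
type fakeExecutor struct {
	stmts []string
}

func (f *fakeExecutor) Execute(stmt string) error {
	f.stmts = append(f.stmts, stmt)
	return nil
}

func main() {
	fake := &fakeExecutor{}
	w := &Writer{exec: fake}

	fmt.Println(w.Write("INSERT INTO foo(name) VALUES('fiona')"))
	fmt.Println(w.Write(""))
	fmt.Println("statements forwarded:", len(fake.stmts))
}
```

Writer depends on a single one-method interface, so the fake is a few lines long. If a component needed a dozen fakes to be tested, that would be the design signal described above.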

Let’s look at the numbers. As of version 8.34.0, the entire rqlite code base is approximately 75,000 lines long (including tests, but excluding imported packages). Of that, rqlite’s unit test suite comprises 27,000 lines of source code, making it the largest testing investment. Despite its breadth, the entire suite runs in just a few minutes, enabling frequent testing during development.

System-Level Testing: Validating Consensus

Above unit testing lies system-level testing (also known as integration testing), which focuses on the interplay between the Raft consensus module and SQLite. Since distributed consensus is at the core of rqlite, the correctness of this layer is crucial. Tests in this category validate:

Replication of SQLite statements across nodes.
Behavior of read operations at different consistency levels.
Resilience during cluster disruptions, such as node failures and subsequent recoveries, as well as Leader elections.

System tests include both single-node and multi-node configurations, ensuring the database operates correctly under varying cluster conditions. As of version 8.34.0, approximately 7,000 lines of system-level tests exist, offering comprehensive coverage of these interactions. This test suite is also written in Go, so it too runs relatively quickly.
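The general shape of a multi-node test can be sketched with a toy model. Everything below is an illustrative assumption rather than rqlite code: the cluster type has no Raft, no quorum, and no networking. It only shows the pattern of writing through the leader, observing that a follower's local read can be stale, and then checking that followers converge once replication has run.

```go
package main

import "fmt"

// node is a toy stand-in for a cluster member. It holds the entries it
// has applied so far. Nothing here is rqlite's real code.
type node struct {
	applied []string
}

// cluster is a toy single-leader cluster. Replication is triggered
// explicitly so a test can observe state before and after it happens.
type cluster struct {
	leader    *node
	followers []*node
}

// newCluster builds a cluster of n nodes: one leader, n-1 followers.
func newCluster(n int) *cluster {
	c := &cluster{leader: &node{}}
	for i := 0; i < n-1; i++ {
		c.followers = append(c.followers, &node{})
	}
	return c
}

// write applies an entry on the leader only.
func (c *cluster) write(entry string) {
	c.leader.applied = append(c.leader.applied, entry)
}

// replicate copies to each follower the entries it is missing.
func (c *cluster) replicate() {
	for _, f := range c.followers {
		f.applied = append(f.applied, c.leader.applied[len(f.applied):]...)
	}
}

// readLocal returns what follower i has applied, which may be stale.
func (c *cluster) readLocal(i int) []string { return c.followers[i].applied }

// readLeader returns what the leader has applied.
func (c *cluster) readLeader() []string { return c.leader.applied }

func main() {
	c := newCluster(3)
	c.write("INSERT INTO foo(name) VALUES('fiona')")

	fmt.Println("leader entries:", len(c.readLeader()))
	fmt.Println("follower entries before replication:", len(c.readLocal(0)))

	c.replicate()
	fmt.Println("follower entries after replication:", len(c.readLocal(0)))
}
```

A real system test would start actual nodes and wait for the consensus layer to do the replication. The assertions, though, tend to look like the ones in this toy: what does the leader see, what does a follower see, and do they eventually agree?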

End-to-End Testing: A Minimal Layer

End-to-end testing in rqlite serves as a smoke check, verifying that the system starts, clusters, and performs basic operations. Written in Python, these tests launch real rqlite clusters to ensure “happy path” functionality, guarding against embarrassing issues like a cluster failing to start due to a bug in command-line flag parsing.
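A smoke check of this kind boils down to "start the process, then poll until it answers". The sketch below shows that polling step. It is written in Go only to stay consistent with the other examples here, whereas the project's actual end-to-end tests are in Python. The helper names waitReady and newFakeNode are hypothetical, the /readyz path is used as an assumed readiness endpoint, and a local stand-in HTTP server takes the place of a real node so the example runs on its own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// pollInterval is the pause between readiness checks.
const pollInterval = 10 * time.Millisecond

// waitReady polls url until it returns 200 OK, giving up after the
// given number of attempts.
func waitReady(url string, attempts int) error {
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		time.Sleep(pollInterval)
	}
	return fmt.Errorf("%s not ready after %d attempts", url, attempts)
}

// newFakeNode returns a stand-in server that answers 503 for the first
// `failures` requests and 200 OK afterwards, imitating a node that
// takes a moment to come up. It is not a real database node.
func newFakeNode(failures int32) *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= failures {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newFakeNode(2)
	defer srv.Close()

	err := waitReady(srv.URL+"/readyz", 50)
	fmt.Println("ready:", err == nil)
}
```

Bounding the number of attempts matters: a node that never starts should fail the test quickly with a clear message instead of hanging the whole suite.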

End-to-end tests are deliberately limited to scenarios that cannot be tested at lower levels. Over-reliance on end-to-end testing is avoided because debugging failures in such tests can become prohibitively costly. For instance, a misconfigured dependency deep in the stack might surface in an end-to-end test, but tracing the root cause would require navigating through numerous layers.

A practical example of end-to-end testing is verifying backups to S3. End-to-end testing is useful here because setting up AWS credentials solely for unit testing would be cumbersome and, perhaps, impractical for other developers who wish to run the unit tests. While this approach does mean that S3-related development for rqlite is slower compared to other features, the trade-off is justified. The backup system rarely undergoes changes, so the added complexity of end-to-end testing is worth the effort to ensure reliability.

As of version 8.34.0, only 5,000 lines of end-to-end tests exist, demonstrating a targeted approach.

Performance Testing: Pushing the Limits

Beyond functional correctness, rqlite undergoes performance testing to evaluate its limits under load. These tests measure metrics such as:

Maximum INSERT rates.
Handling of concurrent queries.
Memory, CPU, and disk usage compared across releases.
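As a rough illustration of the first metric, the sketch below times a batch of inserts and reports the observed rate. It is an assumption-laden toy, not rqlite's performance harness: insertRate and memStore are hypothetical names, and the in-memory store stands in for a real database so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStoreFull is returned by memStore once its limit is reached.
var errStoreFull = errors.New("store full")

// memStore is an in-memory stand-in for a database, with a row limit
// so that the error path can be exercised too.
type memStore struct {
	limit int
	rows  []string
}

func (m *memStore) insert(i int) error {
	if len(m.rows) >= m.limit {
		return errStoreFull
	}
	m.rows = append(m.rows, fmt.Sprintf("row-%d", i))
	return nil
}

// insertRate calls insert n times and returns the observed rate in
// operations per second. It stops at the first error.
func insertRate(n int, insert func(i int) error) (float64, error) {
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := insert(i); err != nil {
			return 0, fmt.Errorf("insert %d: %w", i, err)
		}
	}
	elapsed := time.Since(start)
	if elapsed <= 0 {
		// Guard against a zero reading on coarse clocks.
		elapsed = time.Nanosecond
	}
	return float64(n) / elapsed.Seconds(), nil
}

func main() {
	store := &memStore{limit: 100000}
	rate, err := insertRate(100000, store.insert)
	if err != nil {
		fmt.Println("benchmark failed:", err)
		return
	}
	fmt.Printf("%.0f inserts/sec\n", rate)
}
```

Absolute numbers from a run like this mean little on their own. They become useful when the same measurement is repeated across releases on comparable hardware.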

A notable example involves testing with large SQLite databases, sometimes exceeding 2GB. Such scenarios highlight bottlenecks like rqlite’s memory management or disk write latencies, which are intrinsic to its architecture. Generating such large datasets efficiently remains an ongoing challenge, with potential solutions involving prebuilt SQLite databases stored in cloud buckets.

Performance testing also ensures stability, identifying issues like memory leaks or unexpected Leader elections under stress.

Lessons Learned

Testing rqlite has taught me valuable lessons, many of which resonate beyond database development:

Start testing at the start. Unit testing is the most effective way to build confidence in your system. Don’t delay writing unit tests during development. If a bug exists, you’ll likely find it faster here than in an integration or end-to-end test.
Keep test code simple. Test suites are not the place for relentless refactoring or the DRY mindset. It’s more important that test code is straightforward and easy to understand, even if that means writing more boilerplate than you otherwise would.
Check your tests. When writing a test, it’s a good practice to temporarily invert the expected result and run the test again. A properly written test should fail in this scenario. Surprisingly, this isn’t always the case, as errors in test code can sometimes go unnoticed. To avoid this, always take a moment to sanity-check your tests. It’s a small step that ensures your tests are reliable and truly doing their job.
Don’t ignore test failures. Any test failure, no matter how difficult to understand, no matter how rare, is telling you something about your software — potentially something you don’t understand. Those hard-to-debug test cases often reveal a critical flaw in your code. Treat them as a gift and fix them.
Maximize determinism. Build mechanisms into your system so you can trigger, on demand, what are normally automatic processes in your system. This allows you to test how your system performs when those operations occur. This approach is used in rqlite to test Raft snapshotting, which normally runs at semi-random intervals but can be explicitly triggered as needed during testing.
Be deliberate. Adding tests at higher levels must be justified. Excessive integration or end-to-end tests can quickly bog down development and debugging.
Adapt and iterate. Let test results feed back into the design. For example, performance tests revealed that fsync calls were the primary bottleneck, leading to further optimizations in disk usage, such as compressing Raft log entries before writing them to disk.
Efficiency matters. With a suite that runs in a matter of minutes, I can iterate rapidly with confidence, a crucial advantage in maintaining an active open-source project.
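The lesson about maximizing determinism lends itself to a short sketch. The Snapshotter below is hypothetical and is not how rqlite implements Raft snapshotting. It only shows the pattern of a background process that normally runs on a timer but also exposes an explicit trigger, so a test can force the operation and then assert on the result without sleeping or guessing at timing.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// longInterval is long enough that the timer never fires during a
// short test, leaving the explicit trigger as the only source of work.
const longInterval = time.Hour

// Snapshotter is a hypothetical component that takes a snapshot on a
// timer in the background and also on demand via Trigger.
type Snapshotter struct {
	mu        sync.Mutex
	snapshots int
	trigger   chan chan struct{}
	stop      chan struct{}
}

// NewSnapshotter starts the background loop with the given interval.
func NewSnapshotter(interval time.Duration) *Snapshotter {
	s := &Snapshotter{
		trigger: make(chan chan struct{}),
		stop:    make(chan struct{}),
	}
	go s.run(interval)
	return s
}

func (s *Snapshotter) run(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C: // the normal, automatic path
			s.snapshot()
		case done := <-s.trigger: // the on-demand path used by tests
			s.snapshot()
			close(done)
		case <-s.stop:
			return
		}
	}
}

// snapshot stands in for the real work; here it only counts calls.
func (s *Snapshotter) snapshot() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.snapshots++
}

// Trigger forces a snapshot and returns once it has completed.
// It must not be called after Close.
func (s *Snapshotter) Trigger() {
	done := make(chan struct{})
	s.trigger <- done
	<-done
}

// Count reports how many snapshots have been taken.
func (s *Snapshotter) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.snapshots
}

// Close stops the background loop.
func (s *Snapshotter) Close() { close(s.stop) }

func main() {
	s := NewSnapshotter(longInterval)
	defer s.Close()

	s.Trigger()
	fmt.Println("snapshots taken:", s.Count())
}
```

Because Trigger blocks until the snapshot has finished, a test can check the outcome immediately afterwards. This is also a convenient place to apply the "check your tests" habit: temporarily change the expected count and confirm that the test fails.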

Quality Matters

By adhering to the testing pyramid and focusing on targeted, efficient tests, rqlite maintains high quality while minimizing overhead. Whether through unit tests for component reliability, system tests for distributed consensus, or end-to-end tests for sanity checks, every layer serves a purpose.

As rqlite continues to evolve, so will its testing practices. With distributed systems becoming increasingly complex, maintaining simplicity in testing will remain a cornerstone of its design philosophy. After all, the goal is not just to build a database but to build one that works reliably, and is easy to operate, in the real world.