I am building a time series database in C++. Changes to record sets are restricted to appends. This is not a production system; it is for learning.

The motivation and purpose for a WAL (write-ahead log) in general and in my case:

  1. With regular databases, the data file is not guaranteed to be (and rarely is) laid out sequentially, so transactions involve random disk operations, which are slow.

  2. If a client requests a transaction, the write may sit in memory for a while before being flushed to disk, and by that time success may already have been returned to the user.

  3. If success is returned to the user and the flush later fails, the user is misled and data is lost, breaking the Durability guarantee in ACID.

  4. To solve this problem, we introduce a sequential, append-only log representing all the transactions requested of the database. The flow is: a user requests a transaction, the transaction is appended to the WAL, and the data is then written to the data file.

  5. This way, we only return success once the data is forced out of memory onto the WAL (fsync); if the system crashes during the write to the data file, we simply replay the WAL on startup to recover.

I suspect this would be redundant for my system.

My data file is sequential and only changes via appends, so the WAL would essentially be a copy of the data file (with structural variations, but behaving the same). Whatever could go wrong with my data file could therefore also go wrong with the WAL; the WAL would provide nothing but, at best, a backup, at the expense of extra storage and work.

What are problems with my reasoning that a WAL is redundant for my TSDB (time-series DB)?