Introducing MinLZ

MinLZ is a compression algorithm developed by MinIO. The main goal was to provide a format offering best-in-class compression while still decompressing very fast, even on modest hardware. For now, a Go implementation is available.

Many different types of compression algorithms exist, along with very good implementations. At MinIO we already use an enhanced version of Snappy, S2, which has served us well. But as time has moved on, we found several ways the compressed data could be encoded more efficiently.

We therefore set our goals:

  • While keeping the existing compression speed, improve compression enough to justify an upgrade, and offer an easy upgrade path.
  • Still provide compression/decompression fast enough to outperform IO, so it can safely be left on.

This post explains some of the design considerations that went into MinLZ. If you just want to try out the code, go to https://github.com/minio/minlz
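
To give a feel for the Go package, here is a minimal sketch of compressing and decompressing a single block. Treat the exact `Encode`/`Decode` signatures and the level constant as assumptions and consult the repository README for the authoritative API; streaming uses a separate writer/reader interface.

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/minio/minlz"
)

func main() {
	src := bytes.Repeat([]byte("some fairly repetitive payload "), 1000)

	// Compress a single block. Level 1 is fastest, level 3 is smallest;
	// the level constant used here is an assumption, check the README.
	compressed, err := minlz.Encode(nil, src, minlz.LevelBalanced)
	if err != nil {
		log.Fatal(err)
	}

	// Decompression does not need to know the level that was used.
	decompressed, err := minlz.Decode(nil, compressed)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%d -> %d bytes, roundtrip ok: %v\n",
		len(src), len(compressed), bytes.Equal(src, decompressed))
}
```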

Snappy/LZ4 Heritage

An important requirement is that we must retain great decompression speed. We also do not want features that rely on specific CPU instruction sets to be fast. Since CPUs are scaling towards more cores, we focused on having good options for highly concurrent compression and decompression while targeting similar single-threaded performance.

With speed as the main goal, we decided to stick with fixed-size, byte-aligned encoding, since this allows very fast decoding without special CPU instructions. For compression we aimed to retain the same speed, while being willing to sacrifice some encoding speed for decompression speed.

Furthermore, we want our compressed output to be optionally seekable with an index and always able to be (de)compressed concurrently. This means that MinLZ also keeps compressed blocks independent of each other.

Lifting any of these limitations would undoubtedly increase the potential compression — but we would also begin to overlap with good compressors like Zstandard, which provides all of the above, but typically at a lower speed.

We see MinLZ as the next generation of LZ4/Snappy and we provide full backwards compatibility with Snappy/S2 in our Go implementation.

Static Encoding & Test Sets

Choosing a static encoding means there is little flexibility to adapt to content types dynamically. Different content has different match characteristics. We found that most “typical” test sets aren't very reflective of the content we actually encounter. For static encoding this is particularly important, since an improvement on one data type can easily produce a regression on another.

Our primary focus has been serialized data, so improvements on these data types have carried the highest weight in our decisions. This ranges from human-readable content like HTTP logs, JSON, CSV and XML to binary formats like Protobuf, MsgPack, and database dumps.

Mixed files (file system backups), filesystem images and natural text have been checked for regressions, but have not been given special consideration.

Efficiency

The following table compares compression against our existing S2 implementation.

To make a more direct comparison, we have kept the block size at 4 MB for both compressors.

The size comparison is the difference in output size, so it reflects the bandwidth or disk space saved.
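
For example, for gob-stream at level 1, MinLZ produces 312,315,634 bytes versus 347,631,748 bytes for S2: 312,315,634 / 347,631,748 - 1 ≈ -10.16%, which is the value shown in the Size vs. S2 column below.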

Source Files

| File | Compressor | Level | Input | Output | MB/s | Size vs. S2 |
|------|------------|-------|-------|--------|------|-------------|
| gob-stream | s2 | 1 | 1,911,399,616 | 347,631,748 | 13,985 | |
| gob-stream | s2 | 2 | 1,911,399,616 | 303,776,298 | 8,307 | |
| gob-stream | s2 | 3 | 1,911,399,616 | 258,013,815 | 712 | |
| gob-stream | minlz | 1 | 1,911,399,616 | 312,315,634 | 14,046 | -10.16% |
| gob-stream | minlz | 2 | 1,911,399,616 | 269,650,556 | 8,385 | -11.23% |
| gob-stream | minlz | 3 | 1,911,399,616 | 209,509,997 | 629 | -18.80% |
| github-june-2days-2019.json | s2 | 1 | 6,273,951,764 | 1,041,700,255 | 16,314 | |
| github-june-2days-2019.json | s2 | 2 | 6,273,951,764 | 944,872,699 | 9,803 | |
| github-june-2days-2019.json | s2 | 3 | 6,273,951,764 | 826,384,742 | 718 | |
| github-june-2days-2019.json | minlz | 1 | 6,273,951,764 | 974,656,419 | 14,709 | -6.44% |
| github-june-2days-2019.json | minlz | 2 | 6,273,951,764 | 901,171,279 | 8,769 | -4.63% |
| github-june-2days-2019.json | minlz | 3 | 6,273,951,764 | 742,067,802 | 565 | -10.20% |
| github-ranks-backup.bin | s2 | 1 | 1,862,623,243 | 623,832,101 | 12,769 | |
| github-ranks-backup.bin | s2 | 2 | 1,862,623,243 | 568,441,654 | 5,967 | |
| github-ranks-backup.bin | s2 | 3 | 1,862,623,243 | 553,965,705 | 325 | |
| github-ranks-backup.bin | minlz | 1 | 1,862,623,243 | 604,315,773 | 10,170 | -3.13% |
| github-ranks-backup.bin | minlz | 2 | 1,862,623,243 | 517,472,464 | 4,062 | -8.97% |
| github-ranks-backup.bin | minlz | 3 | 1,862,623,243 | 480,707,192 | 260 | -13.22% |
| nyc-taxi-data-10M.csv | s2 | 1 | 3,325,605,752 | 1,093,516,949 | 10,705 | |
| nyc-taxi-data-10M.csv | s2 | 2 | 3,325,605,752 | 884,711,436 | 6,481 | |
| nyc-taxi-data-10M.csv | s2 | 3 | 3,325,605,752 | 773,678,211 | 421 | |
| nyc-taxi-data-10M.csv | minlz | 1 | 3,325,605,752 | 991,637,926 | 9,385 | -9.32% |
| nyc-taxi-data-10M.csv | minlz | 2 | 3,325,605,752 | 771,160,863 | 6,559 | -12.83% |
| nyc-taxi-data-10M.csv | minlz | 3 | 3,325,605,752 | 625,670,328 | 349 | -19.13% |
| apache.log | s2 | 1 | 2,622,574,440 | 230,521,260 | 18,012 | |
| apache.log | s2 | 2 | 2,622,574,440 | 217,884,566 | 12,761 | |
| apache.log | s2 | 3 | 2,622,574,440 | 185,357,903 | 1,576 | |
| apache.log | minlz | 1 | 2,622,574,440 | 194,361,157 | 16,972 | -15.69% |
| apache.log | minlz | 2 | 2,622,574,440 | 174,819,425 | 12,680 | -19.77% |
| apache.log | minlz | 3 | 2,622,574,440 | 139,449,942 | 1,295 | -24.77% |
| enwik9 | s2 | 1 | 1,000,000,000 | 487,526,387 | 7,138 | |
| enwik9 | s2 | 2 | 1,000,000,000 | 416,581,669 | 3,930 | |
| enwik9 | s2 | 3 | 1,000,000,000 | 370,860,824 | 224 | |
| enwik9 | minlz | 1 | 1,000,000,000 | 475,301,880 | 7,417 | -2.51% |
| enwik9 | minlz | 2 | 1,000,000,000 | 384,249,298 | 3,922 | -7.76% |
| enwik9 | minlz | 3 | 1,000,000,000 | 319,191,220 | 184 | -13.93% |
| sofia-air-quality-dataset.tar | s2 | 1 | 15,464,463,872 | 4,991,758,602 | 11,536 | |
| sofia-air-quality-dataset.tar | s2 | 2 | 15,464,463,872 | 4,432,998,965 | 5,973 | |
| sofia-air-quality-dataset.tar | s2 | 3 | 15,464,463,872 | 4,017,422,246 | 330 | |
| sofia-air-quality-dataset.tar | minlz | 1 | 15,464,463,872 | 4,710,981,979 | 10,024 | -5.62% |
| sofia-air-quality-dataset.tar | minlz | 2 | 15,464,463,872 | 3,889,986,428 | 5,726 | -12.25% |
| sofia-air-quality-dataset.tar | minlz | 3 | 15,464,463,872 | 3,488,248,644 | 291 | -13.17% |
| consensus.db.10gb | s2 | 1 | 10,737,418,240 | 4,549,762,344 | 13,006 | |
| consensus.db.10gb | s2 | 2 | 10,737,418,240 | 4,416,693,179 | 5,273 | |
| consensus.db.10gb | s2 | 3 | 10,737,418,240 | 4,210,593,068 | 256 | |
| consensus.db.10gb | minlz | 1 | 10,737,418,240 | 4,473,817,380 | 10,580 | -1.67% |
| consensus.db.10gb | minlz | 2 | 10,737,418,240 | 4,365,920,502 | 3,700 | -1.15% |
| consensus.db.10gb | minlz | 3 | 10,737,418,240 | 4,111,316,366 | 196 | -2.36% |
| rawstudio-mint14.tar | s2 | 1 | 8,558,382,592 | 4,413,944,184 | 10,707 | |
| rawstudio-mint14.tar | s2 | 2 | 8,558,382,592 | 4,091,266,111 | 4,028 | |
| rawstudio-mint14.tar | s2 | 3 | 8,558,382,592 | 3,684,229,496 | 267 | |
| rawstudio-mint14.tar | minlz | 1 | 8,558,382,592 | 4,306,160,388 | 9,320 | -2.44% |
| rawstudio-mint14.tar | minlz | 2 | 8,558,382,592 | 3,942,168,407 | 4,176 | -3.64% |
| rawstudio-mint14.tar | minlz | 3 | 8,558,382,592 | 3,684,229,496 | 225 | -14.44% |
| silesia.tar | s2 | 1 | 211,947,520 | 96,899,524 | 3,431 | |
| silesia.tar | s2 | 2 | 211,947,520 | 86,693,244 | 1,269 | |
| silesia.tar | s2 | 3 | 211,947,520 | 79,612,333 | 201 | |
| silesia.tar | minlz | 1 | 211,947,520 | 93,738,174 | 2,922 | -3.26% |
| silesia.tar | minlz | 2 | 211,947,520 | 81,100,405 | 1,283 | -6.45% |
| silesia.tar | minlz | 3 | 211,947,520 | 71,194,055 | 153 | -10.57% |
| 10gb.tar | s2 | 1 | 10,065,157,632 | 5,915,541,066 | 10,151 | |
| 10gb.tar | s2 | 2 | 10,065,157,632 | 5,455,008,813 | 3,309 | |
| 10gb.tar | s2 | 3 | 10,065,157,632 | 5,192,490,222 | 288 | |
| 10gb.tar | minlz | 1 | 10,065,157,632 | 5,859,748,636 | 8,980 | -0.94% |
| 10gb.tar | minlz | 2 | 10,065,157,632 | 5,256,474,340 | 4,838 | -3.64% |
| 10gb.tar | minlz | 3 | 10,065,157,632 | 4,855,930,368 | 266 | -6.48% |
| enwik10 | s2 | 1 | 10,000,000,000 | 4,759,943,135 | 9,142 | |
| enwik10 | s2 | 2 | 10,000,000,000 | 4,086,077,586 | 4,338 | |
| enwik10 | s2 | 3 | 10,000,000,000 | 3,615,506,650 | 237 | |
| enwik10 | minlz | 1 | 10,000,000,000 | 4,635,654,933 | 8,236 | -2.61% |
| enwik10 | minlz | 2 | 10,000,000,000 | 3,803,364,964 | 4,085 | -6.92% |
| enwik10 | minlz | 3 | 10,000,000,000 | 3,132,931,075 | 205 | -13.35% |
| sharnd.out.2gb | s2 | 1 | 2,147,483,647 | 2,147,487,753 | 14,209 | |
| sharnd.out.2gb | s2 | 2 | 2,147,483,647 | 2,147,487,753 | 9,528 | |
| sharnd.out.2gb | s2 | 3 | 2,147,483,647 | 2,147,487,753 | 4,045 | |
| sharnd.out.2gb | minlz | 1 | 2,147,483,647 | 2,147,487,762 | 13,646 | 0.00% |
| sharnd.out.2gb | minlz | 2 | 2,147,483,647 | 2,147,487,762 | 9,538 | 0.00% |
| sharnd.out.2gb | minlz | 3 | 2,147,483,647 | 2,147,485,714 | 3,282 | 0.00% |

We think that this overall provides a worthwhile compression improvement at similar speeds to our existing implementation, which beats Snappy and is roughly on par with LZ4.

See more benchmarks, with comparisons to Snappy/LZ4, in the repository.

Block Improvements

LZ4 is extremely fast to decompress since there is very little branching in the decoder, and those branches are easy to predict for the CPU. Our overall impression is that the flexibility of Snappy encoding is the better approach for more efficient results, and worth taking a small speed hit for. We decided to continue on the “Snappy” approach, but see if there was anything that could be improved.

A “block” is a stream of operations, usually “emit literals” or copy operations. Emitting literals simply tells the decoder to append a number of given bytes to the output, which then serves as the base for the copy operations.
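
Conceptually, a block decoder is just a loop over these operations. The sketch below is not the actual MinLZ tag layout (SPEC.md defines that); it only illustrates what "emit literals" and "copy" do to the output.

```go
// op is a conceptual stand-in for one decoded operation. Real decoders read
// these from the compressed byte stream instead of a slice of structs.
type op struct {
	literals []byte // bytes to append verbatim (may be empty)
	offset   int    // how far back to copy from; 0 means no copy
	length   int    // number of bytes to copy
}

func decode(ops []op) []byte {
	var out []byte
	for _, o := range ops {
		// "Emit literals": append the given bytes to the output.
		out = append(out, o.literals...)
		// "Copy": go back `offset` bytes in the output produced so far and
		// copy `length` bytes, byte by byte so overlapping copies work.
		for i := 0; i < o.length; i++ {
			out = append(out, out[len(out)-o.offset])
		}
	}
	return out
}
```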

Copy Encoding

Copies are the base of LZ77 encoding. When decoding a stream, a copy operation tells the decoder to “go back by a specific offset and copy a number of bytes to the output”.

This is the encoding of offsets/lengths that are possible with Snappy:

| Snappy | Encoded Size | Offset | Match Length |
|--------|--------------|--------|--------------|
| Tag 1 | 2 bytes | 0 - 2,047 | 4 - 12 |
| Tag 2 | 3 bytes | 0 - 65,535 | 1 - 64 |
| Tag 3 | 5 bytes | 0 - 4,294,967,295 | 1 - 64 |

The first limitation we saw was that lengths were quite limited. S2 added a “repeat” using the invalid “Tag 1, offset = 0” encoding, allowing it to extend the previous match. This allowed a match of any length to be encoded as a copy+repeat. It was, however, wasting all the offset bits, since the offset had to be 0 to indicate the repeat.

Another thing that stood out was that Snappy jumps from 16-bit to 32-bit offsets. Many of the 32 bits used here would most often be zeros, since blocks never really get to this size.

Finally, when observing statistics, we found that moving one bit from the Tag 1 offsets to the length would yield better compression - even with less expensive repeats.

| MinLZ | Encoded Size | Offset | Match Length |
|-------|--------------|--------|--------------|
| Tag 1 | 2 bytes | 1 - 1,024 | 4 - 18 (18 - 273) |
| Tag 2 | 3 bytes | 64 - 65,599 | 4 - 64 (64 - ...) |
| Tag 3 | 4 bytes | 65,536 - 2,162,687 | 4 - 64 (64 - ...) |

Looking at the offsets, we first got rid of the invalid “offset=0” value. Not having to check for that seems to outweigh the unconditional addition of a base value.

Second, the longer offsets have a 64-byte minimum. While this does enable slightly longer offsets, that is not the primary goal. The purpose of this base value is to avoid a check for overlapping copies when decoding. With a base value of 64, we guarantee that long matches can always be copied 64 bytes at a time without risking overlapping the output.

This moves some branching from the decoder to the encoder. We did experiment with using higher base values, however it provided little benefit and just limited the flexibility of the encoder.
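
To illustrate the effect of the 64-byte offset floor, here is a sketch of the kind of copy loop this enables. The names and structure are ours, not from the reference decoder in internal/reference: because the source is guaranteed to be at least 64 bytes behind the write position, each chunk of up to 64 bytes can be copied without an overlap check.

```go
// copyLong appends `length` bytes located `offset` bytes back in dst.
// Because these tags guarantee offset >= 64, each chunk of up to 64 bytes is
// fully written before the copy can reach it, so no overlap check is needed.
// Illustrative sketch only.
func copyLong(dst []byte, offset, length int) []byte {
	src := len(dst) - offset
	for length > 0 {
		n := 64
		if n > length {
			n = length
		}
		dst = append(dst, dst[src:src+n]...)
		src += n
		length -= n
	}
	return dst
}
```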

Another change is to add more flexible length encodings. Similar to the literal encoding, we reserve some length values as special values that read the actual length from 1-3 following bytes. These all have base values, which further reduces branching in the decoder.

Repeat Encoding

A “repeat” means a copy with the same offset as the last match. This can be used to extend the last copy or - more often - resume copying after one or more bytes that were different from the last output.

As mentioned above, the repeat encoding in S2 was a bit wasteful, since it was designed to fit into existing encoding.

After some experimentation, we found that long literal lengths were pretty rare, so we decided to “take” one bit from the length encoding and use it to indicate a repeat.

| Repeat Length | Bytes |
|---------------|-------|
| 1 - 29 | 1 |
| 30 - 285 | 2 |
| 30 - 65,565 | 3 |
| 30 - 16,777,245 | 4 |

The repeat length and literal length encodings are the same, so this decoding can be shared.
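
As a sketch of how such a shared length decoder can look: the constants below are inferred from the table above (a direct value up to 29, then base-30 extensions of 1, 2 or 3 bytes) rather than copied from SPEC.md, so treat the exact field width and escape values as assumptions.

```go
// decodeLength decodes a length field plus optional extra bytes. The base-30
// extension values match the table above; the 5-bit field width and the
// escape values 29/30/31 are assumptions, not the official SPEC.md layout.
func decodeLength(field uint8, extra []byte) (length, extraBytes int) {
	switch {
	case field < 29:
		return int(field) + 1, 0 // 1 - 29 stored directly
	case field == 29:
		return 30 + int(extra[0]), 1 // 30 - 285
	case field == 30:
		return 30 + (int(extra[0]) | int(extra[1])<<8), 2 // 30 - 65,565
	default:
		return 30 + (int(extra[0]) | int(extra[1])<<8 | int(extra[2])<<16), 3 // 30 - 16,777,245
	}
}
```

Since literal lengths use the same scheme, the same helper can serve both, which is what allows the decoding to be shared.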

We experimented with different repeat types. First, a repeat with a small offset adjustment, which would allow us to encode small inserts or deletes. However, across several experiments this had little impact, giving small improvements on some payloads and regressions on others, since it takes bits away from other places. We decided to abandon it, since it wasn't worth the complexity, the extra processing, and the cost of detecting these cases.

Similarly, we experimented with a longer repeat history, similar to Zstandard, and with “promoted repeats”, where frequently used offsets would be stored. Again, the gains were marginal, and while some payloads benefited, the extra complexity, added decoding work and size regressions on other payloads made us abandon this as well.

Fused Literals

Looking at LZ4, one thing that stood out was that it was very efficient with many small literal and copy blocks.

This is because LZ4's base encoding fuses literals and a copy into a single operation.

LZ4 Encoding:

| Literals | Copy Length | Offset | Encoded Size |
|----------|-------------|--------|--------------|
| 0 - 14 (15 - ...) | 4 - 18 (19 - ...) | 0 - 65,535 | 3 bytes (4 - ...) |

In cases where there are few literals and short copies, both operations together have only a single byte of overhead.

Again, after doing some statistical analysis, we found that most literal blocks were very short; in fact, most are less than 4 bytes. With one byte of overhead to store tag and length, plus the copy tag, these were quite inefficient.

With long offsets we found that we could easily spare 3 bits by limiting the maximum offset to 2MB. Limiting offsets would limit the possible matches that could be referenced, but we found that in most payloads long offsets were generally quite rare. Being able to emit fused literals on average made up for the benefits of very long offsets.

To have an efficient representation of short, closer matches with fused literals, we also added a fused literal mode with a 2-byte offset, similar to the copy encoding. With that we ended up with the following encoding types for fused literals:

| Literals | Copy Length | Offset | Encoded Size |
|----------|-------------|--------|--------------|
| 1 - 4 | 4 - 11 | 64 - 65,599 | 3 bytes |
| 0 - 3 | 4 - 64 (...) | 64K - 2,162,687 | 4 bytes |

We can now represent a short literal + medium offset with only 1 byte overhead, and “long offsets” can have 0 to 3 literals without any additional overhead.

The limited back-reference offset will only “affect” blocks that are actually bigger than 2MiB, and we found that it only had a small impact on most practical content, so we believe this is a worthwhile tradeoff. Furthermore, limiting offsets also reduces the chance of referencing data that is out of CPU cache, increasing decompression speed.

The limited length of the fused literals and the guarantee that it will always be followed by a 4 byte copy also means that copying these literals can be done with less branching than separate blocks.
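
The reduced branching can be sketched like this: instead of branching on the literal count, the decoder copies a fixed four bytes and then advances by the real count, relying on the guaranteed 4+ byte match copy that follows to overwrite any over-copied bytes. Names are illustrative, and the buffers are assumed to have a few bytes of slack.

```go
// emitFusedLiterals copies the 0-4 literal bytes of a fused tag without
// branching on the exact count. Any over-copied bytes are overwritten by the
// >= 4 byte match copy that always follows a fused tag. Illustrative only;
// assumes dst and src have at least 4 bytes of slack at the current positions.
func emitFusedLiterals(dst, src []byte, d, s, lits int) (newD, newS int) {
	copy(dst[d:d+4], src[s:s+4]) // fixed-size copy, no branch on lits
	return d + lits, s + lits    // advance by the real literal count
}
```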

To read all the details of the encoding, see SPEC.md in the GitHub repository. We also provide simplified compression/decompression examples in internal/reference.

Stream Improvements

The stream format remains largely unchanged from Snappy/S2, except that previous compressed blocks are no longer used.

Stream Identifier

The stream identifier has been changed to a new magic string. It now also includes the maximum block size that can be expected on the stream, which allows buffers to be allocated at an appropriate size.

EOF validation

Streams now have EOF validation. Previously (in Snappy) there was no way to detect a stream truncated at a block boundary. An EOF block can optionally include the number of decompressed bytes the stream should have produced.

A valid MinLZ stream must include an EOF block, but may be followed by another stream identifier.

Stream Seeking

MinLZ carries over stream indexes from S2. A stream index allows random seeking within a compressed stream.

The index can either be added to the end of a stream or kept separately depending on the use case. All indexes are optional.
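
Conceptually, the index is a sorted list of pairs mapping uncompressed offsets to the compressed offsets of the blocks that contain them. Seeking then means finding the last entry at or before the target, jumping there, and decompressing forward. Below is a sketch of that lookup; the actual index format and reader API are documented in the repository.

```go
import "sort"

// indexEntry maps an uncompressed stream offset to the compressed offset of
// the block containing it. Illustrative types only.
type indexEntry struct {
	uncompressedOff int64
	compressedOff   int64
}

// seekEntry returns the indexed block to start decoding from in order to
// reach `target` bytes into the uncompressed stream. The caller seeks to
// compressedOff, then decodes and discards target-uncompressedOff bytes.
func seekEntry(index []indexEntry, target int64) (indexEntry, bool) {
	i := sort.Search(len(index), func(i int) bool {
		return index[i].uncompressedOff > target
	})
	if i == 0 {
		return indexEntry{}, false // empty index or target before first entry
	}
	return index[i-1], true
}
```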

User Defined Chunks

The user defined chunks have been revised a bit.

  • 0x00 -> 0x7F are internal types and should not be used for user data.
  • 0x80 -> 0xBF are now defined as “skippable user chunks”. When encountering an unknown chunk, it can safely be skipped.
  • 0xC0 -> 0xFD are “non-skippable user chunks”. If an unknown chunk with one of these IDs is encountered, decoding should stop.

You should still use additional checks to ensure that the chunk is of the expected type.
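
For a decoder, the rules above boil down to a range check when it meets a chunk ID it does not recognize. A sketch follows; the chunk framing itself is defined in the spec, and the function shape here is illustrative.

```go
import "fmt"

// handleUnknownChunk applies the ID reservation rules above to a chunk the
// decoder does not recognize. skip is assumed to advance past `length` bytes.
func handleUnknownChunk(id byte, length int, skip func(int) error) error {
	switch {
	case id < 0x80:
		// 0x00-0x7F: internal types; an unknown one means the stream uses a
		// feature this decoder does not support.
		return fmt.Errorf("unsupported internal chunk 0x%02x", id)
	case id <= 0xBF:
		// 0x80-0xBF: skippable user chunk; safe to ignore.
		return skip(length)
	case id <= 0xFD:
		// 0xC0-0xFD: non-skippable user chunk; decoding should stop.
		return fmt.Errorf("unknown non-skippable chunk 0x%02x", id)
	default:
		// 0xFE-0xFF: not user chunk IDs; left to the stream decoder proper.
		return fmt.Errorf("unexpected chunk 0x%02x", id)
	}
}
```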

Commandline Tool

If you want to try out MinLZ without any code, you can use our commandline tool.

The tool is called mz and allows stream and block compression and decompression.

Furthermore, it adds “tail” and “offset” decompression of streams with an index, as well as http/https input.
