17
votes
$\begingroup$

It is commonly understood that CRC satisfies the linear identity with respect to the $\oplus$ (XOR) operation:

$\operatorname{CRC}(a) \oplus \operatorname{CRC}(b) = \operatorname{CRC}(a \oplus b)$

But after some experimentation and research it appears that this is not generally true.

The particular algorithm in question is the one used in HDLC, ANSI X3.66, ITU-T V.42, Ethernet, Serial ATA, MPEG-2, PKZIP, Gzip, Bzip2, PNG (see Wikipedia) which uses the polynomial $\mathtt{0x04C11DB7}$.

In what sense is CRC linear? Is this a misconception?

$\endgroup$
3
  • $\begingroup$ I'm not asking for a proof, like the linked StackOverflow question. I'm asking if this is a misconception, because it does not appear to be true in practice. $\endgroup$ Commented Mar 27, 2016 at 8:51
  • 2
    $\begingroup$ I have a longer explanation posted previously: stackoverflow.com/a/7005801/839689 $\endgroup$ Commented Mar 27, 2016 at 16:26
  • $\begingroup$ Although CRC is clearly not a cryptographic hash and the post is offtopic, this Q/A has been locked as it seems popular. $\endgroup$ Commented Dec 23, 2024 at 2:52

2 Answers 2

22
votes
$\begingroup$

In practice, CRC operations are often started with a nonzero state. Because of this, the actual equation is usually of the form:

$$crc(a) \oplus crc(b) = crc( a \oplus b ) \oplus c$$

for some constant $c$ (which depends on the length of $a$, $b$).

An alternative way of expressing this is, for three any equal-length bitstrings $a, b, c$, we have:

$$crc(a) \oplus crc(b) \oplus crc(c) = crc( a \oplus b \oplus c ) $$

The technical term for this relationship is affine; in cryptography, we treat it as linear because, for attacks that assume linearity, affine works just as well.

$\endgroup$
2
  • 6
    $\begingroup$ Setting $c=0$ in the alternate equation might be a useful exercise: $crc(a) \oplus crc(b) \oplus crc(0) = crc(a \oplus b)$. Then, one naturally questions what $crc(0)$ evaluates to, tying into your point about starting at a nonzero state. $\endgroup$ Commented Mar 27, 2016 at 15:16
  • $\begingroup$ Ah, that corrects a long-standing terminology problem I have had, with (wrongly) using linear where affine was meant in a cryptanalytic context! I'll have to scrub my earlier answers.. $\endgroup$ Commented Apr 5, 2016 at 16:36
3
votes
$\begingroup$

My answer to how to recalculate a CRC32 on a large byte array

and the comment which follows may explain it.

The linearity comes from the fact that CRC is a remainder of dividing a high degree polynomial with binary coefficients (=data) by a fixed degree polynomial with binary coefficients (=crc polynomial).

Adding of polynomials with binary coefficients is equivalent to an xor operation (and it is obviously linear). So if the data changes, and you know the xor between the old data and the new data, you can calculate CRC of the new data from the CRC of the old data and vice versa.

From security perspective, this makes CRC unreliable way to tell if the data has changed if the data has a padding or even some useless bits in the middle. Those can be easily adjusted to produce the correct "remainder" polynomial by calculating the CRC of each free-to-be-adjusted bit and then solving the system of simultaneous linear equations (to produce intentional collision of CRCs).

Which makes producing CRC collision a trivial problem. This makes CRC unsuitable to detect malicious changes.

$\endgroup$
1
  • 1
    $\begingroup$ You state exactly the same identity for CRC that I give in the question, and in practice I found that it did not hold. $\endgroup$ Commented Mar 28, 2016 at 11:06