Picking a CRC

Wait 5 sec.

You send a file, but how do you know it arrived intact? In other words, how do you know that it didn’t get cut off, garbled, or changed somehow? Simplistically, you could just add up all the bytes in the file — a checksum — and send that along with the file. You compute the checksum when you know the file is good, and the receiver can compare the checksum to see if they match.However, a simple addition doesn’t catch certain classes of errors, which is why there are better checksum algorithms that, for example, wrap the carry bit around or otherwise modify files with common errors so they produce different checksums. There are two problems with checksums. First, no matter how much you modify the algorithm, the chances that two files produce the same checksum are pretty high. Especially with common error patterns.For example, assume a very simple algorithm that simply adds the bytes and discards any carry. If a file contains 0x80, 0x80, those numbers essentially cancel each other out. If you replace them with 0, 0, you’ll get the same checksum. To some degree, using anything other than a second copy of the entire file will have this problem — some corruption goes undetected — but you want to minimize the number of times that happens.The other problem is that a checksum by itself doesn’t let you correct anything. You know the data is bad, but you don’t know why. If you think about it, the simplest checksum is a parity bit on a byte: odd parity is simply summing all the bits together. If the parity bit doesn’t match, you know the byte is bad, but you don’t know why. Any even number of errors goes undetected, but I am sure one-, three-, five-, or seven-bit errors will get caught.People invent better error-checking codes by devising schemes that can promise they can detect a certain number of bit flips and, at least in some cases, correct them. One of these is the cyclic redundancy check (CRC). It is easy to think of the CRC as a “strong checksum,” but it actually works differently. What’s more, there isn’t just a single CRC algorithm. You have to select or design a particular algorithm based on your needs. Most people pick a “named” implementation like CCITT or Ethernet and assume it must be the best. It probably isn’t.A CRC is a checksum in the broad sense: you feed it a message, and it gives you a small value that you append, store, or compare later. But unlike a simple additive checksum, a CRC is based on polynomial division over GF(2), which is a fancy way of saying “divide using XOR instead of carries.” That detail matters. It gives CRCs very strong guarantees against common classes of errors, provided you choose the right polynomial for the job. That’s the key. You must choose the right polynomial.The Polynomial MachineA CRC treats your message as a long binary polynomial. For example, the byte stream is interpreted as a sequence of coefficients: each bit is either present or absent. The CRC algorithm divides the message polynomial, after shifting it by the CRC width, by a generator polynomial. The remainder is the CRC.In normal arithmetic, division involves subtraction and carries. In CRC arithmetic, subtraction is XOR. That is why CRC code often looks like this:if (crc & topbit)crc = (crc