How can I verify two CRC implementations will generate the same checksums?

I'm looking for an exhaustive evaluation methodology that is specific to CRC implementations.

+2  A: 

Create several unit tests that feed the same inputs to both implementations and compare their outputs against each other.
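
A minimal JUnit sketch of that idea, assuming java.util.zip.CRC32 as one implementation and a hypothetical MyCrc32.compute(byte[]) standing in for the implementation under test:

    import static org.junit.Assert.assertEquals;

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    import org.junit.Test;

    public class CrcComparisonTest {

        // Reference checksum from the JDK's CRC-32.
        private static long referenceCrc32(byte[] data) {
            CRC32 crc = new CRC32();
            crc.update(data);
            return crc.getValue();
        }

        @Test
        public void bothImplementationsAgreeOnFixedInputs() {
            byte[][] inputs = {
                new byte[0],                                        // empty buffer
                "123456789".getBytes(StandardCharsets.US_ASCII),    // classic test vector
                new byte[] { 0x00, (byte) 0xFF, 0x55, (byte) 0xAA } // mixed bit patterns
            };
            for (byte[] input : inputs) {
                // MyCrc32.compute is a placeholder for the second implementation.
                assertEquals(referenceCrc32(input), MyCrc32.compute(input));
            }
        }
    }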

Bernard
I've got this in place, but how can we know we've covered (at least a meaningful fraction of) all possible inputs? I'm looking for something related to the nature of CRCs that could point us toward HOW to write these tests so that they effectively cover the input range.
Joe
@Joe- A series of 20-30 random inputs of different sizes should be sufficient to prove that the CRC algorithms produce equal outputs. I have never seen two implementations that produced output that was "close" to each other; instead, even slight differences produced large changes in the output. That being said, if these are homemade CRC implementations that aren't known to be bug-free, then a coding bug may cause problems for your tests.
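
For example, a rough sketch of that random comparison (MyCrc32.compute(byte[]) is again a hypothetical stand-in for the implementation under test, with java.util.zip.CRC32 as the reference):

    import static org.junit.Assert.assertEquals;

    import java.util.Random;
    import java.util.zip.CRC32;

    import org.junit.Test;

    public class RandomCrcComparisonTest {

        @Test
        public void bothImplementationsAgreeOnRandomInputs() {
            Random rng = new Random(42);  // fixed seed so a failure is reproducible
            for (int i = 0; i < 30; i++) {
                byte[] input = new byte[rng.nextInt(64 * 1024)];  // random size, 0 to 64 KiB
                rng.nextBytes(input);

                CRC32 reference = new CRC32();
                reference.update(input);

                // MyCrc32.compute is a placeholder for the implementation under test.
                assertEquals("mismatch on sample " + i, reference.getValue(), MyCrc32.compute(input));
            }
        }
    }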
bta
I know my above comment is "proof by lack of a counterexample", but the idea still holds. If the implementations use different polynomials, then you will get radically different results.
bta
Thanks, accepting this.
Joe
I would also test at least 256 "consecutive" inputs (i.e., the same file, but pick one byte and cycle it through all 256 possible values).
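
Something like this, with MyCrc32.compute(byte[]) once more standing in for the implementation under test:

    import static org.junit.Assert.assertEquals;

    import java.util.zip.CRC32;

    import org.junit.Test;

    public class SingleByteMutationTest {

        @Test
        public void bothImplementationsAgreeForEveryValueOfOneByte() {
            byte[] base = new byte[1024];  // any representative buffer works; zeros here for brevity
            int position = 512;            // the single byte we mutate

            for (int value = 0; value < 256; value++) {
                byte[] input = base.clone();
                input[position] = (byte) value;

                CRC32 reference = new CRC32();
                reference.update(input);

                // MyCrc32.compute is a placeholder for the implementation under test.
                assertEquals(reference.getValue(), MyCrc32.compute(input));
            }
        }
    }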
Brian
Note that you can also test each step of calculating the CRC of a buffer or string, because each step produces a partial CRC checksum.
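
A sketch of that, assuming the implementation under test exposes an incremental update(int)/getValue() interface like java.util.zip.CRC32 (the MyCrc32 name and its methods are hypothetical):

    import static org.junit.Assert.assertEquals;

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    import org.junit.Test;

    public class PartialCrcTest {

        @Test
        public void partialChecksumsMatchAfterEveryByte() {
            byte[] input = "The quick brown fox jumps over the lazy dog"
                    .getBytes(StandardCharsets.US_ASCII);

            CRC32 reference = new CRC32();
            MyCrc32 underTest = new MyCrc32();  // assumed to mirror CRC32's update(int)/getValue()

            for (byte b : input) {
                reference.update(b);
                underTest.update(b);
                // The running (partial) checksum should agree after every single byte.
                assertEquals(reference.getValue(), underTest.getValue());
            }
        }
    }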
nategoose
Whilst random input data are useful as regression tests, they're not very useful as exhaustive proof of correctness. There are all sorts of nasty edge-case bugs that can creep into this sort of thing (e.g. carries from the low word to the high word on a 16-bit embedded processor). You'll need to do more than just random input.
Oli Charlesworth
A: 

By writing a unit test for each implementation that takes the same input and verifies it against the expected output.

Darin Dimitrov
I understand -- this is what we're doing -- but how do we know we've exhausted (or even come close to exhausting) the possible input space? I'm looking for something based on the way CRCs work specifically.
Joe
A: 

First, if it is a standard CRC implementation, you should be able to find known check values for standard test inputs somewhere on the net.
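
For instance, if both implementations are meant to be the common CRC-32 variant (the one used by zip/PNG and by java.util.zip.CRC32), the widely published check value for the ASCII string "123456789" is 0xCBF43926; for a different polynomial or parameter set, substitute the check value from a CRC catalogue. A sketch, with MyCrc32.compute(byte[]) as a hypothetical placeholder for the implementation under test:

    import static org.junit.Assert.assertEquals;

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    import org.junit.Test;

    public class KnownCheckValueTest {

        @Test
        public void matchesPublishedCrc32CheckValue() {
            // Published check value for standard CRC-32: CRC("123456789") == 0xCBF43926.
            byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);

            CRC32 reference = new CRC32();
            reference.update(input);
            assertEquals(0xCBF43926L, reference.getValue());

            // MyCrc32.compute is a placeholder for the implementation under test.
            assertEquals(0xCBF43926L, MyCrc32.compute(input));
        }
    }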

Second, you could generate some number of payloads, run each CRC implementation on them, and check that the resulting CRC values match.

Adam Tegen
+3  A: 

You can separate the problem into edge cases and random samples.

Edge cases. There are two variables to the CRC input: the number of bytes and the value of each byte. So create arrays of length 0, 1, and MAX_BYTES, with byte values ranging from 0 to MAX_BYTE_VALUE. The edge-case suite is something you'll most likely want to keep in a JUnit suite (see the sketch at the end of this answer).

Random samples. Using the ranges above, run both CRCs on randomly generated arrays of bytes in a loop. The longer you let the loop run, the more of the input space you cover. If you are low on computing power, consider deploying the test to EC2.
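
A sketch of the edge-case part, assuming a hypothetical MyCrc32.compute(byte[]) under test and java.util.zip.CRC32 as the reference; the random-sample loop can follow the same pattern as the random comparison sketched in the comments on the accepted answer above:

    import static org.junit.Assert.assertEquals;

    import java.util.Arrays;
    import java.util.zip.CRC32;

    import org.junit.Test;

    public class EdgeCaseCrcTest {

        // MAX_BYTES: whatever the largest buffer your application actually handles is.
        private static final int MAX_BYTES = 64 * 1024;

        @Test
        public void bothImplementationsAgreeOnEdgeCaseArrays() {
            int[] lengths = { 0, 1, MAX_BYTES };
            int[] fillValues = { 0x00, 0x01, 0x7F, 0x80, 0xFF };  // a spread across the byte range

            for (int length : lengths) {
                for (int fill : fillValues) {
                    byte[] input = new byte[length];
                    Arrays.fill(input, (byte) fill);

                    CRC32 reference = new CRC32();
                    reference.update(input);

                    // MyCrc32.compute is a placeholder for the implementation under test.
                    assertEquals(reference.getValue(), MyCrc32.compute(input));
                }
            }
        }
    }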

rob