
I'm using the following function to compute the CRC32 of a file in a VS2008, .NET 3.5 project:

public UInt32 ComputeHash(System.IO.Stream stream)
{
    unchecked
    {
        const int BUFFER_SIZE = 1024;

        UInt32 crc32Result = 0xFFFFFFFF;   // standard CRC-32 initial value
        byte[] buffer = new byte[BUFFER_SIZE];
        int count = stream.Read(buffer, 0, BUFFER_SIZE);

        while (count > 0)
        {
            for (int i = 0; i < count; i++)
            {
                // shift-and-XOR table step; the table is indexed by the
                // current input byte XORed with the low byte of the running CRC
                crc32Result = (crc32Result >> 8) ^ _crc32Table[buffer[i] ^ (crc32Result & _LOOKUP_TABLE_MAX_INDEX)];
            }
            count = stream.Read(buffer, 0, BUFFER_SIZE);
        }

        return ~crc32Result;   // final one's-complement step
    }
}

For the sake of brevity, I have left out the function that builds the lookup table (_crc32Table). The table is an array of 256 UInt32 values and is built once when the class is instantiated (_LOOKUP_TABLE_MAX_INDEX is 255).
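In case it helps, the builder follows the standard reflected CRC-32 construction; a representative sketch (not my exact code, but using the same field names) looks like this:

private void BuildCrc32Table()
{
    // 0xEDB88320 is the bit-reflected form of the standard CRC-32 polynomial
    _crc32Table = new UInt32[_LOOKUP_TABLE_MAX_INDEX + 1];
    for (UInt32 n = 0; n <= _LOOKUP_TABLE_MAX_INDEX; n++)
    {
        UInt32 c = n;
        for (int k = 0; k < 8; k++)
        {
            c = ((c & 1) != 0) ? (0xEDB88320 ^ (c >> 1)) : (c >> 1);
        }
        _crc32Table[n] = c;
    }
}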

I have run some benchmarks comparing this to the ComputeHash methods of MD5CryptoServiceProvider and SHA1CryptoServiceProvider, and both are much faster: MD5 is over twice as fast and SHA1 is about 35% faster. I was told CRC32 is fast, but that's not what I'm seeing.

Am I mistaken in my assumptions? Is this to be expected or is there a flaw in this algorithm?

A: 

One possibility: are you counting the computation of the lookup table in your measurement of the CRC throughput? Normally the lookup table is computed once and cached; if you don't cache it, you will be rebuilding it every time you calculate a CRC. Also, if you only measure one CRC, you may have included the table-construction cost in the CRC computation cost. It's best to measure many iterations of each hash.
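For example (a sketch; Crc32 stands in for whatever class hosts your ComputeHash, and the file path is a placeholder):

var crc = new Crc32();   // table is built here, outside the timed region
using (var stream = System.IO.File.OpenRead(@"C:\temp\bigfile.bin"))
{
    const int ITERATIONS = 10;
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < ITERATIONS; i++)
    {
        stream.Position = 0;   // rewind so every pass hashes the same data
        crc.ComputeHash(stream);
    }
    sw.Stop();
    System.Console.WriteLine("avg per hash: {0} ms",
        sw.ElapsedMilliseconds / (double)ITERATIONS);
}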


Addendum: When I measured, I saw a factor of 2.6x, comparing your CRC32 to the MD5 hash, when the app was compiled with /debug+ and /optimize-. Using /debug- and /optimize+, I saw a factor of 1.6x. The absolute MD5 performance did not vary when I changed the compile flags. With no debug, CRC was still slower, but it was much closer.
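(If you're building from the command line, those two configurations are roughly the following; the source file name is just a placeholder.)

csc /debug+ /optimize- CrcBenchmark.cs
csc /debug- /optimize+ CrcBenchmark.cs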

Cheeso
As I said in my question, "The table is an array of UInt32, is built when the class is instantiated...". The table exists before the hashing begins.
raven
Of course, but do you re-compute the table for every hash? Is it a static table or an instance table? And do you compute many iterations of the hash and average the response time, or just one? If it is an instance field and you create a new instance of the Crc32 class for every CRC32 you calculate, then you pay the init cost every time. And if you measure only one hash, you pay the init cost there too. No?
Cheeso
I see what you're saying. The timing doesn't start until after instantiation of the class, so table creation time is not a factor. Even if it were included in the time calculation, it wouldn't affect the results much. I'm hashing quite large files (hundreds of megabytes) and the CRC32 calculation is taking 7-8 seconds as compared to the milliseconds it takes to create the table. Typical example: 699 MB file, CRC32 7.3 s, MD5 3.7 s, SHA1 4.8 s.
raven
Ah, got it. If it is 7s of runtime per cycle, the init cost will be ~invisible. Did you try /debug- on the compiler options?
Cheeso
All /debug- does is stop emitting debug symbol files. It does not have any effect on code generation. It's /optimize+ that changes code generation.
Eric Lippert
@Eric - Thanks for that.
Cheeso
+3  A: 

You are comparing your code to built-in functions and asking why they are faster. What you need to do is find the source for the built-in functions. How do they work? See what's different.

Betcha the built-in functions call out to a native library and cheat by not having to run inside the managed memory framework.

Karl
That's what I was thinking. It's the penalty of managed code.
raven
@raven, not necessarily. Remember, built-in functions like this are engineered over years and heavily optimized before they are shipped. Like Eric Lippert suggested in his comment, profile your code.
unforgiven3
A: 

I'm not too familiar with the optimizations that are automatically performed when this code executes, but you have a couple of options if profiling isn't working for you.

I can suggest trying out unsafe code and using pointer arithmetic for the buffer[i] and _crc32Table lookups in case they aren't already being optimized; see the sketch below.
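Something along these lines, assuming the same fields as in your question (a rough sketch, not tested; it needs the /unsafe compiler option):

public unsafe UInt32 ComputeHashUnsafe(System.IO.Stream stream)
{
    const int BUFFER_SIZE = 8192;

    UInt32 crc = 0xFFFFFFFF;
    byte[] buffer = new byte[BUFFER_SIZE];
    int count;

    // pinning the table and buffer lets us index them through raw pointers,
    // which skips the CLR's per-element array bounds checks
    fixed (UInt32* table = _crc32Table)
    {
        while ((count = stream.Read(buffer, 0, BUFFER_SIZE)) > 0)
        {
            fixed (byte* pBuf = buffer)
            {
                byte* p = pBuf;
                byte* end = pBuf + count;
                while (p < end)
                {
                    crc = (crc >> 8) ^ table[(*p++ ^ crc) & 0xFF];
                }
            }
        }
    }
    return ~crc;
}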

The only other place I can see you running into performance issues is with the Stream.Read calls. Have you experimented with different BUFFER_SIZE values?

Using a larger byte buffer and possibly doing some manual loop unrolling could help you out too if they aren't happening automatically; the inner loop might be unrolled like the sketch below.
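For example, replacing the for-loop from your question with a four-way unrolled version (again just a sketch; worth measuring, since the JIT may already do something equivalent):

int i = 0;
int unrolledEnd = count - (count % 4);   // largest multiple of 4 <= count
while (i < unrolledEnd)
{
    // same table step as the original loop, four bytes per iteration
    crc32Result = (crc32Result >> 8) ^ _crc32Table[(buffer[i++] ^ crc32Result) & 0xFF];
    crc32Result = (crc32Result >> 8) ^ _crc32Table[(buffer[i++] ^ crc32Result) & 0xFF];
    crc32Result = (crc32Result >> 8) ^ _crc32Table[(buffer[i++] ^ crc32Result) & 0xFF];
    crc32Result = (crc32Result >> 8) ^ _crc32Table[(buffer[i++] ^ crc32Result) & 0xFF];
}
while (i < count)   // the remaining 0-3 bytes
{
    crc32Result = (crc32Result >> 8) ^ _crc32Table[(buffer[i++] ^ crc32Result) & 0xFF];
}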

nvuono