views:

1532

answers:

4

Hi,

I'm looking for a C# wrapper for a native MD5 or SHA1 library to improve hash calculation performance.

Previously I switched from SharpZipLib to a native zlib and got more than a 2x performance boost. (OK, you have to make sure you ship the right zlib.so or zlib.dll depending on the OS and hardware, but it pays off.)

Would it be worth it for MD5 or SHA1, or do both .NET and Mono rely on a native implementation already?

(Edited) Also: in case I have to stick with MD5CryptoServiceProvider, is there a way to calculate the hash of a file while I'm reading it? I mean, feed it bytes in chunks but still get the hash of the whole file?
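To illustrate what I mean by a wrapper: something as thin as a P/Invoke binding over a native one-shot MD5 function, e.g. OpenSSL's MD5() from libcrypto. The DLL name below is an assumption and varies by platform; this is a sketch of the idea, not a tested binding:

```csharp
using System;
using System.Runtime.InteropServices;

class NativeMd5Sketch {
    // OpenSSL prototype: unsigned char *MD5(const unsigned char *d, size_t n, unsigned char *md);
    // "libeay32.dll" is the classic Windows OpenSSL name; on Linux it would be e.g. "libcrypto.so"
    [DllImport("libeay32.dll", EntryPoint = "MD5")]
    static extern IntPtr MD5(byte[] d, UIntPtr n, byte[] md);

    static byte[] ComputeMd5(byte[] data) {
        var digest = new byte[16];               // an MD5 digest is always 16 bytes
        MD5(data, (UIntPtr)data.Length, digest); // OpenSSL writes the digest into the buffer
        return digest;
    }
}
```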

+3  A: 

The SHA1CryptoServiceProvider class uses the underlying Windows API implementation. However, SHA1Managed is pretty fast.

EDIT: Yes, it's possible to compute the hash step by step. The TransformBlock and TransformFinalBlock methods do this.
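A minimal sketch of that incremental usage (feed fixed-size chunks via TransformBlock, finish with TransformFinalBlock, then read the Hash property; the chunk size here is arbitrary):

```csharp
using System;
using System.Security.Cryptography;

class IncrementalHashDemo {
    static void Main() {
        byte[] data = new byte[10000];
        new Random(42).NextBytes(data);

        // One-shot hash, for comparison
        byte[] expected = new SHA1CryptoServiceProvider().ComputeHash(data);

        // Incremental: feed 1000-byte chunks, then finish with the remainder
        var sha = new SHA1CryptoServiceProvider();
        int offset = 0;
        while (data.Length - offset >= 1000) {
            // a null output buffer means "don't copy the input anywhere"
            sha.TransformBlock(data, offset, 1000, null, 0);
            offset += 1000;
        }
        sha.TransformFinalBlock(data, offset, data.Length - offset);
        byte[] actual = sha.Hash;   // valid only after TransformFinalBlock

        Console.WriteLine(BitConverter.ToString(expected) == BitConverter.ToString(actual));
        // prints True: the incremental digest equals the one-shot digest
    }
}
```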

Mehrdad Afshari
"Pretty fast" can mean many things ... it turns out "pretty fast" means about 3 times slower. Still, 30 MB per 300 ms is plenty fast.
Sam Saffron
A: 

I would just use the BCL's SHA1CryptoServiceProvider and MD5CryptoServiceProvider classes. The ones that ship with the framework are quite fast.
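For whole files, note that ComputeHash also has a Stream overload, which reads the input through in small buffered blocks, so the file never needs to be loaded into memory all at once. A quick sketch (the file name is made up for the demo, and "abc" stands in for real file contents):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

class FileHashDemo {
    static void Main() {
        string path = "example.bin";   // hypothetical file, created just for this demo
        File.WriteAllBytes(path, Encoding.ASCII.GetBytes("abc"));

        using (var md5 = new MD5CryptoServiceProvider())
        using (var stream = File.OpenRead(path)) {
            // ComputeHash(Stream) pulls the data through in small buffered reads
            byte[] hash = md5.ComputeHash(stream);
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
            // prints 900150983cd24fb0d6963f7d28e17f72 (the well-known MD5 of "abc")
        }
    }
}
```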

Reed Copsey
Thanks. That's what I'm using right now, I'm just wondering if there's a way to make it faster. I'm hashing entire files.
pablo
+5  A: 

MD5 and SHA1 rely on native implementations; nonetheless, it's possible a C++ solution + interop could be slightly faster, because you could reduce the number of method calls a bit and optimize the native implementation.

Keep in mind that the native implementation (SHA1CryptoServiceProvider) can be 3x faster than the managed one (SHA1Managed).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Security.Cryptography;

namespace ConsoleApplication22 {

    class Program {

        static void Profile(string description, int iterations, Action func) {

            // clean up
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            // warm up 
            func();

            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) {
                func();
            }
            watch.Stop();
            Console.Write(description);
            Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
        }

        static void Main() {
            SHA1Managed managed = new SHA1Managed();
            SHA1CryptoServiceProvider unmanaged = new SHA1CryptoServiceProvider();

            Random rnd = new Random();

            var buffer = new byte[100000];
            rnd.NextBytes(buffer);

            Profile("managed", 1000, () => {
                managed.ComputeHash(buffer, 0, buffer.Length);
            });

            Profile("unmanaged", 1000, () =>
            {
                unmanaged.ComputeHash(buffer, 0, buffer.Length);
            });

            Console.ReadKey();
        }
    }
}
managed Time Elapsed 891 ms
unmanaged Time Elapsed 336 ms

Also keep in mind that, unless my calculation is wrong, the unmanaged implementation is hashing 100 MB of data (100 KB x 1000 iterations) in about 300 milliseconds; this would very rarely be a bottleneck.

Sam Saffron
An interop solution would require marshalling, however, which could offset any gains that might otherwise be made. Just something to keep in mind.
jrista
My understanding is that SHA1CryptoServiceProvider requires marshalling anyway; it's using extern calls.
Sam Saffron
It makes sense.
pablo
Fortunately I checked, and I'm using the unmanaged one: MD5CryptoServiceProvider. Loved your profiling example!
pablo
Sam, you're right, the problem must be somewhere else. One question, though: Is there a way to hash in chunks? I need to read the file and also to hash it, can I do it in only one pass?
pablo
Yep, see the TransformBlock and TransformFinalBlock methods.
Sam Saffron
A: 

Depending on your application of hashing, MD5 might not be applicable. MD5 is only useful for error detection; it's no longer viable as a check against malicious file alteration.

http://en.wikipedia.org/wiki/Md5#Vulnerability

The short story is, MD5 collisions are easy to generate by changing 16 bytes in a file.
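If collision resistance matters for your use case, swapping in one of the SHA-2 classes from the same namespace is nearly a one-line change. A small sketch, hashing the standard "abc" test vector:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class Sha256Demo {
    static void Main() {
        byte[] data = Encoding.ASCII.GetBytes("abc");
        using (var sha256 = new SHA256Managed()) {
            byte[] hash = sha256.ComputeHash(data);
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
            // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
            // (the FIPS 180 SHA-256 test vector for "abc")
        }
    }
}
```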