Hello,

I'm writing a C# routine that creates hashes from JPEG files. If I pass a byte array to my SHA512 object, I get the expected behavior. However, if I pass a MemoryStream, the two files always hash to the same value.

Example 1:

        SHA512 mySHA512 = SHA512.Create();

        Image img1 = Image.FromFile(@"d:\img1.jpg");
        Image img2 = Image.FromFile(@"d:\img2.jpg");
        MemoryStream ms1 = new MemoryStream();
        MemoryStream ms2 = new MemoryStream();

        img1.Save(ms1, ImageFormat.Jpeg);
        byte[] buf1 = ms1.GetBuffer();
        byte[] hash1 = mySHA512.ComputeHash(buf1);

        img2.Save(ms2, ImageFormat.Jpeg);
        byte[] buf2 = ms2.GetBuffer();
        byte[] hash2 = mySHA512.ComputeHash(buf2);

        if (Convert.ToBase64String(hash1) == Convert.ToBase64String(hash2))
            MessageBox.Show("Hashed the same");
        else
            MessageBox.Show("Different hashes");

That produces "Different hashes". However, one of the ComputeHash overloads takes a Stream, and I'd rather use that. When I do:

        SHA512 mySHA512 = SHA512.Create();

        Image img1 = Image.FromFile(@"d:\img1.jpg");
        Image img2 = Image.FromFile(@"d:\img2.jpg");
        MemoryStream ms1 = new MemoryStream();
        MemoryStream ms2 = new MemoryStream();

        img1.Save(ms1, ImageFormat.Jpeg);
        byte[] hash1 = mySHA512.ComputeHash(ms1);

        img2.Save(ms2, ImageFormat.Jpeg);
        byte[] hash2 = mySHA512.ComputeHash(ms2);

        if (Convert.ToBase64String(hash1) == Convert.ToBase64String(hash2))
            MessageBox.Show("Hashed the same");
        else
            MessageBox.Show("Different hashes");

That produces "Hashed the same".

What's going on here that I'm missing?

+10  A: 

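You're not rewinding your MemoryStreams. After Save, each stream's position is at its end, and ComputeHash reads from the current position onwards, so both hashes are computed from an empty sequence of bytes and therefore match. Use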

ms1.Position = 0;
ms2.Position = 0;

after calling Save.
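
To see the effect in isolation, here is a minimal sketch that leaves the images out. The class name, the helper HashFromCurrentPosition and the three-byte arrays are made up for illustration; the arrays simply stand in for the two saved JPEGs.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class StreamHashDemo
{
    // Hypothetical helper: hashes whatever lies between the stream's
    // current position and its end, and returns the hash as Base64.
    public static string HashFromCurrentPosition(Stream stream)
    {
        using (SHA512 sha = SHA512.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }

    public static void Main()
    {
        // Stand-ins for the two saved JPEGs (assumed data, not real images).
        byte[] data1 = { 1, 2, 3 };
        byte[] data2 = { 4, 5, 6 };

        var ms1 = new MemoryStream();
        var ms2 = new MemoryStream();
        ms1.Write(data1, 0, data1.Length);
        ms2.Write(data2, 0, data2.Length);

        // After writing, Position == Length, so there is nothing left to read
        // and both calls hash an empty sequence of bytes.
        Console.WriteLine(HashFromCurrentPosition(ms1) == HashFromCurrentPosition(ms2)); // True

        // Rewind, and the actual contents get hashed.
        ms1.Position = 0;
        ms2.Position = 0;
        Console.WriteLine(HashFromCurrentPosition(ms1) == HashFromCurrentPosition(ms2)); // False
    }
}
```

Note that ComputeHash leaves the stream at its end again, so rewind once more if you need to re-read it.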

One further note: don't use GetBuffer in this way. Use ToArray, which gives you a byte array exactly as long as the stream. GetBuffer returns the raw internal buffer, which will (usually) have some unused padding at the end that you wouldn't want to hash accidentally. You can use GetBuffer if you make sure you only use the relevant portion of it, of course; this avoids creating a new copy of the data.
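
As a rough sketch of the difference (the class and method names are hypothetical, and the three bytes are placeholder data), the following hashes the same stream contents both ways. The ComputeHash(byte[], int, int) overload is what lets you restrict GetBuffer to the relevant portion.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class BufferDemo
{
    // Hashes a copy of exactly ms.Length bytes.
    public static string HashViaToArray(MemoryStream ms)
    {
        using (SHA512 sha = SHA512.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(ms.ToArray()));
        }
    }

    // Hashes the internal buffer without copying, limited to the first
    // ms.Length bytes so the unused tail is ignored.
    public static string HashViaGetBuffer(MemoryStream ms)
    {
        using (SHA512 sha = SHA512.Create())
        {
            return Convert.ToBase64String(
                sha.ComputeHash(ms.GetBuffer(), 0, (int)ms.Length));
        }
    }

    public static void Main()
    {
        var ms = new MemoryStream();
        ms.Write(new byte[] { 1, 2, 3 }, 0, 3); // placeholder data

        // The raw buffer is capacity-sized, so it is at least as long as the data.
        Console.WriteLine(ms.GetBuffer().Length >= ms.ToArray().Length); // True
        Console.WriteLine(ms.ToArray().Length == ms.Length);             // True

        // Both approaches agree once GetBuffer is limited to the real data.
        Console.WriteLine(HashViaToArray(ms) == HashViaGetBuffer(ms));   // True
    }
}
```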

Jon Skeet
Thanks for the advice, that works!
Lee Warner