My text file says "The quick brown fox jumps over the lazy dog", however when I try to get the hash of this file, both the MD5 and SHA-1 values are different from Wikipedia's results. I have three questions: 1) What did I do wrong in the code? 2) How can I improve this code? (Do I need to call Initialize?) 3) How do I salt this?

    {
        const int bufSize = 1024 * 8;
        int read;
        byte[] buf = new byte[bufSize];
        string fn = @"b.txt";
        byte[] result1 = new byte[0];
        byte[] result2 = new byte[0];
        SHA1 sha = new SHA1CryptoServiceProvider();
        MD5  md5 = new MD5CryptoServiceProvider();
        sha.Initialize();
        md5.Initialize();
        FileStream fin = File.OpenRead(fn);
        while ((read = fin.Read(buf, 0, buf.Length)) != 0)
        {
            result1 = sha.ComputeHash(buf);
            result2 = md5.ComputeHash(buf);
        }
        fin.Close();
        MessageBox.Show(myFunc(result1));
        MessageBox.Show(myFunc(result2));
    }
A: 

Regarding the different results: they could depend on the character encoding of your text, since these hash algorithms operate on bytes, not characters.
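As a minimal sketch of that point (the class and method names are hypothetical), hashing the same sentence after encoding it three different ways shows that ASCII and UTF-8 produce identical bytes for plain English text, while UTF-16 does not:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class EncodingDemo
{
    const string Sentence = "The quick brown fox jumps over the lazy dog";

    // Encodes the sentence with the given Encoding, then hashes the bytes.
    public static string Md5Of(Encoding encoding)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] bytes = encoding.GetBytes(Sentence);
            return BitConverter.ToString(md5.ComputeHash(bytes));
        }
    }

    static void Main()
    {
        // ASCII and UTF-8 map this plain-ASCII sentence to the same 43 bytes...
        Console.WriteLine("ASCII:  " + Md5Of(Encoding.ASCII));
        Console.WriteLine("UTF-8:  " + Md5Of(Encoding.UTF8));
        // ...but UTF-16 (Encoding.Unicode) uses two bytes per character,
        // so the input is 86 bytes and the digest is completely different.
        Console.WriteLine("UTF-16: " + Md5Of(Encoding.Unicode));
    }
}
```

The ASCII and UTF-8 lines should both print 9E-10-7D-9D-37-2B-B6-82-6B-D8-1D-35-42-A4-19-D6, which is the MD5 value Wikipedia lists (9e107d9d372bb6826bd81d3542a419d6) in BitConverter's uppercase, dash-separated form.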

As far as improving your code: you are not disposing of your CryptoServiceProviders. Everything that inherits from HashAlgorithm implements IDisposable and should be disposed when you're finished with it, either with a using statement or by calling Dispose() directly. You're also not disposing of the FileStream, which has the same IDisposable requirements.
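A minimal sketch of the using pattern applied to both the FileStream and the hash object (the class name, helper name and temp-file path are hypothetical). It uses SHA1.Create() instead of constructing SHA1CryptoServiceProvider directly, and the ComputeHash(Stream) overload so the file is read in chunks:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class DisposeDemo
{
    // Both the stream and the hash object are disposed when the using
    // blocks end, even if ComputeHash throws.
    public static byte[] Sha1OfFile(string path)
    {
        using (FileStream fin = File.OpenRead(path))
        using (SHA1 sha = SHA1.Create())
        {
            // ComputeHash(Stream) reads the stream to its end internally.
            return sha.ComputeHash(fin);
        }
    }

    static void Main()
    {
        // Hypothetical sample file; File.WriteAllText adds no trailing newline
        // and writes UTF-8 without a byte-order mark.
        string path = Path.Combine(Path.GetTempPath(), "b.txt");
        File.WriteAllText(path, "The quick brown fox jumps over the lazy dog");
        Console.WriteLine(BitConverter.ToString(Sha1OfFile(path)));
    }
}
```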

Rex M
I believe all character encodings (at least, all that handle the Latin alphabet at all) will map the string "The quick brown fox jumps over the lazy dog" to the same set of bytes, since it's just plain ASCII text.
David Zaslavsky
@David: Try Encoding.Unicode.GetBytes("The quick brown fox jumps over the lazy dog") or Encoding.UTF32.GetBytes().
Rasmus Faber
A: 

(EDIT: Disposing of the hash algorithms now. I suspect it's unnecessary, but it's good practice :)

You're calling ComputeHash for the whole buffer even though you should only be hashing the portion of the buffer you've read. In addition, you're computing a new hash for each call to Read, so result1 and result2 end up holding the hash of the last buffer only, not of the whole file.
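To make the first problem concrete, here is a small sketch (class and method names are hypothetical) that simulates one Read() into the question's 8 KB buffer and compares ComputeHash(buf) with the ComputeHash(buf, offset, count) overload:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class BufferDemo
{
    // Fills the buffer the way a single Read() call would and returns the
    // number of bytes actually written into it.
    static int FillBuffer(byte[] buf)
    {
        byte[] data = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog");
        Array.Copy(data, buf, data.Length);
        return data.Length;
    }

    public static string Sha1Hex(bool onlyBytesRead)
    {
        byte[] buf = new byte[1024 * 8];
        int read = FillBuffer(buf);
        using (SHA1 sha = SHA1.Create())
        {
            byte[] hash = onlyBytesRead
                ? sha.ComputeHash(buf, 0, read) // only the 43 bytes that were read
                : sha.ComputeHash(buf);         // all 8192 bytes, trailing zeros included
            return BitConverter.ToString(hash);
        }
    }

    static void Main()
    {
        Console.WriteLine("Whole buffer: " + Sha1Hex(false));
        Console.WriteLine("Bytes read:   " + Sha1Hex(true));
    }
}
```

Only the second line matches the SHA-1 value on Wikipedia; the first one also hashes the 8149 unused zero bytes.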

Here's some really simple code to compute the hashes:

using System;
using System.IO;
using System.Security.Cryptography;

class Test
{
    static void Main()
    {
        byte[] plaintext = File.ReadAllBytes("b.txt");
        using (MD5 md5 = MD5.Create())
        {
            byte[] md5Hash = md5.ComputeHash(plaintext);
            Console.WriteLine(BitConverter.ToString(md5Hash));
        }

        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] sha1Hash = sha1.ComputeHash(plaintext);
            Console.WriteLine(BitConverter.ToString(sha1Hash));
        }
    }
}

This gives the same results as Wikipedia. Note that b.txt must not have a newline at the end of it.
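A quick sketch of why the trailing newline matters, also formatting the digest the way Wikipedia prints it (lowercase hex, no dashes). The class and helper names are hypothetical:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class NewlineDemo
{
    // Lowercase hex without separators, matching Wikipedia's presentation.
    public static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }

    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(Encoding.ASCII.GetBytes(text)));
        }
    }

    static void Main()
    {
        const string s = "The quick brown fox jumps over the lazy dog";
        Console.WriteLine(Md5Hex(s));          // 9e107d9d372bb6826bd81d3542a419d6
        Console.WriteLine(Md5Hex(s + "\n"));   // as if b.txt ended with LF
        Console.WriteLine(Md5Hex(s + "\r\n")); // as if b.txt ended with CRLF
    }
}
```

A single extra byte changes the whole digest, so an editor that silently appends a newline is enough to make the file's hash disagree with Wikipedia.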

An alternative way of getting the binary data to start with would be:

byte[] plaintext = Encoding.ASCII.GetBytes(
    "The quick brown fox jumps over the lazy dog");

Note that this is just the simple way of computing a hash in one go. If you want to do it in a streaming fashion (i.e. read some data, add it to the hash, read some more data, etc.), you can either use the ComputeHash(Stream) overload or, if you want to "push" data to it, use TransformBlock and TransformFinalBlock, like this:

using System;
using System.IO;
using System.Security.Cryptography;

class Test
{
    static void Main()
    {
        using (MD5 md5 = MD5.Create())
        using (SHA1 sha1 = SHA1.Create())
        using (Stream input = File.OpenRead("b.txt"))
        {
            // Artificially small to make sure there's
            // more than one read
            byte[] buffer = new byte[4];
            int bytesRead;

            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
                sha1.TransformBlock(buffer, 0, bytesRead, null, 0);
            }
            md5.TransformFinalBlock(buffer, 0, 0);
            sha1.TransformFinalBlock(buffer, 0, 0);

            Console.WriteLine(BitConverter.ToString(md5.Hash));
            Console.WriteLine(BitConverter.ToString(sha1.Hash));
        }
    }
}

Note that we pass null as TransformBlock's output buffer because we don't need any output, and that we pass a count of 0 to TransformFinalBlock because all the data has already been processed. I suspect this is the example you'll want to use, based on your previous comments.
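One way to convince yourself that the chunked TransformBlock loop gives the same answer as a one-shot ComputeHash is to run it over a MemoryStream with several buffer sizes. This is a sketch; the class and helper names are hypothetical:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class StreamingCheck
{
    // Same loop as above: push each chunk, then finish with an empty final block.
    public static byte[] Sha1Streaming(Stream input, int bufferSize)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                sha1.TransformBlock(buffer, 0, bytesRead, null, 0);
            }
            sha1.TransformFinalBlock(buffer, 0, 0);
            return sha1.Hash; // Hash returns a copy, so it survives Dispose
        }
    }

    static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog");
        byte[] oneShot;
        using (SHA1 sha1 = SHA1.Create())
        {
            oneShot = sha1.ComputeHash(data);
        }

        foreach (int size in new[] { 1, 4, 7, 64, 4096 })
        {
            using (var ms = new MemoryStream(data))
            {
                byte[] chunked = Sha1Streaming(ms, size);
                // Same bytes in, same digest out, regardless of chunk size.
                bool same = BitConverter.ToString(chunked) == BitConverter.ToString(oneShot);
                Console.WriteLine(size + ": " + same);
            }
        }
    }
}
```

Because the hash state is carried inside the HashAlgorithm object between TransformBlock calls, the chunk size only affects how many reads you make, not the result.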

Jon Skeet
I do need to read it in a streaming fashion, as my files can occasionally be several GB. The example was solely to check whether I got the same hash mentioned on Wikipedia. The streaming part is what is difficult: ComputeHash doesn't seem to let me pass in the current hash value.
acidzombie24
@acidzombie: See my edit - I think you want the second example I've given.
Jon Skeet
Perfect, just perfect. It works fantastically, thank you :D
acidzombie24