Dear all,

I have run into a weird problem with the following code. The loop is supposed to stop after one iteration, but it just keeps going. However, if I remove the last "result_bytes = md5.ComputeHash(orig_bytes);" line, it works. Has anyone faced a similar problem before?

    MD5 md5;
    byte[] orig_bytes;
    byte[] result_bytes;
    Dictionary<byte[], string> hashes = new Dictionary<byte[], string>();

    string input = "NEW YORK";
    result_bytes = UnicodeEncoding.Default.GetBytes("HELLO");
    while (!hashes.ContainsKey(result_bytes))
    {
        md5 = new MD5CryptoServiceProvider();
        orig_bytes = UnicodeEncoding.Default.GetBytes(input);
        result_bytes = md5.ComputeHash(orig_bytes);

        hashes.Add(result_bytes, input);
        Console.WriteLine(BitConverter.ToString(result_bytes));
        Console.WriteLine(hashes.ContainsKey(result_bytes));

        result_bytes = md5.ComputeHash(orig_bytes);
    }
+2  A: 

When you reassign result_bytes to a new value in the last line, you have a new reference to a byte array, which is not equal to the one in the collection, therefore hashes.ContainsKey returns false.
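A minimal sketch of that point, assuming nothing beyond the standard library: hashing the same input twice gives two separate array objects whose contents match. The class and method names are hypothetical, and Encoding.UTF8 is an assumption chosen for a deterministic example (the question uses UnicodeEncoding.Default).

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the original question.
public static class ReferenceVsContentDemo
{
    // Hashes the same input twice and reports how the two result arrays compare.
    public static (bool sameReference, bool equalsResult, bool sameContents) Compare(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            // Assumption: UTF-8 is used here only to keep the example deterministic.
            byte[] data = Encoding.UTF8.GetBytes(input);
            byte[] first = md5.ComputeHash(data);
            byte[] second = md5.ComputeHash(data);

            return (
                ReferenceEquals(first, second), // two separate array objects
                first.Equals(second),           // arrays inherit identity-based Equals
                first.SequenceEqual(second));   // element-by-element comparison
        }
    }
}
```

Calling ReferenceVsContentDemo.Compare("NEW YORK") should report that the arrays are neither the same reference nor equal under Equals, even though their bytes match, which is why the dictionary lookup in the question misses.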

Thanks for the reply, but both lines do exactly the same thing: result_bytes = md5.ComputeHash(orig_bytes); This is just a snapshot of my actual logic, as I need to recalculate. What puzzles me is why the same command would give a different result. orig_bytes did not change at all, and a breakpoint shows me the byte[] contents are identical.
cherhan
Although the bytes in the two arrays are equal, each call to ComputeHash returns a new byte array with a different reference and a different GetHashCode() value, which Dictionary uses to compare its keys. (byte[] is a reference type, and each new object is considered different by default, whatever its contents.)
+2  A: 

You're assuming that byte arrays override Equals and GetHashCode to compare for equality: they don't. They just use the default identity test - so without the extra assignment at the end, you're just checking whether the exact key object you've just added is still in the dictionary - which of course it is.

One way round this would be to store a reversible string representation of the hash (e.g. using base64), instead of the hash itself. Or write your own implementation of IEqualityComparer<byte[]> and pass that to the Dictionary constructor, so that it uses that implementation to find the hash code of byte arrays and compare them with each other.
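Both workarounds might be sketched roughly as follows. ByteArrayComparer and HashKeyDemo are hypothetical names introduced for illustration, and the hash-code formula is just one simple choice, not a recommendation from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: treats two byte arrays as equal when their contents match.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        unchecked
        {
            // Simple content-based hash; equal contents always give equal codes.
            int hash = 17;
            foreach (byte b in bytes) hash = hash * 31 + b;
            return hash;
        }
    }
}

// Hypothetical demo helpers showing the two workarounds.
public static class HashKeyDemo
{
    // Option 1: keep byte[] keys, but supply a content-based comparer.
    public static bool LookupWithComparer(byte[] stored, byte[] probe)
    {
        var hashes = new Dictionary<byte[], string>(new ByteArrayComparer());
        hashes.Add(stored, "NEW YORK");
        return hashes.ContainsKey(probe);
    }

    // Option 2: use a reversible string form (base64) of the hash as the key.
    public static bool LookupWithBase64(byte[] stored, byte[] probe)
    {
        var hashes = new Dictionary<string, string>();
        hashes.Add(Convert.ToBase64String(stored), "NEW YORK");
        return hashes.ContainsKey(Convert.ToBase64String(probe));
    }
}
```

With either option, a freshly computed array holding the same bytes finds the existing entry, so the loop in the question would stop after the first iteration. The base64 route needs no custom class; the comparer keeps the keys as raw bytes.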

In short: this has nothing to do with MD5, and everything to do with the fact that

Console.WriteLine(new byte[0].Equals(new byte[0]));

will print False :)

Jon Skeet
I made a similar comment on the other answer too. This is a snapshot of my actual logic, but my question is: if orig_bytes did not change, why would result_bytes change? If result_bytes did not change (which I am pretty sure of, unless I have misunderstood MD5), then why would ContainsKey fail?
cherhan
@cherhan: I thought I'd explained this reasonably clearly... you're recomputing the MD5 hash, so you're ending up with a new array. It contains the same data as the previous array, but it won't compare as being equal under Equals/GetHashCode, which is what Dictionary is relying on. See my example with two empty arrays - they obviously have the same data (none!) but `Equals` returns false.
Jon Skeet