views:

376

answers:

4

Consider the following code:

byte[] bytes = new byte[] { 1, 2, 5, 0, 6 };
byte[] another = new byte[] { 1, 2, 5, 0, 6 };

Hashtable ht = new Hashtable();
ht.Add(bytes, "hi");
Assert.IsTrue(ht.ContainsKey(another));

Why does this assertion fail? Being an array of a primitive type shouldn't use using the object reference, should it? So why would it return false? Is there anything I can do to make this hashtable work?

+6  A: 

Being an array of a primitive type shouldn't use using the object reference, should it?

Yes it should. Arrays are reference types.

Everything is working as it's supposed to.

If you want different behaviour, you can implement a comparator for arrays that compares the contents and pass that to the hashtable.

Anon.
So my only "real" option is convert the byte array into a true primitive, such as an integer or a string and use that? Finding the absolute most effiecient way to locate a specific byte[] in this table is of the utmost urgency for th eproject I am working on. Do you have any suggestions?
Ash
Is there an array type that I cna use AS a value type? Perhaps something in `System.Collections` I've not heard of?
Ash
Did you miss the part where he suggested you simply implement a comparator to pass to the hashtable? :-)
Ken
To be more specific, you want to implement the `IEqualityComparer` interface (defining appropriate `Equals(object, object)` and `GetHashCode(object)` methods), and pass that in when you create the hashtable.
Anon.
But doesnt that mean I'd have to create my own object to store my byte array? Wouldn't that be substantially less efficient than using the plain old byte[]?
Ash
I don't mean to sound stupid, but I've not had much experience at all dealing with `IEqualityComparer`. I don't know where to start? Perhaps I'd be better converting the byte[] to a base64string (in terms of efficiency?
Ash
Your `Equals` method should take two byte arrays, and check whether they are equal or not. Your `GetHashCode` method should take a byte array, and generate a hash for it through some method - for example, you could multiply each value by one more than it's index, and sum them all to generate the hashcode.
Anon.
As an addendum: If your `Equals` method returns `true` for two byte arrays, they should have the same hash code. Conversely, if the hash codes are different, `Equals` must return `false`.
Anon.
A: 

It returns false because the hashes don't match. If GetHashCode() doesn't produce a repeatable hash for the same value it won't work in a dictionary.

byte[] bytes = new byte[] { 1, 2, 5, 0, 6 };
byte[] another = new byte[] { 1, 2, 5, 0, 6 };

string astring = "A string...";
string bstring = "A string...";

MessageBox.Show(bytes.GetHashCode() + " " + another.GetHashCode() + " | " + astring.GetHashCode() + " " + bstring.GetHashCode());
Cory Charlton
A: 

By default reference types are compared by their references, unless the Equals method for that type has been overidden.

Because you want to use the reference type as a key in a has table you should also override the GetHashCode method, so that objects that are 'equal' produce the same hash code.

A hash table stores objects by computing the hash using the GetHashCode method, and any later 'hits' are calculated using this. You can do this by basing the value returned by GetHasshCode on each of the properties of the object, in your case each of the bytes in the array. This is an example of where I used it you can also do this in an IEqualityComparer which you could use in your hashtable:

 public override int GetHashCode() {
        int hash = 17;
  hash = hash * 23 + DrillDownLevel.GetHashCode();
  hash = hash * 23 + Year.GetHashCode();

  if (Month.HasValue) {
    hash = hash * 23 + Month.Value.GetHashCode();
  }

  if (Week.HasValue) {
    hash = hash * 23 + .Week.Value.GetHashCode();
  }

  if (Day.HasValue) {
    hash = hash * 23 + obj.Day.Value.GetHashCode();
  }

  return hash;
}
theringostarrs
+3  A: 

Here's a sample implementation:

  class Program {
    static void Main(string[] args) {
      byte[] bytes = new byte[] { 1, 2, 5, 0, 6 };
      byte[] another = new byte[] { 1, 2, 5, 0, 6 };

      Hashtable ht = new Hashtable(new ByteArrayComparer());
      ht.Add(bytes, "hi");
      System.Diagnostics.Debug.Assert(ht.ContainsKey(another));
    }

    private class ByteArrayComparer : IEqualityComparer {
      public int GetHashCode(object obj) {
        byte[] arr = obj as byte[];
        int hash = 0;
        foreach (byte b in arr) hash ^= b;
        return hash;
      }
      public new bool Equals(object x, object y) {
        byte[] arr1 = x as byte[];
        byte[] arr2 = y as byte[];
        if (arr1.Length != arr2.Length) return false;
        for (int ix = 0; ix < arr1.Length; ++ix)
          if (arr1[ix] != arr2[ix]) return false;
        return true;
      }
    }
  }

You should use a stronger hash if you put thousands of arrays in the hash table. Check this post for an example.

Hans Passant
That's exactly what I wanted to see, but wouldn't a base64 conversion of the byte arrays be more efficient in the `Equals(object x, object y)` method, rather than looping through each item in the array (especially if the arrays are several dozen kilobytes in size)?
Ash
No, Convert.ToBase64String internally loops through each item in the array to generate the string. Then another loop is needed to compare the strings. It is code you can't see but will be quite a bit slower.
Hans Passant
Ok, so according to my single lookup test, the above implementation executes in 0.4ms, while doing a ` if (Convert.ToBase64String(arr1) != Convert.ToBase64String(arr2)) return false;` executes in 0.3ms. Not trying to nick pick, but as I have stated before (and while it may seem crazy), literally every part millisecond counts for this app. I calculated that over the lifetime of this app, a 1ms saving will several dozen HOURS off the execution of the application. But thanks so much for this, it's been an amzing help (unless you think that the Base64 really will be bad for some reason).
Ash
That doesn't make any sense, but use what you know works best. GetHashCode() is the most important one for speed, test with real data.
Hans Passant