tags:

views:

411

answers:

1

I stumbled across this problem when creating a HashSet[Array[Byte]] to use in a kind of HatTrie.

Apparently, the standard equals() method on arrays checks for identity. How can I provide the HashSet with an alternative Comparator that uses .deepEquals() for checking if an element is contained in the set?

Basically, I want this test to pass:

describe ("A HashSet of Byte Array") {   

    it("must contain arrays that are equivalent to one that has been added") {
        val set = new HashSet[Array[Byte]]()
        set += "ab".getBytes("UTF-8")
        set must contain ("ab".getBytes("UTF-8"))     
    }
}

I cannot feasibly wrap the Array[Byte] into another object because there's a lot of them. Short of writing a new HashSet implementation for this purpose is there anything I can do?

+1  A: 

Mutable data structures, such as Arrays, are contra-indicated for usage in places where the hash code is used. This is because the data structure can change, thus changing the hash code of the data, thus making access to the data inaccurate.

For instance, let's say I have a binary tree to store elements based on their hash code. If the hash is even, I store the data on the left side, if odd on the right side. Then I divide the hash by two, and repeat the process until the hash is 0, at which point I store the data in the node.

Now, I use this structure as base for HashSet, and then store an array on it. The array has an even hash code, so it goes to the left side of the tree. Let's ignore it's exact position.

Later, I change the array, and then look it up on the set. Now the hash code is odd, and I go look on the right side of the tree, and thus can't find it, even though it is stored int he tree -- just on the other side.

So, don't use arrays with hash-based collections. Which doesn't answer your question, of course.

As for your question, you'd have to subclass HashSet, and then override the equals method. I don't know if HashSet is final or descendent from a sealed class, so I don't know if this is viable.

Another option would be creating an alternate comparision method -- not named equals or "==", based specifically on deepEquals, and then using the Pimp My Class method to add it to HashSet.

Edit

I did mean subclass HashSet, but I did not pay enough attention to the question. I thought you were comparing the entire HashSet, instead of just using contains. You could do this:

class MyHashSet[A] extends scala.collection.mutable.HashSet[A] {
  override def contains(elem: A): Boolean = elem match {
    case arr : Array[_] => this.elements exists (arr deepEquals _)
    case _ => super.contains(elem)
  }
}

This isn't actually working here, as the first case is not being followed. I'm really lost here, as simple tests on REPL seems to indicate it ought to work. I'm thinking it might have something to do with boxing, but I'm not real clear on what -- or I'd have it working. :-)

Daniel
You are of course right about the dangerous combination of mutable data structures and containers that depend on ordering. I was just trying this in a prototype and was wondering if a quick way to make it work was available. This does not seem to be the case. I guess the correct solution would then be to create a class that implements an immutable byte array with an equals method that does what I need.
Johannes Stiehler
Btw, reading your answer again, I'm wondering if you actually meant to say "subclass byte array" since subclassing HashSet would not help with the equals method (neither would Pimp My Class on HashSet).
Johannes Stiehler