views:

315

answers:

6

With a TreeMap it's trivial to bypass the keys' natural ordering using a Comparable. HashMaps however cannot be controlled in this manner.

I suspect it would be both easy and useful to design an interface and to retrofit this into HashMap (or a new class)? Something like this, except with better names:

  interface Hasharator<T> {
    int alternativeHashCode(T t);
    boolean alternativeEquals(T t1, T t2);
  }

  class HasharatorMap<K, V> {
    HasharatorMap(Hasharator<? super K> hasharator) { ... }
  }

  class HasharatorSet<T> {
    HasharatorSet(Hasharator<? super T> hasharator) { ... }
  }

The case insensitive Map problem gets a trivial solution:

 new HasharatorMap(String.CASE_INSENSITIVE_EQUALITY);

Would this be doable, or can you see any fundamental problems with this approach?

Is the approach used in any existing (non-JRE) libs? (Tried google, no luck.)

EDIT: Nice workaround presented by hazzen, but I'm afraid this is the workaround I'm trying to avoid... ;)

EDIT: Changed title to no longer mention "Comparator"; I suspect this was a bit confusing.

EDIT: Accepted answer with relation to performance; would love a more specific answer!

EDIT: There is an implementation; see the accepted answer below.

+1  A: 

Note: As noted in all other answers, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash - essentially a random number.

You can always write a hash function for a class (and often times must), as long as you do it carefully. This is a hard thing to do properly because hash-based data structures rely on a random, uniform distribution of hash values. In Effective Java, there is a large amount of text devoted to properly implementing a hash method with good behaviour.

With all that being said, if you just want your hashing to ignore the case of a String, you can write a wrapper class around String for this purpose and insert those in your data structure instead.

A simple implementation:

public class LowerStringWrapper {
    public LowerStringWrapper(String s) {
        this.s = s;
        this.lowerString = s.toLowerString();
    }

    // getter methods omitted

    // Rely on the hashing of String, as we know it to be good.
    public int hashCode() { return lowerString.hashCode(); }

    // We overrode hashCode, so we MUST also override equals. It is required
    // that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
    // restore that invariant.
    public boolean equals(Object obj) {
        if (obj instanceof LowerStringWrapper) {
            return lowerString.equals(((LowerStringWrapper)obj).lowerString;
        } else {
            return lowerString.equals(obj);
        }
    }

    private String s;
    private String lowerString;
}
hazzen
A: 

.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).

In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...

But yes, basically the idea is sound.

Jon Skeet
+1  A: 

good question, ask josh bloch. i submitted that concept as an RFE in java 7, but it was dropped, i believe the reason was something performance related. i agree, though, should have been done.

Hmm. Maybe it's because you miss out on the opportunity to cache the calculated hash codes..
volley
A: 

I suspect this has not been done because it would prevent hashCode caching?

I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).

(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)

volley
+2  A: 

This is an interesting idea, but it's absolutely horrendous for performance. The reason for this is quite fundamental to the idea of a hashtable: the ordering cannot be relied upon. Hashtables are very fast (constant time) because of the way in which they index elements in the table: by computing a pseudo-unique integer hash for that element and accessing that location in an array. It's literally computing a location in memory and directly storing the element.

This contrasts with a balanced binary search tree (TreeMap) which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map is dependent upon a consistent ordering, thus the order of the elements is predictable and sane. However, because of the performance hit imposed by the "traverse to your destination" approach, BSTs are only able to provide O(log(n)) performance. For large maps, this can be a significant performance hit.

It is possible to impose a consistent ordering on a hashtable, but to do so involves using techniques similar to LinkedHashMap and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem of course is this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.

In short, your idea sounds really good, but if you actually tried to implement it, you would see that to do so would impose massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no efficiently combining the two structures without losing some of the guarantees of one or the other.

Daniel Spiewak
I don't think your answer has much to do with the question. volley just wants to use a HashTable where the hash function is user-specified, instead of the default Object.hashCode().
Adam Rosenfield
No, I think he wants a little more than that. His proposed "solution" is to impose ordering using an alternative hash code, but that's not going to work (hashing into a limited domain). To order a hashtable, some ancillary structure is needed.
Daniel Spiewak
Hmm actually I think Adam is right; note that the interface I suggest contains one method for calculating the hash and one method for checking if two objects are equal. Ordering is not in there! The Comparator is mentioned as an analogy. (By the way, löve the Darwinian logo, Daniel!)
volley
A: 

Trove4j has the feature I'm after and they call it hashing strategies.

Their map has an implementation with different limitations and thus different prerequisites, so this does not implicitly mean that an implementation for Java's "native" HashMap would be feasible.

volley