views:

4280

answers:

9

How do you properly override isEqual: in Objective-C? The "catch" seems to be that if two objects are equal (as determined by the isEqual: method), they must have the same hash value.

The Introspection section of the Cocoa Fundamentals Guide does have an example on how to override isEqual:, copied as follows, for a class named MyWidget:

- (BOOL)isEqual:(id)other {
    if (other == self)
        return YES;
    if (!other || ![other isKindOfClass:[self class]])
        return NO;
    return [self isEqualToWidget:other];
}

- (BOOL)isEqualToWidget:(MyWidget *)aWidget {
    if (self == aWidget)
        return YES;
    if (![(id)[self name] isEqual:[aWidget name]])
        return NO;
    if (![[self data] isEqualToData:[aWidget data]])
        return NO;
    return YES;
}

It checks pointer equality, then class equality, and finally compares the objects using isEqualToWidget:, which only checks the name and data properties. What the example doesn't show is how to override hash.

Let's assume there are other properties that do not affect equality, say age. Shouldn't the hash method be overridden such that only name and data affect the hash? And if so, how would you do that? Just add the hashes of name and data? For example:

- (NSUInteger)hash {
    NSUInteger hash = 0;
    hash += [[self name] hash];
    hash += [[self data] hash];
    return hash;
}

Is that sufficient? Is there a better technique? What if you have primitives, like int? Convert them to NSNumber to get their hash? Or structs like NSRect?

(Brain fart: Originally wrote "bitwise OR" them together with |=. Meant add.)

+8  A: 

The easy but inefficient way is to return the same -hash value for every instance. Otherwise, yes, you must implement hash based only on objects which affect equality. This is tricky if you use lax comparisons in -isEqual: (e.g. case-insensitive string comparisons). For ints, you can generally use the int itself, unless you’ll be comparing with NSNumbers.

Don’t use |=, though, it will saturate. Use ^= instead.

Random fun fact: [[NSNumber numberWithInt:0] isEqual:[NSNumber numberWithBool:NO]], but [[NSNumber numberWithInt:0] hash] != [[NSNumber numberWithBool:NO] hash]. (rdar://4538282, open since 05-May-2006)

Ahruman
You're exactly right on the |=. Didn't really mean that. :) += and ^= are fairly equivalent.How do you handle non-integer primitives like double and float?
Dave Dribin
Random fun fact: Test it on Snow Leopard... ;-)
Quinn Taylor
He's correct about using XOR instead of OR for combining fields into a hash. However, don't use the advice of returning the same -hash value for every object — although it easy, it can severely degrade the performance of *anything* that uses the object's hash. The hash does not *have* to be distinct for objects that aren't equal, but if you can achieve that, there's nothing like it.
Quinn Taylor
+4  A: 

I've found this page to be a helpful guide in override equals- and hash-type methods. It includes a decent algorithm for calculating hash codes. The page is geared towards Java, but it's pretty easy to adapt it to Objective-C/Cocoa.

mipadi
RIP Geocities...
John Cromartie
cached link via archive.org: http://web.archive.org/web/20071013053633/http://www.geocities.com/technofundo/tech/java/equalhash.html
cobbal
+3  A: 

This doesn't directly answer your question (at all) but I've used MurmurHash before to generate hashes: http://murmurhash.googlepages.com

Guess I should explain why: murmurhash is bloody fast...

schwa
A C++ library that is focused on unique hashes for a void* key using random number (and also doesn't relate to Objective-C objects) really isn't a helpful suggestion here. The -hash method should return a consistent value each time, or it will be utterly useless. If the object is added to a collection that calls -hash, and returns a new value every time, duplicates will never be detected, and you can never retrieve the object from the collection either. In this case, the term "hash" is different from the meaning in security/cryptography.
Quinn Taylor
murmurhash is isn't a cryptographic hash function. Please check your facts before posting incorrect information.Murmurhash _could_ be useful for hashing custom objective-c classes (esp. if you have a lot of NSDatas involved) because it is extremely fast.However I do grant you that maybe suggesting it isn't the best advice to give to someone "just picking up objective-c", but please note my prefix on my original reply to the question.
schwa
+14  A: 

Start with

 int prime = 31;
 int result = 1;

Then for every primitive you do

 result = prime * result + var

For 64bit you might also want to shift and xor.

 result = prime * result + (int) (var ^ (var >>> 32));

For objects you use 0 for nil and otherwise their hashcode.

 result = prime * result + [var hash];

For booleans you use two different values

 result = prime * result + (var)?1231:1237;
tcurdt
I like this a lot. FWIW, the result is an NSUInteger which is 64 bits wide on x86-64, so you may not want to shift 64-bit ivars.
Dave Dribin
Where did the 1231:1237 come from? I see it in Java's Boolean.hashCode() too. Is it magical?
David Leonard
Thanks for the hash algorithm, it was very useful!
Leibowitzn
+6  A: 

I'm just picking up Objective-C myself, so I can't speak for that language specifically, but in the other languages I use if two instances are "Equal" they must return the same hash - otherwise you are going to have all sorts of problems when trying to use them as keys in a hashtable (or any dictionary-type collections).

On the other hand, if 2 instances are not equal, they may or may not have the same hash - it is best if they don't. This is the difference between an O(1) search on a hash table and an O(N) search - if all your hashes collide you may find that searching your table is no better than searching a list.

In terms of best practices your hash should return a random distribution of values for its input. This means that, for example, if you have a double, but the majority of your values tend to cluster between 0 and 100, you need to make sure that the hashes returned by those values are evenly distributed across the entire range of possible hash values. This will significantly improve your performance.

There are a number of hashing algorithms out there, including several listed here. I try to avoid creating new hash algorithms as it can have large performance implications, so using the existing hash methods and doing a bitwise combination of some sort as you do in your example is a good way to avoid it.

Brian B.
+1 Excellent answer, deserves more upvotes, especially since he actually talks about "best practices" and the theory behind why a good (unique) hash is important.
Quinn Taylor
+1  A: 

Note that if you're creating a object that can be mutated after creation, the hash value must not change if the object is inserted into a collection. Practically speaking, this means that the hash value must be fixed from the point of the initial object creation. See Apple's documentation on the NSObject protocol's -hash method for more information:

If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object’s internal state information or you must make sure the object’s internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)

This sounds like complete whackery to me since it potentially effectively renders hash lookups far less efficient, but I suppose it's better to err on the side of caution and follow what the documentation says.

You're reading the hash docs wrong — it's essentially an "either-or" situation. If the object changes, the hash generally also changes. This is really a warning to the programmer, that if the hash changes as a result of mutating an object, then changing the object while it resides in a collection that utilizes the hash will cause unexpected behavior. If the object must be "safely mutable" in such a situation, you have no choice but to make the hash unrelated to the mutable state. That particular situation does sound strange to me, but there are certainly infrequent situations where it applies.
Quinn Taylor
+2  A: 

I'm an Objective C newbie too, but I found an excellent article about identity vs. equality in Objective C here. From my reading it looks like you might be able to just keep the default hash function (which should provide a unique identity) and implement the isEqual method so that it compares data values.

ceperry
I'm a Cocoa/Objective C newbie, and this answer and link really helped me cut through all the more advanced stuff above to the bottom line - I don't need to worry about hashes - just implementing the isEqual: method. Thanks!
John Gallagher
A: 

Quinn is just wrong that the reference to the murmur hash is useless here. Quinn is right that you want to understand the theory behind hashing. The murmur distills a lot of that theory into an implementation. Figuring out how to apply that implementation to this particular application is worth exploring.

Some of the key points here:

The example function from tcurdt suggests that '31' is a good multiplier because it is prime. One needs to show that being prime is a necessary and sufficient condition. In fact 31 (and 7) are probably not particularly good primes because 31 == -1 % 32. An odd multiplier with about half the bits set and half the bits clear is likely to be better. (The murmur hash multiplication constant has that property.)

This type of hash function would likely be stronger if, after multiplying, the result value were adjusted via a shift and xor. The multiplication tends to produce the results of lots of bit interactions at the high end of the register and low interaction results at the bottom end of the register. The shift and xor increases the interactions at the bottom end of the register.

Setting the initial result to a value where about half the bits are zero and about half the bits are one would also tend to be useful.

It may be useful to be careful about the order in which elements are combined. One should probably first process booleans and other elements where the values are not strongly distributed.

It may be useful to add a couple of extra bit scrambling stages at the end of the computation.

Whether or not the murmur hash is actually fast for this application is an open question. The murmur hash premixes the bits of each input word. Multiple input words can be processed in parallel which helps multiple-issue pipelined cpus.

A: 

Sorry if I risk sounding a complete boffin here but... ...nobody bothered mentioning that to follow 'best practices' you should definitely not specify an equals method that would NOT take into account all data owned by your target object, e.g whatever data is aggregated to your object, versus an associate of it, should be taken into account when implementing equals. If you don't want to take, say 'age' into account in a comparison, then you should write a comparator and use that to perform your comparisons instead of isEqual:.

If you define an isEqual: method that performs equality comparison arbitrarily, you incur the risk that this method is misused by another developer, or even yourself, once you've forgotten the 'twist' in your equals interpretation.

Ergo, although this is a great q&a about hashing, you don't normally need to redefine the hashing method, you should probably define an ad-hoc comparator instead.

Thibaud de Souza