views:

819

answers:

8

Is there any reason why a Java string cannot be tested for equality using it's hashCode method? So basically, instead of....

"hello".equals("hello")

You could use...

"hello".hashCode() == "hello".hashCode()

This would be useful because once a string has calculated it's hashcode then comparing a string would be as efficient as comparing an int as the string caches the hashcode and it is quite likely that the string is in the string pool anyway, if you designed it that way.

+18  A: 

because: hashCodes of two objects must be equal if the objects are equal, however, if two objects are unequal, the hashCode can still be equal.

(modified after comment)

+1, hashcodes don't guarantee uniqueness - they only try and provide low collisions
orip
+1, however one small ammendment. hashCodes of two objects MUST be equal if the objects are equal, this is specified in the contract of hashCode.
Falaina
Short and correct answer. Also doing something like that just to be as efficient as possible sounds a bit like premature optimization to me.
NickDK
This would be a problem if you were testing against user input for example. If you were testing strings in a controlled environment and could guarantee no collisions (unit tests) then it would be worth it if it gave marked performance improvements but as kdgregory pointed out above it would not.Cheers!
+1  A: 

The hashCode value isn't unique, which means the Strings may not actually match. To improve performance, often implementations of equals will perform a hashCode check before performing more laborious checks.

Jim Rush
A: 

There is no reason not to use hashCode as you describe.

However, you must be aware of collisions. There is a chance - a small chance admittedly - that two different strings do hash to the same value. Consider doing a hashCode at first, and if equal also do the full comparison using the equals().

Will
So in other words, there is a reason to not use hashcode as he describes?
matt b
so there's no reason to dismiss it out of hand, only a warning to do additional checks in some circumstances
Will
So if you still have to do the equals test, why bother? You've just written twice as much code to get to the same place.
Jay
if the assumption is that the strings are unlikely to be similiar, you'd get the efficiency that the asker queried.
Will
I should say that back when I was writing demo apps for early j2me phones, for the manufacturers, and performance even of this kind of thing really mattered, using hashCode() to cheapen comparisons is the kind of thing we did all the time. You need profiling to back it up, but its worth exploring in the most performance-critical code.
Will
A: 

Very simple reason: risk of collisions... A hash code will have a lot less possible values than a string. It depends a bit of the kind of hash you generate but let's take a very simple example, where you would add the ordinal values of letters, multiplied with it's position: a=1, b=2, etc. Thus, 'hello' would translate to: h: 8x1=8, e: 5x2=10, l: 12x3=36, l: 12x4=48, o: 15x5=75. 8+10+36+48+75=177.

Are there other string values that could end as 177 hashed? Of course! Plenty of options. Feel free to calculate a few.

Still, this hashing method used a simple method. Java and .NET use a more complex hashing algorithm with a lot smaller chance of such collisions. But still, there's a chance that two different strings will result in the same hash value, thus this method is less reliable.

Workshop Alex
zzzda would also equal 177. (1x26, 2x26, 3x26, 4x4, 5x1)
Workshop Alex
+10  A: 

Let me give you a counter example. Try this,

public static void main(String[] args) {
 String str1 = "0-42L";
 String str2 = "0-43-";

 System.out.println("String equality: " + str1.equals(str2));
 System.out.println("HashCode eqauality: " + (str1.hashCode() == str2.hashCode()));
}

The result on my Java,

String equality: false
HashCode eqauality: true
ZZ Coder
A: 

You can get the effect you want using String.intern() (which is implemented using a hash table.)

You can compare the return values of intern() using the == operator. If they refer to the same string then the original strings were equivalent (i.e. equals() would have returned true), and it requires only a pointer comparison (which has the same cost as an int comparison.)

String a = "Hello";
String b = "Hel" + "lo";

System.out.println(a.equals(b));
System.out.println(a == b);

String a2 = a.intern();
String b2 = b.intern();

System.out.println(a2.equals(b2));
System.out.println(a2 == b2);

Output:

true
false
true
true
finnw
+2  A: 

as many said hashCode does not guaranty uniqueness. in fact, it cannot do that for a very simple reason.

hashCode returns an int, which means there are 2^32 possible values (around 4,000,000,000), but there are surely more than 2^32 possible strings, which means at least two strings have the same hashcode value.

this is called Pigeonhole principle.

Omry
good answer, could have done with knowing about pigeonhole principle in a recent interview
Karl
+1  A: 

Others have pointed out why it won't work. So I'll just add the addendum that the gain would be minimal anyway.

When you compare two strings in Java, the String equals function first checks if they are two references to the same object. If so, it immediately returns true. Then it checks if the lengths are equal. If not, it returns false. Only then does it start comparing character-by-character.

If you're manipulating data in memory, the same-object compare may quickly handle the "same" case, and that's a quick, umm, 4-byte integer compare I think. (Someone correct me if I have the length of an object handle wrong.)

For most unequal strings, I'd bet the length compare quickly finds them not equal. If you're comparing two names of things -- customers, cities, products, whatever -- they'll usually have unequal length. So a simple int compare quickly disposes of them.

The worst case for performance is going to be two long, identical, but not the same object strings. Then it has to do the object handle compare, false, keep checking. The length compare, true, keep checking. Then character by character through the entire length of the string to verify that yes indeed they are equal all the way to the end.

Jay
excellent answer
Karl