I have a list of objects, each called a rule in our case. Each rule is itself a list of fields, and I need to compare rules by hash code because we can't have duplicate rules in the system.

For example, say I have two rules, R1 and R2, each with fields A and B.

In R1, the values of A and B are 7 and 2 respectively.

In R2, they are 3 and 4 respectively. The hash code comparison I use to detect duplicate rules in the system then fails.

The method I have used is:

for (Rule rule : rules) {
    changeableAttrCode = 0;
    fieldCounter = 1;

    // Weight each attribute's hash by its position
    attributes = rule.getAttributes();
    for (RuleField ruleField : attributes) {
        changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());
        fieldCounter++;
    }

    // Continue with the parameters, using the same running counter
    parameters = rule.getParameters();
    for (RuleField ruleField : parameters) {
        changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());
        fieldCounter++;
    }

    changeableAttrCodes.add(changeableAttrCode);
}

Here changeableAttrCodes is where we store the hash codes of all the rules.

Can you please suggest a better method, so that this kind of problem does not arise in future and duplicate rules in the system can be detected?

Thanks in advance

+3  A: 

Updated: Your hashing algorithm does not produce a good spread of hash values. It gives the same value for (7, 2) and (3, 4):

1 * 7 + 2 * 2 = 11
1 * 3 + 2 * 4 = 11

It would also give the same value for (11, 0), (-1, 6), ... and one can trivially make up an endless number of similar equivalence classes based on your current algorithm.

Of course you cannot avoid collisions entirely: if you have enough instances, a hash collision is inevitable. However, you should aim to minimize the chance of collisions. Good hashing algorithms strive to spread hash values evenly over a wide range of values. A typical way to achieve this is to generate the hash value for an object containing n independent fields as an n-digit number with a base big enough to hold the different hash values for the individual fields.

In your case, instead of multiplying by fieldCounter you should multiply by a prime constant, e.g. 31 (that would be the base of your number), and add another prime constant to the result, e.g. 17. This gives you a better spread of hash values. (Of course the concrete base depends on what values your fields can take. I have no info about that.)
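
To make the difference concrete, here is a minimal sketch comparing the two schemes on the values from the question. The class and method names are hypothetical, and the field values are assumed to be plain Integer objects (whose hashCode() is the value itself); the real RuleField values may hash differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class HashSpreadDemo {

    // The scheme from the question: sum of (position * field hash).
    static int positionWeightedHash(List<?> values) {
        int code = 0;
        int counter = 1;
        for (Object v : values) {
            code += counter * v.hashCode();
            counter++;
        }
        return code;
    }

    // Prime-base scheme: start at 17, multiply by 31 before adding each field hash.
    static int primeBaseHash(List<?> values) {
        int result = 17;
        for (Object v : values) {
            result = 31 * result + v.hashCode();
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> r1 = Arrays.asList(7, 2);
        List<Integer> r2 = Arrays.asList(3, 4);
        // prints "11 11": the two rules collide
        System.out.println(positionWeightedHash(r1) + " " + positionWeightedHash(r2));
        // prints "16556 16434": the two rules get different codes
        System.out.println(primeBaseHash(r1) + " " + primeBaseHash(r2));
    }
}
```

Different codes here only mean that this particular pair no longer collides; other pairs still can, so the codes cannot replace an equals check.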

Also if you implement hashCode, you are strongly advised to implement equals as well - and in fact, you should use the latter to test for equality.

Here is an article about implementing hashCode.

Péter Török
@polygenelubricants see my update.
Péter Török
+1  A: 

I don't understand what you are trying to do here. With most hash function scenarios, collision is inevitable, because there are far more objects to hash than there are possible hash values (this is the pigeonhole principle).

It is generally the case that two different objects may have the same hash value. You cannot rely on hash functions alone to eliminate duplicates.

Some hash functions are better than others in minimizing collisions, but it's still an inevitability.


That said, there are some simple guidelines that usually give a good enough hash function. Joshua Bloch gives the following in his book Effective Java 2nd Edition:

  • Store some constant nonzero value, say 17, in an int variable called result.
  • Compute an int hashcode c for each field:
    • If the field is a boolean, compute (f ? 1 : 0)
    • If the field is a byte, char, short, int, compute (int) f
    • If the field is a long, compute (int) (f ^ (f >>> 32))
    • If the field is a float, compute Float.floatToIntBits(f)
    • If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as above.
    • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
    • If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  • Combine the hashcode c into result as follows: result = 31 * result + c;
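
The recipe above could be applied roughly as follows. SampleRecord and its fields are hypothetical, chosen only so that each field type from the list appears once; this is a sketch, not code from the book.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical value class, used only to illustrate the recipe.
final class SampleRecord {
    private final boolean active;
    private final int priority;
    private final long timestamp;
    private final double weight;
    private final String name;  // may be null
    private final int[] codes;  // assumed non-null

    SampleRecord(boolean active, int priority, long timestamp,
                 double weight, String name, int[] codes) {
        this.active = active;
        this.priority = priority;
        this.timestamp = timestamp;
        this.weight = weight;
        this.name = name;
        this.codes = codes.clone();
    }

    @Override
    public int hashCode() {
        int result = 17;                                          // nonzero start value
        result = 31 * result + (active ? 1 : 0);                  // boolean
        result = 31 * result + priority;                          // int
        result = 31 * result + (int) (timestamp ^ (timestamp >>> 32)); // long
        long weightBits = Double.doubleToLongBits(weight);        // double -> long
        result = 31 * result + (int) (weightBits ^ (weightBits >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode());   // object reference
        result = 31 * result + Arrays.hashCode(codes);            // array
        return result;
    }

    // equals compares exactly the fields that hashCode uses.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SampleRecord)) return false;
        SampleRecord other = (SampleRecord) o;
        return active == other.active
                && priority == other.priority
                && timestamp == other.timestamp
                && Double.doubleToLongBits(weight) == Double.doubleToLongBits(other.weight)
                && Objects.equals(name, other.name)
                && Arrays.equals(codes, other.codes);
    }
}
```
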
polygenelubricants
@polygenelubricants: Thank you for the detailed answer. Could you please tell me which reference/guidelines you used to create this set of advice? Also I wonder, why multiply by 31? What is the magic in 31 (all bits raised)? Why is multiplying better than right shift (`result = result <<< 16 + c`)?
dma_k
@dma_k: I'm quoting _Effective Java 2nd Edition_, which claims that this formula is good enough in practice, without going into the mathematics. 31 is good because it's an odd prime. Also, since it's one less than a power of two, it may also be optimized to shift and subtract at the low level.
polygenelubricants
@polygenelubricants: Thanks for the reference. Actually, _Effective Java_ mentions number 37, which is also a prime (primes can't be even ;). I know about the optimization when we multiply by power of 2 (can be replaced by left shift), but you are right, the answer is here: http://stackoverflow.com/questions/1074530/efficient-hashcode-implementation
dma_k
@dma_k: 2 is an even prime.
polygenelubricants
+5  A: 

hashCode() is not meant to be used to check for equality. return 42; is a perfectly valid implementation of hashCode(). Why don't you override equals() (and hashCode(), for that matter) in the rule objects and use that to check whether two rules are equal? You could still use the hash code to narrow down which objects you need to compare, since two objects that are equal according to equals() must always have the same hash code, but that is a performance improvement that you may or may not need, depending on your system.
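
As a small illustration of why equal hash codes prove nothing, the strings "Aa" and "BB" have the same hashCode() under the formula documented for java.lang.String, yet they are not equal. The helper class below is hypothetical; it only shows the hash code used as a cheap pre-filter, with equals() making the final decision.

```java
// Hypothetical helper, for illustration only.
class HashVersusEquals {

    // A differing hash code rules equality out; only equals() can confirm it.
    static boolean sameValue(Object a, Object b) {
        return a.hashCode() == b.hashCode() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints "true"
        System.out.println("Aa".equals("BB"));                  // prints "false"
        System.out.println(sameValue("Aa", "BB"));              // prints "false"
    }
}
```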

Thomas Lötzer
+4  A: 
  • Implement hashCode and equals in class Rule.
  • Implementation of equals has to compare its values.

Then use a HashSet<Rule> and ask if(mySet.contains(newRule))

HashSet plus an equals implementation solves the problem of the non-uniqueness of the hash. It uses the hash for classifying and speed, but it uses equals at the end to determine whether two Rules with the same hash are really the same Rule.
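
A minimal sketch of that approach follows. SimpleRule is a hypothetical stand-in for the real Rule class, and it assumes a rule is fully described by two lists of string values; the actual class would compare whatever its RuleField values hold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Rule: two lists of values define its identity.
final class SimpleRule {
    private final List<String> attributeValues;
    private final List<String> parameterValues;

    SimpleRule(List<String> attributeValues, List<String> parameterValues) {
        this.attributeValues = new ArrayList<>(attributeValues);
        this.parameterValues = new ArrayList<>(parameterValues);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleRule)) return false;
        SimpleRule other = (SimpleRule) o;
        return attributeValues.equals(other.attributeValues)
                && parameterValues.equals(other.parameterValues);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals: built from the same two lists.
        return 31 * attributeValues.hashCode() + parameterValues.hashCode();
    }
}

class RuleRegistryDemo {
    public static void main(String[] args) {
        Set<SimpleRule> rules = new HashSet<>();
        SimpleRule r1 = new SimpleRule(Arrays.asList("7", "2"), Arrays.asList("x"));
        SimpleRule r2 = new SimpleRule(Arrays.asList("3", "4"), Arrays.asList("x"));
        SimpleRule copyOfR1 = new SimpleRule(Arrays.asList("7", "2"), Arrays.asList("x"));

        rules.add(r1);
        System.out.println(rules.contains(copyOfR1)); // true: same values as r1
        System.out.println(rules.contains(r2));       // false: different values
    }
}
```

Set.add also returns false when an equal element is already present, so the duplicate check and the insertion can be a single call.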

More on hash: if you want to do it by hand, use the prime number suggestion, and review the JDK code for String hash codes. If you want a clean implementation, retrieve the hash codes of the elements, put them into an array of ints and use Arrays.hashCode(int[]) to get a hash code for the combination of them.
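
The Arrays.hashCode(int[]) idea might look like this. The class and method names are hypothetical, and the field values are assumed to be available as a List; null values are mapped to 0 here as a design choice.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class CombinedHash {

    // Collect each element's hash code, then let the JDK combine them.
    static int combine(List<?> values) {
        int[] hashes = new int[values.size()];
        for (int i = 0; i < hashes.length; i++) {
            Object v = values.get(i);
            hashes[i] = (v == null) ? 0 : v.hashCode();
        }
        return Arrays.hashCode(hashes);
    }

    public static void main(String[] args) {
        System.out.println(combine(Arrays.asList(7, 2))); // prints 1180
        System.out.println(combine(Arrays.asList(3, 4))); // prints 1058
    }
}
```

Arrays.hashCode(int[]) is order-sensitive and uses the same 31-based combination as List.hashCode(), so (7, 2) and (2, 7) also get different codes.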

helios
A: 

I started to write that the only way you can achieve what you want is with Perfect Hashing.

But then I thought about the fact that you said you can't duplicate objects in your system.

Edit based on thought-provoking comment from helios:

Your solution depends on what you meant when you wrote that you "can't duplicate rules".

If you meant that literally you cannot, that there is guaranteed to be only one instance of a rule with a particular set of values, then your problem is trivial: you can do identity comparison using ==.

On the other hand, if you meant that you shouldn't for some reason (performance), then your problem is also trivial: just do value comparisons.
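
The two kinds of comparison differ as sketched below, using String only as a convenient class that already implements equals(). The helper names are hypothetical.

```java
// Hypothetical helper, for illustration only.
class IdentityVersusValue {

    // Identity comparison: true only if both references point to one instance.
    static boolean sameInstance(Object a, Object b) {
        return a == b;
    }

    // Value comparison: delegates to the class's equals() implementation.
    static boolean sameValue(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String a = new String("rule");
        String b = new String("rule");
        System.out.println(sameInstance(a, b)); // false: two distinct instances
        System.out.println(sameValue(a, b));    // true: same contents
    }
}
```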

Given the way you've defined your problem, under no circumstances should you be considering the use of hashcodes as a substitute for equality. As others have noted, hashcodes by their nature yield collisions (false equality), unless you go to a Perfect Hashing solution, but why would you in this case?

CPerkins
He said "I can't duplicate" in the sense of "I must not", not in the sense of "the running environment forces me not to duplicate". So he must find a way to achieve value-level uniqueness, knowing he can fall into the unwanted but physically possible instance duplication.
helios
@helios - first, unless you've heard something else from him than is written in his question, there's nothing to support your interpretation of the word "can't" - he literally said "we can't duplicate rule in the system". Second, if you are right, his question is completely silly. Why would he even think about duplicating in order to do value comparisons? Why not just do value comparisons? Hashcodes are absolutely not the way to go for that. But thanks for making me think about this again.
CPerkins
@CPerkins I assumed he's using the hashcode the same way `java.util.HashMap` uses it to find a key. The hash is for finding a bucket, and then, if there are keys in that bucket, it uses equals for comparison. To me it is completely valid to create new instances for a key that's already in a map (or another structure), and to look up that key to replace the entry or add a new one (and it is much more performant). The problem is that he needs, aside from a good hash implementation, to use equals between the items with the same hashcode.
helios
@helios - Ah, interesting. Well, that interpretation makes more sense: hashcode as a quick shortcut. It's still the wrong approach, until he's tried it with simple value equality and found the performance to be unacceptable, but at least it makes more sense. Thanks.
CPerkins
Thanks for all the comments. Perkins got it absolutely right: I have to use the equals method when the hashcode comes out the same.
Abhi