ansaurus

Question

Answer 1

+3 A:

Summing the elements is already one of the simplest things you could do. But I don't think it's a particularly good hash function w.r.t. pseudo randomness.

If you sort your arrays before storing them or computing hashes, every good hash function will do.

If it's about speed: Have you measured where the bottleneck is? If your hash function is giving you a lot of collisions and you have to spend most of the time comparing the arrays bit-by-bit the hash function is obviously not good at what it's supposed to do. Sorting + Better Hash might be the solution.

sellibitze 2009-10-08 08:28:59

Answer 2

A:

depending if you have a lot of collisions (so the same hash but not a permutation), you might presort the arrays while hashing them. In that case you can do a more aggressive kind of hashing where you don't only add up the numbers but add some bitmagick to it as well to get quite different hashes.

This is only beneficial if you get loads of unwanted collisions because the hash you are doing now is too poor. If you hardly get any collisions, the method you are using seems fine

Toad 2009-10-08 08:31:58

Answer 3

A:

I like using string's default hash code (Java, C# not sure about other languages), it generates pretty unique hash codes. so if you first sort the array, and then generates a unique string using some delimiter.

so you can do the following (Java):

    int[] arr = selectRandomNumbers();
    Arrays.sort(arr);
    int hash = (arr[0] + "," + arr[1] + "," + arr[2] + "," + arr[3]).hashCode();

if performance is an issue, you can change the suggested inefficient string concatenation to use StringBuilder or String.format

   String.format("{0},{1},{2},{3}", arr[0],arr[1],arr[2],arr[3]);

String hash code of course doesn't guarantee that two distinct strings have different hash, but considering this suggested formatting, collisions should be extremely rare

LiorH 2009-10-08 09:07:04

thanks for the vote down :-). I tried to suggest an alternative solution (this is what's this site is about), dear voter if you could elaborate what's wrong with my suggestion, this will make this post more productive.

LiorH 2009-10-08 09:26:05

Perhaps whoever voted you down had this in mind: http://stackoverflow.com/questions/1465621/testing-string-equality-using-hashcode/1465719#1465719

Michael Foukarakis 2009-10-08 09:36:52

I have a suspicion that it may have been a stray click from me. I actually think your solution is pretty good. I'm new round here and by the time I figured it out SO wouldn't let me undo it (I tried). If you edit your post even trivially it looks like I can fix it. Sorry.

Paul Arnold 2009-10-08 09:43:43

once the array is sorted, there is no need to build a string and calculate a hash. This would be extremely slow, there are plenty of good hash functions that you can use with a sorted array.

martinus 2009-10-09 09:27:05

Answer 4

A:

I would suggest this: 1. Check if the lengths of permutations are the same (if not - they are not equal)

Sort only 1 array. Instead of sorting another array iterate through the elements of the 1st array and search for the presence of each of them in the 2nd array (compare only while the elements in the 2nd array are smaller - do not iterate through the whole array).

note: if you can have the same numbers in your permutaions (e.g. [1,2,2,10]) then you will need to remove elements from the 2nd array when it matches a member from the 1st one.

pseudo-code:

if length(arr1) <> length(arr2) return false;
sort(arr2);
for i=1 to length(arr1) {
elem=arr1[i];
j=1;
while (j<=length(arr2) and elem<arr2[j]) j=j+1;
if elem <> arr2[j] return false;
}
return true;

the idea is that instead of sorting another array we can just try to match all of its elements in the sorted array.

DmitryK 2009-10-08 09:51:42

Answer 5

A:

You can probably reduce the collisions a lot by using the product as well as the sum of the terms.

1*10*3*18=540 and 10*18*3*1=540

so the sum-product hash would be [32,540]

you still need to do something about collisions when they do happen though

gnibbler 2009-10-08 10:58:11

Answer 6

+4 A:

One potential candidate might be this. Fix a odd integer R. For each element e you want to hash compute the factor (R + 2*e). Then compute the product of all these factors. Finally divide the product by 2 to get the hash.

The factor 2 in (R + 2e) guarantees that all factors are odd, hence avoiding that the product will ever become 0. The division by 2 at the end is because the product will always be odd, hence the division just removes a constant bit.

E.g. I choose R = 1779033703. This is an arbitrary choice, doing some experiments should show if a given R is good or bad. Assume your values are [1, 10, 3, 18]. The product (computed using 32-bit ints) is

(R + 2) * (R + 20) * (R + 6) * (R + 36) = 3376724311

Hence the hash would be

3376724311/2 = 1688362155.

abc 2009-10-08 11:22:51

thanks! I have benchmarked your hash, see the update

martinus 2009-10-09 08:52:42

Nice. I've been looking for a mathematical criterion to select a good R, but didn't find anything useful. But I guess as long as arbitrary values are good enough there is no need to do much theory.

abc 2009-10-09 12:45:03

I think the golden ratio might be a good choice (2654435769 for 32 bit values), but this is just a wild guess.http://brpreiss.com/books/opus4/html/page214.html

martinus 2009-10-09 15:05:29

That looks like a reasonable choice to me. I also found a small weakness of this hash function. The most significant bits of the input only affect the most significant bits of the output. Hence if this function is used to hash data where the least significant bits are constant, then one would expect a lot of hash collisions.

abc 2009-10-12 11:42:46

Answer 7

A:

I am not sure about the language you are using, But in general instead of summing up the whole array, you can convert (join) them into string and compare it.. this should work

1-10-3-18 != 10-18-3-1

In c#

if (string.Join("_", array1) == string.Join("_", array2))

in javascript

if (array1.join("_")==array2.join("_"))

Cheers

Ramesh Vel

Ramesh Vel 2009-10-08 12:07:07

ansaurus

tags:

views:

answers:

Good hash function for permutations?

related questions