ansaurus

Question

Generate a hash sum for several integers

Answer 1

A:

You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.

One approach would be to normalize your input and then generate a hash from the results.

Andrew Grant 2009-02-11 23:22:01

What do you mean by normalize?

Eduardo 2009-02-11 23:38:55

Answer 2

+2 A:

Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce a number in the range of 0 to 255 (8 bits).

bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)

Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.

0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255

Trivially tested with:

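#include <stdio.h>

int main(void)
{
    /* Try every combination of four inputs in the range 0..30. */
    for (int a = 0; a <= 30; a++)
        for (int b = 0; b <= 30; b++)
            for (int c = 0; c <= 30; c++)
                for (int d = 0; d <= 30; d++) {
                    /* Keep the top two bits of each input, packed into 8 bits. */
                    int bucket = ((a & 0x18) << 3) |
                                 ((b & 0x18) << 1) |
                                 ((c & 0x18) >> 1) |
                                 ((d & 0x18) >> 3);
                    printf("%d, %d, %d, %d -> %d\n",
                             a,  b,  c,  d,   bucket);
                }
    return 0;
}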
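A rough sketch of how the bucket usage could be checked, rather than read off the printed output. The names bucket_of and count_used_buckets are made up for this illustration, and the optional filter on the sum of the inputs is an assumption taken from the follow-up comments in this thread, not part of the original answer.

```c
/* Same expression as in the answer: keep the top two bits of each input. */
int bucket_of(int a, int b, int c, int d)
{
    return ((a & 0x18) << 3) |
           ((b & 0x18) << 1) |
           ((c & 0x18) >> 1) |
           ((d & 0x18) >> 3);
}

/*
 * Count how many of the 256 buckets are hit by inputs in 0..30.
 * required_sum < 0:  try every combination.
 * required_sum >= 0: only combinations with a + b + c + d == required_sum
 *                    (an assumed constraint, see the comments below).
 */
int count_used_buckets(int required_sum)
{
    int used[256] = {0};
    int distinct = 0;

    for (int a = 0; a <= 30; a++)
        for (int b = 0; b <= 30; b++)
            for (int c = 0; c <= 30; c++)
                for (int d = 0; d <= 30; d++) {
                    if (required_sum >= 0 && a + b + c + d != required_sum)
                        continue;
                    int bucket = bucket_of(a, b, c, d);
                    if (!used[bucket]) {
                        used[bucket] = 1;
                        distinct++;
                    }
                }
    return distinct;
}
```

Calling count_used_buckets(-1) reports all 256 buckets in use. Calling count_used_buckets(30) reports only 34, because with the sum fixed at 30 the four 2-bit fields (each input divided by 8) can only add up to 1, 2, or 3. Most of the table then stays empty.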

Ryan Graham 2009-02-11 23:28:20

Hi, that method is nice, but the problem is that I need the same bucket for similar states. For example, 11,10,5,4 and 11,10,5,3 should have the same bucket, but 15,15,15,15 and 0,0,0,0 must not end up in the same bucket. Maybe I should open a new, clearer question; this one is totally messy now.

Eduardo 2009-02-13 22:28:54

It is getting a little comical, ya... I can add another example now that you've specified your ranges though.

Ryan Graham 2009-02-13 23:07:12

Did you test that? It does not work. This is part of the distribution over the possible options for the attributes: 0 84 344 83 84 344 84 0 344 84 0 0 83 0 0 0 84 344 84 0 344 84 0 0 84 0 0 0 0 0 0 0 344 84 0 0 84 0 0 0 0 0 0 0 0 0 0 0 83

Eduardo 2009-02-14 01:44:25

Perhaps you made a typo? I added sample code for testing it.

Ryan Graham 2009-02-14 17:39:57

Hi, there was a bit of a misunderstanding. Your code works fine when the attributes range from 0 to 30, BUT I have the condition that they all add up to 30. When you introduce that condition, I get the output I posted before.

Eduardo 2009-02-15 04:59:37

Thanks a lot for your help, Ryan. I have not come up with an easy way to solve this problem, but I am sure there is a very simple way to do what I am looking for.

Eduardo 2009-02-15 05:01:04

Answer 3

+2 A:

You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:

long hash = 13;
for (int i = 0; i < array.length; i++) {
    hash = hash * 37 + array[i] / 5;
}

This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
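A minimal C sketch of the same loop, to make the grouping visible. The function name quantized_hash and the step parameter are made up for this illustration (the snippet above hard-codes 5), and the inputs are assumed to be non-negative.

```c
#include <stddef.h>

/*
 * Order-dependent hash over values grouped into ranges of `step`.
 * Assumes step > 0 and non-negative values. Unsigned arithmetic is used
 * so that overflow on long inputs wraps around instead of being undefined.
 */
unsigned long long quantized_hash(const int *values, size_t count, int step)
{
    unsigned long long hash = 13;
    for (size_t i = 0; i < count; i++) {
        hash = hash * 37 + (unsigned long long)(values[i] / step);
    }
    return hash;
}
```

With step 5, the sets 50 5 5 10 and 52 7 6 12 hash to the same value, and 5 5 10 50 hashes differently because the order changed. Note that 52 7 4 12 does not match 50 5 5 10: 4 falls in the 0-4 group while 5 falls in the 5-9 group. That is the kind of boundary effect the word "imperfect" covers.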

If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace

    hash = hash * 37 + array[i] / 5;

with

    hash += array[i] / 5;
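Both order-independent variants could look roughly like this in C. The names sorted_hash and sum_hash are hypothetical, inputs are assumed to be non-negative, and sorted_hash sorts the caller's array in place to keep the sketch short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of ints. */
int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Variant 1: sort first (in place), then apply the order-dependent hash. */
unsigned long long sorted_hash(int *values, size_t count, int step)
{
    unsigned long long hash = 13;
    qsort(values, count, sizeof values[0], compare_ints);
    for (size_t i = 0; i < count; i++) {
        hash = hash * 37 + (unsigned long long)(values[i] / step);
    }
    return hash;
}

/* Variant 2: add up the grouped values; addition ignores order. */
unsigned long long sum_hash(const int *values, size_t count, int step)
{
    unsigned long long hash = 13;
    for (size_t i = 0; i < count; i++) {
        hash += (unsigned long long)(values[i] / step);
    }
    return hash;
}
```

Both give the same result for 5 10 20 and 20 10 5. The sum variant collides far more often, though: with step 5, the sets 5 10 20 and 35 0 0 both add up to 7, while the sorted variant still tells them apart.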

EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.

To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.

An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table and on the number of and expected maximum value of the input vector values. If you can have negative values as "coordinate" values then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.

Eddie 2009-02-12 00:45:42

What other information do you need? The whole point is that these integers are the attributes of a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize between similar attributes.

Eduardo 2009-02-12 01:19:39

The reason why 10, 5, 15 is totally different from 5, 10, 15 is that, if you imagine them in 3D, they are two totally different points.

Eduardo 2009-02-12 01:22:32

@Eduardo: Please edit your question to add the above information to the question itself. This will help people reading your question understand what you are asking.

Eddie 2009-02-12 01:47:37

Answer 4

A:

The simple solution:

Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).

If you really want to roll-your-own, I would do something like:

Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)

To generate hash, calculate: sum(a[i] * x[i]) mod P
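A sketch of the roll-your-own version in C. The prime 2^31 - 1 is picked here only because it is a well-known prime that keeps the arithmetic inside 64 bits. The coefficients are passed in by the caller and are assumed to have been generated randomly once, up front. The names HASH_PRIME and linear_hash are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed choice of P: 2^31 - 1, a Mersenne prime. */
#define HASH_PRIME 2147483647ULL

/*
 * Computes sum(coeffs[i] * values[i]) mod HASH_PRIME.
 * Each coeffs[i] must satisfy 0 < coeffs[i] < HASH_PRIME.
 * Reducing after every step keeps all intermediates below 2^63.
 */
uint64_t linear_hash(const uint32_t *coeffs, const uint32_t *values,
                     size_t count)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t x = values[i] % HASH_PRIME;
        uint64_t term = ((uint64_t)coeffs[i] * x) % HASH_PRIME;
        sum = (sum + term) % HASH_PRIME;
    }
    return sum;
}
```

For example, with coefficients 3 5 7 11 and inputs 1 2 3 4 the result is 3 + 10 + 21 + 44 = 78. Like the string-plus-md5 approach, this spreads inputs out evenly. It does not place similar inputs in the same slot unless the values are grouped (for example divided by a step) before hashing.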

FryGuy 2009-02-12 02:47:40

Answer 5

A:

Generating the same hash sum for different inputs is called a collision, and is normally a bad thing for a hash to have. It makes it less useful.

If you want similar values to give the same output, you can divide each input by the width of the range you want to count as the same. If the order makes a difference, use a different divisor for each number. The following function does what you describe:

int SqueezedSum( int a, int b, int c, int d )
{
    return (a/11) + (b/7) + (c/5) + (d/3);
}

This is not a hash, but does what you describe.
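To see what this does with the examples from the comments on the bucket answer above, here is a small demonstration. The helper print_squeezed_examples is made up for this illustration; SqueezedSum is repeated unchanged so the block compiles on its own.

```c
#include <stdio.h>

int SqueezedSum(int a, int b, int c, int d)
{
    return (a / 11) + (b / 7) + (c / 5) + (d / 3);
}

/* Print the squeezed sum for a few sample inputs. */
void print_squeezed_examples(void)
{
    printf("11,10,5,4   -> %d\n", SqueezedSum(11, 10, 5, 4));
    printf("11,10,5,3   -> %d\n", SqueezedSum(11, 10, 5, 3));
    printf("15,15,15,15 -> %d\n", SqueezedSum(15, 15, 15, 15));
    printf("0,0,0,0     -> %d\n", SqueezedSum(0, 0, 0, 0));
    printf("11,0,0,0    -> %d\n", SqueezedSum(11, 0, 0, 0));
    printf("0,7,0,0     -> %d\n", SqueezedSum(0, 7, 0, 0));
}
```

The first two inputs both give 4, while 15,15,15,15 gives 11 and 0,0,0,0 gives 0, so near neighbours share a value and distant inputs do not. The last two lines show the cost of adding the terms together: 11,0,0,0 and 0,7,0,0 both give 1 although they are not close to each other.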

Dour High Arch 2009-02-12 02:53:08

In my case, collisions are good as long as they are controlled, because that is how I am doing the generalization in the function.

Eduardo 2009-02-12 03:12:06

Answer 6

+1 A:

You might want to try using vectors to describe each number set as the hash value.

EDIT: Since you're not describing why you don't want to run the function itself, I'm guessing it's long running. You also haven't described the breadth of the argument set.

If every value is expected then a full lookup table in a database might be faster.

If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing, so only the first run for an argument set is expensive and each additional request is fast, with less memory usage than a full table.
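A minimal memoization sketch in C, assuming four integer arguments and a fixed-size cache. Everything here is hypothetical: expensive_function is a stand-in for the real long-running function, CACHE_SLOTS is an arbitrary size, and a slot is simply overwritten when a different argument set maps to it.

```c
#include <stdbool.h>

#define CACHE_SLOTS 1024   /* hypothetical cache size */

struct cache_entry {
    bool used;
    int a, b, c, d;        /* arguments the cached value belongs to */
    double value;
};

static struct cache_entry cache[CACHE_SLOTS];
int expensive_calls = 0;   /* counts real evaluations, for demonstration */

/* Hypothetical stand-in for the long-running function. */
double expensive_function(int a, int b, int c, int d)
{
    expensive_calls++;
    return a * 1.0 + b * 2.0 + c * 3.0 + d * 4.0;
}

/* Map the four arguments to a cache slot. */
unsigned slot_of(int a, int b, int c, int d)
{
    unsigned h = 13;
    h = h * 37 + (unsigned)a;
    h = h * 37 + (unsigned)b;
    h = h * 37 + (unsigned)c;
    h = h * 37 + (unsigned)d;
    return h % CACHE_SLOTS;
}

double memoized_function(int a, int b, int c, int d)
{
    struct cache_entry *e = &cache[slot_of(a, b, c, d)];

    /* Hit only if the slot holds exactly these arguments. */
    if (e->used && e->a == a && e->b == b && e->c == c && e->d == d)
        return e->value;

    /* Miss, or the slot held another argument set: compute and overwrite. */
    e->used = true;
    e->a = a;
    e->b = b;
    e->c = c;
    e->d = d;
    e->value = expensive_function(a, b, c, d);
    return e->value;
}
```

Calling memoized_function(1, 2, 3, 4) twice runs expensive_function only once. Because the stored arguments are compared on lookup, two argument sets landing in the same slot never return each other's values; the newer one just evicts the older.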

sfossen 2009-02-12 15:42:40

Answer 7

A:

You want to look into geometric hashing. In "standard" hashing you want

a short key
inverse resistance
collision resistance

With geometric hashing you substitute the third property (collision resistance) with something which is almost the opposite; namely, close initial values give close hash values.

David Lehavi 2009-02-12 20:05:58

Answer 8

A:

Hello,

Another way to view my problem is through multidimensional scaling (MDS). In MDS we start with a matrix of items and want to assign each item a location in an N-dimensional space, reducing the number of dimensions in this way.

http://en.wikipedia.org/wiki/Multidimensional_scaling

Eduardo 2009-02-18 18:46:25
