views:

126

answers:

4

Hi,

I am wondering if there is a way to generate a key based on the relationship between two entities in a way that the key for relationship a->b is the same as the key for relationship b->a.

Desirably this would be a hash function which takes either relationship member but generates the same output regardless of the order the members are presented in.

Obviously you could do this with numbers (e.g. add(2,3) is equivalent to add(3,2)). The problem for me is that I do not want add(1,4) to equal add(2,3). Obviously any hash function has overlap but I mean a weak sense of uniqueness.

My naive (and performance undesirable) thought is:

function orderIndifferentHash(string val1, string val2)
{
  return stringMerge(hash(val1), hash(val2));
  /* String merge will 'add' each character (with wrapping).
     The pre-hash is to lengthen strings to at least 32 characters */
}
+6  A: 

In your function orderIndifferentHash you could first order val1 and val2 by some criteria and then apply any hash function you like to get the result.

function orderIndifferentHash( val1, val2 ) {
  if( val1 < val2 ) {
    first = val1
    second = val2
  }
  else {
    first = val2
    second = val1
  }
  hashInput = concat( first, second )
  return someHash( hashInput )

  // or as an alternative:
  // return concat( someHash( first ), someHash( second ) )
}
tangens
Yep feel pretty stupid for not thinking to sort them. Talk about looking at this back to front.
Graphain
Tough call on which to accept, you were first and answer the question perfectly, but the other answer is more explicit with the assumptions / thought behind it (as obvious as it all is).
Graphain
This has the slight problem that `f(ABC, DEF)` has the same hash as `f(AB, CDEF)`. You should use a contatenation scheme that generates a unique output for any distinct inputs (eg prepend with the length of the first one, so that you end up hashing `3|ABCDEF` in the first case and `2|ABCDEF` in the second).
caf
+2  A: 

You you are after:

Some function f(x, y) such that

  • f(x, y) == f(y, x)
  • f(x, y) != f(a, b) => (x == a and y == b) or (x == b and y == a)

There are going to be absolutely loads of these - off hand the one I can think of is "sorted concatenation":

  1. Sort (x, y) by any ordering
  2. Apply a hash function u(a) to x and y individually (where u(a) == u(b) implies a == b, and the length of u(a) is constant)
  3. Concatenate u(x) and u(y).

In this case:

If x == y then then the two hashes are trivially the same, so without loss of generality x < y, hence:

  • f(y, x) = u(x) + u(y) = f(x, y)

Also, if f(x, y) == f(a, b), this means that either:

  • u(x) == u(a) and u(y) == u(b) => x == a and y == b, or
  • u(y) == u(a) and u(x) == u(b) => y == a and x == b

Short version:

Sort x and y, and then apply any hash function where the resulting hash length is constant.

Kragen
Yes, but this is only true if `u` is a perfect hash function, i.e. if `u(a) == u(b)` is equivalent to `a == b`.
AndreyT
Duh...sorting... I feel pretty stupid right now - so long as the two values are sortable we can always hash in the same order. Can you explain what you mean by `=>` I'm assuming "unless" (given you are saying `f(x,y) != f(a,b) => (x == a and y == b..)` but I'm used to seeing it as "therefore" (as you've used it at the bottom).
Graphain
Tough call on which to accept, another was first and answered the question perfectly, but this answer is more explicit with the assumptions / thought behind it (as obvious as it all is once you work out sorting).
Graphain
Oh sorry - `=>` reads as "implies that". Also the values will always be sortable by some mechanism, the only thing that matters is that the ordering is complete, i.e. if `a != b` then either `a < b`, or `b < a`.
Kragen
@Kragen: I would not concatenate the two hashes (it isn't even possible if the result is numerical), I would rather combine them. For an example of combination see `boost::hash_combine`: http://www.boost.org/doc/libs/1_41_0/doc/html/hash/reference.html#boost.hash_combine
Matthieu M.
If you concatenate after hashing, then it introduces some potential vulnerabilities (depending on what you are doing with the result). For example, if I see two outputs `f(x1,y1)` and `f(x2,y2)` have a common prefix or suffix then I know that one of inputs is common.
caf
+4  A: 

With numbers, one way to achieve that is for two numbers x and y take the x-th prime and y-th prime and calculate the product of these primes. That way you will guarantee the uniqueness of the product for each distinct pair of x and y and independence from the argument order. Of course, in order to do that with any practically meaningful efficiency you'll need to keep a prime table for all possible values of x and y. If x and y are chosen from relatively small range, this will work. But if range is large, the table itself becomes prohibitively impractical, and you'll have no other choice but to accept some probability of collision (like keep a reasonably sized table of N primes and select the x%N-th prime for the given value of x).

Alternative solution, already mentioned in the other answers is to build a perfect hash function that works on your x and y values and then simply concatenate the hashes for x and y. The order independence is achieved by pre-sorting x and y. Of course, building a perfect hash is only possible for a set of arguments from a reasonably small range.

Something tells me that the primes-based approach will give you the shortest possible hash that satisfies the required conditions. No, not true.

AndreyT
Neat idea. Unfortunately performance would kill my use-case but I like this kind of thinking that always surprises me.
Graphain
If you're working with numbers "small enough" (say, 16-bit integers on a 32-bit architecture or 32-bit integers on a 64-bit architecture), you can simply shift the lowest (highest) n bits to the left, then combine the bits with OR (with n=8, an example would be turning 1 and 255 to either 511 (shift lowest) or 65281 (shift highest)).
Vatine
Another mathmetical possibility is computing powers in a Galois field, `(g**x)**y == (g**y)**x`.
caf
+1  A: 

Suppose you have any hash h(x,y). Then define f(x,y) = h(x,y) + h(y,x). Now you have a symmetric hash.

(If you do a trivial multiplicative "hash" then 1+3 and 2+2 might hash to the same value, but even something like h(x,y) = x*y*y will avoid that--just make sure there's some nonlinearity in at least one argument of the hash function.)

Rex Kerr