views:

280

answers:

10

Background:

I'm working with permutations of the sequence of integers {0, 1, 2 ... , n}. I have a local search algorithm that transforms a permutation in some systematic way into another permutation. The point of the algorithm is to produce a permutation that minimises a cost function. I'd like to work with a wide range of problems, from n=5 to n=400.

The problem:

To reduce search effort I need to be able to check if I've processed a particular permutation of integers before. I'm using a hash table for this and I need to be able to generate an id for each permutation which I can use as a key into the table. However, I can't think of any nice hash function that maps a set of integers into a key such that collisions do not occur too frequently.

Stuff I've tried:

I started out by generating a sequence of n prime numbers and multiplying the ith number in my permutation with the ith prime then summing the results. The resulting key however produces collisions even for n=5.

I also thought to concatenate the values of all numbers together and take the integer value of the resulting string as a key but the id quickly becomes too big even for small values of n. Ideally, I'd like to be able to store each key as an integer.

Does stackoverflow have any suggestions for me?

+2  A: 

You could MD5 hash a comma separated string containg your ints.

In C# it would look something like this (Disclaimer: I have no compiler on the machine I'm using today):

using System;
using System.Security.Cryptography;
using System.Text;

public class SomeClass {
    static Guid GetHash(int[] numbers) {
        string csv = string.Join(',', numbers);
        return new Guid(new MD5CryptoServiceProvider().ComputeHash(Encoding.ASCII.GetBytes(csv.Trim())));
    }
}

Edit: What was I thinking? As stated by others, you don't need a hash. The CSV should be sufficient as a string Id (unless your numbers array is big).

grenade
The hash is a good idea because he wants an integer, not just a string, to represent uniqueness. But, as stated in my reply, you can just take the first 4 bytes of any hash to get this :)
Noon Silk
Indeed !
grenade
A: 

Convert each number to String, concatenate Strings (via StringBuffer) and take contents of StringBuffer as a key.

Victor Sorokin
You'll need a delimiter, of course or 12,3 would be the same as 1,23
Kobi
You right, overlooked this.
Victor Sorokin
Hi Victor, This suggestion was mentioned in my original post. It will work but the problem is that the keys quickly become very large. Consider the case for n = 20. If my permuation is {1, 2, 3 ... 18, 19, 20} the corresponding id is: 01234567891011121314151617181920Some of the bigger problems I hope to work with have n = 400. Ideally, I'd prefer a more memory efficient solution.
Daniel
n=400? And you want to map permutations _uniquely_ to integers? 400! is a very large number... around 870 decimal digits. You might have to forget about uniqueness and go for a hash-based solution as suggested above.
ozan
+4  A: 

Zobrist hashing might work for you. You need to create an NxN matrix of random integers, each cell representing that element i is in the jth position in the current permutation. For a given permutation you pick the N cell values, and xor them one by one to get the permutation's key (note that key uniqueness is not guaranteed).

The point in this algorithm is, that if you swap to elements in your permutations, you can easily generate the new key from the current permutation by simply xor-ing out the old and xor-ing in the new positions.

Zed
Thanks for your suggestion Zed. Zobrist hashing looks like a pretty easy way to go. Looking at http://chessprogramming.wikispaces.com/Zobrist+Hashing it seems that the selection of good random seed is rather important to minimise collisions.
Daniel
+3  A: 

How fast does it need to be?

You could always gather the integers as a string, then take the hash of that, and then just grab the first 4 bytes.

For a hash you could use any function really, like MD5 or SHA-256.

Noon Silk
Hi silky. Thanks for posting. The hash needs to be reasonably quick as there are many permutations to be considered. I'm not familiar with the time complexity of MD5 or SHA-256 though I do like the suggestion to grab the first 4 bytes of the result.
Daniel
I'd attempt both then, and compare times.
Noon Silk
+3  A: 

As others have suggested, you can use hashing to generate an integer that will be unique with high probability. However, if you need the integer to always be unique, you should rank the permutations, i.e. assign an order to them. For example, a common order of permutations for set {1,2,3} is the lexicographical order:

  1. 1,2,3
  2. 1,3,2
  3. 2,1,3
  4. 2,3,1
  5. 3,1,2
  6. 3,2,1

In this case, the id of a permutation is its index in the lexicographical order. There are other methods of ranking permutations, of course.

Making ids a range of continuous integers makes it possible to implement the storage of processed permutations as a bit field or a boolean array.

Bojan Resnik
This approach will provide you with the smallest possible keysize that *guarantees* uniqueness. Google "rank" "permutations" to find some efficient ways of ranking arbitrary permutations. (Well, more efficient than enumerating every lower permutation... :) )
j_random_hacker
I like this, but it does require saving a lot more data than the naive hashing approach.
Noon Silk
This would be a good solution if n were, say, around 6 or 7. When you're mapping the permutations of 20 to the first 20! integers, it's getting pretty silly. Daniel is talking about n=400. Uniqueness is not realistic.
ozan
When you are using hashes, there are two possibilities: 1. the hashing is unique - in that case you are again mapping to n! integers, and 2. the hashing is not unique, in which case you need to store each processed permutation as well in order to make sure that you processed it. For a large n, however, there is nothing that could be memory-efficient so one will certainly have to resort to using external storage.
Bojan Resnik
@Daniel: Could you update your question to mention you have n=400 or so? ozan is correct to say that this approach is unsuitable for n>10 or so.
j_random_hacker
Thank you for the suggestion Bojan. I've been meaning to look at ranking algorithms for some time now and this reminded me. As you and ozan say, it doesn't look like it will work for large instances as the rank will quickly overflow the integer type I'm using for keys. Nevertheless, a very nice idea! EDIT: reposted; fixed a minor typo.
Daniel
@j_random_hacker: All fixed; along with a bunch of other clarifications :)
Daniel
A: 

Not relates directly to the question, but as an alternative solution you may use Trie tree as a look up structure. Trie trees are very good for strings operations, its implementation relatively easy and it should be more faster (max of n(k) where k is length of a key) than hashset for a big amount of long strings. And you aren't limited in key size( such in a regular hashset in must int, not bigger). Key in your case will be a string of all numbers separated by some char.

Kamarey
A: 

Random thought: Couldn't one use the fact, that every permutation can be written as product of transpositions, to derive some kind of signature for an actual permutation? Sorry. No algorithm right now, have to think about it first.

Dirk
+5  A: 

Judging by your question, and the comments you've left, I'd say your problem is not possible to solve.

Let me explain.

You say that you need a unique hash from your combination, so let's make that rule #1:

  • #1: Need a unique number to represent a combination of an arbitrary number of digits/numbers

Ok, then in a comment you've said that since you're using quite a few numbers, storing them as a string or whatnot as a key to the hashtable is not feasible, due to memory constraints. So let's rewrite that into another rule:

  • #2: Cannot use the actual data that were used to produce the hash as they are no longer in memory

Basically, you're trying to take a large number, and store that into a much smaller number range, and still have uniqueness.

Sorry, but you can't do that.

Typical hashing algorithms produce relatively unique hash values, so unless you're willing to accept collisions, in the sense that a new combination might be flagged as "already seen" even though it hasn't, then you're out of luck.

If you were to try a bit-field, where each combination has a bit, which is 0 if it hasn't been seen, you still need large amounts of memory.

For the permutation in n=20 that you left in a comment, you have 20! (2,432,902,008,176,640,000) combinations, which if you tried to simply store each combination as a 1-bit in a bit-field, would require 276,589TB of storage.

You're going to have to limit your scope of what you're trying to do.

Lasse V. Karlsen
Hi Lasse. Thank you for your comment.I think my post should have been clearer; I've been working with small instances all day (n=5) and in those cases I'd like uniqueness. For larger instances, minimising collisions is sufficient.I'm not trying to exhaustively enumerate all permutations but rather run a directed local search.
Daniel
A: 

Prime powers would work: if p_i is the ith prime and a_i is the ith element of your tuple, then

p_0**a_0 * p_1**a_1 * ... * p_n**a_n

should be unique by the Fundamental Theorem of Arithmetic. Those numbers will get pretty big, though :-)

(e.g. for n=5, (1,2,3,4,5) will map to 870,037,764,750 which is already more than 32 bits)

John Fouhy
A: 

Similar to Bojan's post it seems like the best way to go is to have a deterministic order to the permutations. If you process them in that order then there is no need to do a lookup to see if you have already done any particular permutation.

Dolphin