ansaurus

Question

Answer 1

+2 A:

You could MD5 hash a comma separated string containg your ints.

In C# it would look something like this (Disclaimer: I have no compiler on the machine I'm using today):

using System;
using System.Security.Cryptography;
using System.Text;

public class SomeClass {
    static Guid GetHash(int[] numbers) {
        string csv = string.Join(',', numbers);
        return new Guid(new MD5CryptoServiceProvider().ComputeHash(Encoding.ASCII.GetBytes(csv.Trim())));
    }
}

Edit: What was I thinking? As stated by others, you don't need a hash. The CSV should be sufficient as a string Id (unless your numbers array is big).

grenade 2009-08-30 09:46:51

The hash is a good idea because he wants an integer, not just a string, to represent uniqueness. But, as stated in my reply, you can just take the first 4 bytes of any hash to get this :)

Noon Silk 2009-08-30 09:57:25

Indeed !

grenade 2009-08-30 09:58:11

Answer 2

A:

Convert each number to String, concatenate Strings (via StringBuffer) and take contents of StringBuffer as a key.

Victor Sorokin 2009-08-30 09:46:52

You'll need a delimiter, of course or 12,3 would be the same as 1,23

Kobi 2009-08-30 09:49:05

You right, overlooked this.

Victor Sorokin 2009-08-30 09:57:24

Hi Victor, This suggestion was mentioned in my original post. It will work but the problem is that the keys quickly become very large. Consider the case for n = 20. If my permuation is {1, 2, 3 ... 18, 19, 20} the corresponding id is: 01234567891011121314151617181920Some of the bigger problems I hope to work with have n = 400. Ideally, I'd prefer a more memory efficient solution.

Daniel 2009-08-30 10:07:21

n=400? And you want to map permutations _uniquely_ to integers? 400! is a very large number... around 870 decimal digits. You might have to forget about uniqueness and go for a hash-based solution as suggested above.

ozan 2009-08-30 10:49:27

Answer 3

+4 A:

Zobrist hashing might work for you. You need to create an NxN matrix of random integers, each cell representing that element i is in the jth position in the current permutation. For a given permutation you pick the N cell values, and xor them one by one to get the permutation's key (note that key uniqueness is not guaranteed).

The point in this algorithm is, that if you swap to elements in your permutations, you can easily generate the new key from the current permutation by simply xor-ing out the old and xor-ing in the new positions.

Zed 2009-08-30 09:48:10

Thanks for your suggestion Zed. Zobrist hashing looks like a pretty easy way to go. Looking at http://chessprogramming.wikispaces.com/Zobrist+Hashing it seems that the selection of good random seed is rather important to minimise collisions.

Daniel 2009-08-30 13:36:59

Answer 4

+3 A:

How fast does it need to be?

You could always gather the integers as a string, then take the hash of that, and then just grab the first 4 bytes.

For a hash you could use any function really, like MD5 or SHA-256.

Noon Silk 2009-08-30 09:48:42

Hi silky. Thanks for posting. The hash needs to be reasonably quick as there are many permutations to be considered. I'm not familiar with the time complexity of MD5 or SHA-256 though I do like the suggestion to grab the first 4 bytes of the result.

Daniel 2009-08-30 13:40:59

I'd attempt both then, and compare times.

Noon Silk 2009-08-30 21:59:19

Answer 5

+3 A:

As others have suggested, you can use hashing to generate an integer that will be unique with high probability. However, if you need the integer to always be unique, you should rank the permutations, i.e. assign an order to them. For example, a common order of permutations for set {1,2,3} is the lexicographical order:

1,2,3
1,3,2
2,1,3
2,3,1
3,1,2
3,2,1

In this case, the id of a permutation is its index in the lexicographical order. There are other methods of ranking permutations, of course.

Making ids a range of continuous integers makes it possible to implement the storage of processed permutations as a bit field or a boolean array.

Bojan Resnik 2009-08-30 10:06:35

This approach will provide you with the smallest possible keysize that *guarantees* uniqueness. Google "rank" "permutations" to find some efficient ways of ranking arbitrary permutations. (Well, more efficient than enumerating every lower permutation... :) )

j_random_hacker 2009-08-30 10:15:16

I like this, but it does require saving a lot more data than the naive hashing approach.

Noon Silk 2009-08-30 10:36:29

This would be a good solution if n were, say, around 6 or 7. When you're mapping the permutations of 20 to the first 20! integers, it's getting pretty silly. Daniel is talking about n=400. Uniqueness is not realistic.

ozan 2009-08-30 11:05:46

When you are using hashes, there are two possibilities: 1. the hashing is unique - in that case you are again mapping to n! integers, and 2. the hashing is not unique, in which case you need to store each processed permutation as well in order to make sure that you processed it. For a large n, however, there is nothing that could be memory-efficient so one will certainly have to resort to using external storage.

Bojan Resnik 2009-08-30 12:33:07

@Daniel: Could you update your question to mention you have n=400 or so? ozan is correct to say that this approach is unsuitable for n>10 or so.

j_random_hacker 2009-08-30 13:53:08

Thank you for the suggestion Bojan. I've been meaning to look at ranking algorithms for some time now and this reminded me. As you and ozan say, it doesn't look like it will work for large instances as the rank will quickly overflow the integer type I'm using for keys. Nevertheless, a very nice idea! EDIT: reposted; fixed a minor typo.

Daniel 2009-08-30 14:01:40

@j_random_hacker: All fixed; along with a bunch of other clarifications :)

Daniel 2009-08-30 14:18:16

Answer 6

A:

Not relates directly to the question, but as an alternative solution you may use Trie tree as a look up structure. Trie trees are very good for strings operations, its implementation relatively easy and it should be more faster (max of n(k) where k is length of a key) than hashset for a big amount of long strings. And you aren't limited in key size( such in a regular hashset in must int, not bigger). Key in your case will be a string of all numbers separated by some char.

Kamarey 2009-08-30 10:24:33

Answer 7

A:

Random thought: Couldn't one use the fact, that every permutation can be written as product of transpositions, to derive some kind of signature for an actual permutation? Sorry. No algorithm right now, have to think about it first.

Dirk 2009-08-30 10:34:19

Answer 8

+5 A:

Judging by your question, and the comments you've left, I'd say your problem is not possible to solve.

Let me explain.

You say that you need a unique hash from your combination, so let's make that rule #1:

#1: Need a unique number to represent a combination of an arbitrary number of digits/numbers

Ok, then in a comment you've said that since you're using quite a few numbers, storing them as a string or whatnot as a key to the hashtable is not feasible, due to memory constraints. So let's rewrite that into another rule:

#2: Cannot use the actual data that were used to produce the hash as they are no longer in memory

Basically, you're trying to take a large number, and store that into a much smaller number range, and still have uniqueness.

Sorry, but you can't do that.

Typical hashing algorithms produce relatively unique hash values, so unless you're willing to accept collisions, in the sense that a new combination might be flagged as "already seen" even though it hasn't, then you're out of luck.

If you were to try a bit-field, where each combination has a bit, which is 0 if it hasn't been seen, you still need large amounts of memory.

For the permutation in n=20 that you left in a comment, you have 20! (2,432,902,008,176,640,000) combinations, which if you tried to simply store each combination as a 1-bit in a bit-field, would require 276,589TB of storage.

You're going to have to limit your scope of what you're trying to do.

Lasse V. Karlsen 2009-08-30 10:44:46

Hi Lasse. Thank you for your comment.I think my post should have been clearer; I've been working with small instances all day (n=5) and in those cases I'd like uniqueness. For larger instances, minimising collisions is sufficient.I'm not trying to exhaustively enumerate all permutations but rather run a directed local search.

Daniel 2009-08-30 13:45:22

Answer 9

A:

Prime powers would work: if p_i is the i^th prime and a_i is the i^th element of your tuple, then

p_0**a_0 * p_1**a_1 * ... * p_n**a_n

should be unique by the Fundamental Theorem of Arithmetic. Those numbers will get pretty big, though :-)

(e.g. for n=5, (1,2,3,4,5) will map to 870,037,764,750 which is already more than 32 bits)

John Fouhy 2009-08-30 23:15:44

Answer 10

A:

Similar to Bojan's post it seems like the best way to go is to have a deterministic order to the permutations. If you process them in that order then there is no need to do a lookup to see if you have already done any particular permutation.

Dolphin 2009-08-31 15:42:15

ansaurus

tags:

views:

answers:

Generating ids for a set of integers

related questions