views:

372

answers:

5

I am looking for constant time algorithm can change an ordered integer index value into a random hash index. It would nice if it is reversible. I need that hash key is unique for each index. I know that this could be done with a table look up in a large file. I.E. create an ordered set of all ints and then shuffle them randomly and write to a file in random sequence. You could then read them back as you need them. But this would require a seek into a large file. I wonder if there is a simple way to use say a pseudo random generator to create the sequence as needed?

Generating shuffled range using a PRNG rather than shuffling the answer by erikkallen of Linear Feedback Shift Registers looks like the right sort of thing. I just tried it but it produces repeats and holes.

Regards David Allan Finch

+1  A: 

Assuming your goal is to spread out grouped values across the whole range,
it seems like shuffling the bits in some pre-defined order might do the trick.
i.e. given 8 bits ABCDEFGH, arrange them like EGDBHCFA, or some such pattern.

The code would just be a simple sequence of masks, shifts and adds.

AShelly
Yes that is the sort of thing I was thinking of but I was hoping that there might be something more random.
David Allan Finch
A: 

Mmm... depending if you have a lot of numbers, you could use a normal stl list, and order it by a "random" criteria

bool
nonsort(int i, int j)
{
    return  random() & 31 >16 ? true : false;
}

std::list<int> li;
// insert elements
li.sort(nonsort);

Then, you can get all the integers with a normal iterator. Remember to initialize random with srand() with time or any other pseudo-random value.

Diego Sevilla
I did not know you could do that with sort and for smallish values I think this would be a good solution. But I was thinking about a size of a unsigned long long for the full range.
David Allan Finch
+3  A: 

You could try building a suitable Feistel network. These are normally used for cryptography (e.g. DES), but with at least 64 bits, so you may need to build one yourself that suits your needs. They are invertible by construction.

starblue
I think this is the correct answer but I need some time to figure out how to implement a Feistel Network.
David Allan Finch
+3  A: 

The question is now if you need a really random mapping, or just a "weak" permutation. Assuming the latter, if you operate with unsigned 32-bit integers (say) on 2's complement arithmetics, multiplication by any odd number is a bijective and reversible mapping. Of course the same goes for XOR, so a simple pattern which you might try to use is e.g.

unsigned int hash(int x) {
   return (((x ^ 0xf7f7f7f7) * 0x8364abf7) ^ 0xf00bf00b) * 0xf81bc437;
}

There is nothing magical in the numbers. So you can change them, and they can be even randomized. The only thing is that the multiplicands must be odd. And you must be calculating with rollaround (ignoring overflows). This can be inverted. To do the inversion, you need to be able to calculate the correct complementary multiplicands A and B, after which the inversion is

unsigned int rhash(int h) {
    return (((x * B) ^ 0xf00bf00b) * A) ^ 0xf7f7f7f7;
}

You can calculate A and B mathematically, but the easier thing for you is just to run a loop and search for them (once offline, that is).

The equation uses XORs mixed with multiplications to make the mapping nonlinear.

antti.huima
Interesting are A and B not inverses A=1/0x8364abf7 or are there rounding problems.
David Allan Finch
No no, they are not inverses in that sense. They are inverses in the finite multiplication group modulo 2**32. It has nothing to do with inverses in the field of rational numbers.
antti.huima
I had a quick run for the first 100,000 numbers and the results looks good and very fast. I hope to have more time tomorrow to test it with a larger set of numbers.
David Allan Finch
Yes, it is good and fast :) The only thing is that lowest-order bits are pretty linear... but that won't harm most likely in your application. If you want to break that, you can add ((x<<13)|(x>>19)) to the equation after first multiplication. The reverse is obvious.
antti.huima
A: 

For the set of constraints there really is no solution. An attempt to hash 32 bit unsigned, into a 32 bit unsigned, will give you collisions, unless you do a something simple, like a 1 to 1 mapping. Every number is its own hash.

EvilTeach