views:

232

answers:

7

How can you loop over a fixed range of integers (say 100000-999999) in a difficult-to-predict order?

  • assume the range may be large enough that it would be impractical to store every possible integer in an array or keep an array of every element you've found so far.
  • you must hit every number in the range once and only once, and be able to tell when you've finished, i.e. there are no numbers left

I.e. I'd like something more elegant than just A) pick a random number and then B) check whether it's been used already and if so go back to step A, for the following reasons (unless you can convince me otherwise):

  • that's really going to suck when you start running out of numbers
  • telling whether you've run out of unused numbers may be prohibitively expensive
  • this approach could also conceivably have concurrency issues if you have a lot of clients or threads trying to do it at the same time out of the same range
+4  A: 

You should look at using a linear feedback shift register. A "maximum length" LFSR will hit every number in a range (2^n -1) except 0. You could store the first number, so you'll know when it comes back around to the start, or you could just count the samples. The problem is that it is deterministic, so technically you can predict it if you know the algorithm. Also, given any number in the sequence, the sequence is always the same from that point on.

You could have a list of a bunch of LFSR algorithms and randomly choose one before you start, so that it's less predictable from run to run. But the method you use to choose the algorithm is probably predictable too...

If you need to do this with true randomness, then you need a hardware random number generator (an algorithmic approach is always predictable) and you will have to maintain a list of all the numbers in the range, then use the random number generator to pick a number out of the list, and remove it from the list at the same time so that you don't pick it again.

Scott Whitlock
+1  A: 

If the order does not have to be truly random (just unpredictable) and if we're allowed to store a small-ish range of values (say N), then:

  1. Store first N values in sublist
  2. Return random value from sublist until, say N/2, values remain
  3. Add N/2 more values from main list into sublist.
  4. Repeat 2-3 until all values done.

Step 2-3 is necessary or else every Nth return value would be predictable. Vary sublist size (N) and reload threshold (N/2) to find a good compromise between memory usage, reload frequency, and "randomness".

lsc
+3  A: 

If you need unobvious but not really secure, choose a step size that N for range M such that N and M are relatively prime and do the arithmatic mod M.

dmckee
How can I choose relative primes?
ʞɔıu
Easiest just to choose a bigish prime (so relatively prime against everything), but there are efficient methods...
dmckee
If you have access to a prime number generator, picking a large prime is fine. Otherwise, M-1 and M+1 are guaranteed to be relatively prime to M. Using either of these will produce blindingly obvious results, since mod M these are equivalent to -1 and 1. Another way would be to generate some list of the first n primes, test each of them for whether they divide M, then multiply the ones that don't together. Expensive and awkward, but only needed at initialization.
David Berger
+1  A: 

Use a linear congruential random number generator and discard any integers outside your range.

Peter Ruderman
A: 

Have a database generate a table with your numbers 1 through N.

Have the database generate a random number for each of the N values.

Return the N values sorted by the random number.

wcm
Looking back on this answer, I realize that I didn't answer the question that was being asked. I am leaving it because it could be used to solve this kind of problem in a rather straight forward manner.
wcm
+8  A: 

The linear congruential random number generators (covered in excruciating detail in volume 2 of Knuth) will step through every value in the given range without repeating, in a way that isn't easy to predict. The fundamental statement is

v = k * v + l mod m

where m is the size of the set, k and l are relatively prime to m (I believe that's enough to guarantee that it works as desired), and v is the value being used. Select the initial value in some more or less random way, and go from there.

One advantage is that it's fairly quick to write, assuming you can avoid overflow (either by restricting k and m, or by using arbitrary-precision arithmetic routines).

David Thornley
what does relatively prime to m mean? so, for 100 could I use 19 and 13? Also I assume the order of operations is ((k * v) + l) mod m), right?
ʞɔıu
Relatively prime means that there are no common factors. Prime numbers are relatively prime to any other number. 4 is relatively prime to 9, even though neither is prime, because 2 does not divide 9 and 3 does not divide 4. 5 is relatively prime to both.
David Berger
I tried this and found that it did not exhaust the range before repeating numbers, depending on the constant values that I chose. According to the wikipedia entry http://tinyurl.com/2qx2xb there are additional restrictions if you want the sequence to be exhaustive: k-1 must be divisible by all prime factors of m, and k-1 must be divisible by 4 if m is divisible by 4. So, for m=100 k could be 21, 41, 61 or 81. Seems pretty limiting. (even possible if m is itself prime?) The article also says that l must be constrained by m, which I tried violating and found that it seems to work anyway.
ʞɔıu
+3  A: 

If you need a cryptographically secure way to do this, you might want to check out my article on Cryptographically Secure Permutations with Block Ciphers. In a nutshell, pick a block cipher, use a technique called XOR folding to reduce it to the smallest power of 2 greater than the desired range, and then use the following technique to generate only numbers within the desired range:

def permute(index, max):
  index = E(index)
  if index > max:
    return permute(index, max)

That is, simply 're-encrypt' any number that you generate that's outside the desired range. The amount of work required to generate the entire sequence is bounded at the number of items in the source sequence. The worst case for generating a single element is 1 + unused_range, but that's a vanishingly slim possibility.

You can apply that to anything that generates a 1:1 mapping, of course, not just the block cipher example. And if you're dealing with a different sort of RNG - an LFSR, for example, instead of re-applying the function, just skip that element.

Nick Johnson