I have a list which I shuffle with the Python built-in shuffle function (random.shuffle).

However, the Python reference states:

Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated.

Now, I wonder what "rather small len(x)" means here: 100, 1000, 10000, ...?

Can anybody clarify?

Thanks!

A: 

What they mean is that the number of permutations of n objects (written n!) grows absurdly fast. Basically, n! = n × (n-1) × ... × 1; for example, 5! = 5 × 4 × 3 × 2 × 1 = 120, which means there are 120 possible ways of shuffling a 5-item list.
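For a concrete check of the 5-item figure, here is a small Python sketch. The list contents are arbitrary placeholders; itertools.permutations simply enumerates every ordering so they can be counted.

```python
import math
from itertools import permutations

items = ["a", "b", "c", "d", "e"]  # placeholder 5-item list

# Enumerate every possible ordering of the list and count them.
all_orderings = list(permutations(items))
print(len(all_orderings))   # 120
print(math.factorial(5))    # 120, i.e. 5 x 4 x 3 x 2 x 1

# n! grows very quickly; a few larger sizes for comparison.
for n in (10, 20, 50):
    print(n, math.factorial(n))
```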

On the same Python doc page they give 2^19937-1 as the period, which is about 4.3 × 10^6001. Based on the Wikipedia page on factorials, I guess 2000! should be around that (sorry, I didn't find the exact figure).
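To put a number on that period without printing all of its digits, a sketch like the following works with logarithms instead. One stated assumption: CPython 3.11+ (and some patched older releases) refuses by default to convert ints with more than 4300 digits to str, which is why this avoids str(period).

```python
import math

period = 2**19937 - 1

# math.log10 accepts arbitrarily large ints, so no string conversion is needed.
exponent = math.log10(period)               # roughly 6001.6
digits = math.floor(exponent) + 1           # number of decimal digits
leading = 10 ** (exponent - math.floor(exponent))

print(f"2**19937-1 has {digits} digits, roughly {leading:.2f} x 10^{digits - 1}")
```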

So basically, there are so many permutations the shuffle can pick from that there's probably no real reason to worry about the ones it can't.

But if it really is an issue (a pesky customer asking for a guarantee of randomness, perhaps?), you could also offload the task to a third party; see http://www.random.org/ for example.

Joubarc
Or 2081 as Johannes says. Guess I wasn't that far off then.
Joubarc
I was narrowing it down manually in Wolfram|Alpha since it wouldn't give me just a result for "x! > 2^19937-1".
Joey
I arrived at that with a quick loop testing for "math.factorial(i) >= 2**19937" :)
rbp
@rbp: I should really start giving my favorite scripting environment (PowerShell) some better math capabilities :-)
Joey
Or give it Python bindings, and use Python's stdlib! ;)
rbp
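The quick loop mentioned in the comments might look something like the sketch below (a reconstruction, not rbp's actual code). It builds n! one multiplication at a time instead of calling math.factorial(i) from scratch on every iteration, and stops at the first n whose factorial reaches 2**19937.

```python
target = 2**19937  # for integers, n! >= 2**19937 is the same as n! > 2**19937 - 1

n = 1
factorial = 1  # invariant: factorial == n!
while factorial < target:
    n += 1
    factorial *= n

print(n)  # 2081
```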
A:

TL;DR: It "breaks" on lists with over 2080 elements, but don't worry too much :)

Complete answer:

First of all, notice that "shuffling" a list can be understood (conceptually) as generating all possible permutations of the list's elements and picking one of these permutations at random.
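As an illustration of that conceptual view only, the hypothetical conceptual_shuffle function below literally enumerates every permutation and picks one by random index. It is usable only for tiny lists. CPython's random.shuffle does not work this way: it swaps elements in place (a Fisher-Yates shuffle), which avoids materialising all n! orderings.

```python
import random
from itertools import permutations

def conceptual_shuffle(items, rng=random):
    """Hypothetical helper: pick one permutation uniformly from the full list of them.

    Materialises all len(items)! orderings, so it is only practical for tiny lists.
    """
    all_perms = list(permutations(items))
    index = rng.randrange(len(all_perms))  # the "random number x"
    return list(all_perms[index])          # the x-th permutation

print(conceptual_shuffle([1, 2, 3, 4]))
```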

Then, you must remember that all self-contained computerised random number generators are actually "pseudo" random. That is, they are not truly random, but rely on a series of factors to try to generate a number that is hard to guess in advance or to reproduce on purpose. Among these factors is usually the previously generated number (more generally, the generator's internal state). So, in practice, if you use a random generator continuously a certain number of times, you'll eventually start getting the same sequence all over again (this is the "period" that the documentation refers to).
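To see what a "period" looks like, here is a deliberately tiny linear congruential generator. It is a hypothetical toy chosen only because its cycle is short enough to watch (Python's random module uses the Mersenne Twister, not this). With a=5, c=3, m=16 it runs through all 16 possible values and then repeats.

```python
def toy_lcg(seed, a=5, c=3, m=16):
    """Toy linear congruential generator: x -> (a*x + c) mod m. NOT what Python uses."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def find_period(seed):
    """Count how many outputs appear before the generator returns to its first output."""
    gen = toy_lcg(seed)
    first = next(gen)
    steps = 1
    while next(gen) != first:
        steps += 1
    return steps

gen = toy_lcg(seed=7)
print([next(gen) for _ in range(20)])  # values 17-20 repeat values 1-4
print(find_period(7))                  # 16
```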

Finally, the docstring in Lib/random.py (the random module) says that "The period [of the random number generator] is 2**19937-1."
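One way to connect that period to the generator's internals is random.getstate(). In CPython, the middle element of the returned tuple holds the Mersenne Twister state: 624 32-bit words plus a position index. MT19937 effectively uses 19937 of those 19968 bits, which is roughly where the 2**19937-1 period comes from. This is only an inspection sketch; you would not normally need it.

```python
import random

version, internal_state, gauss_next = random.getstate()

print(version)              # 3: the state format version used by getstate/setstate
print(len(internal_state))  # 625: 624 state words plus the current index
print(624 * 32 - 31)        # 19937: bits of state MT19937 actually uses
```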

So, given all that, if your list is such that there are 2**19937 or more permutations, some of these will never be obtained by shuffling the list. You'd (again, conceptually) generate all permutations of the list, then generate a random number x, and pick the x-th permutation. Next time, you generate another random number y, and pick the y-th permutation. And so on. But since there are more permutations than random numbers you'll ever get (after at most 2**19937-1 generated numbers, you'll start getting the same ones again), you'll start picking the same permutations again, and some permutations will never be picked at all.
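The pigeonhole argument can be seen at toy scale. In the sketch below, the number of starting states is artificially limited by only ever seeding random.Random with 50 different seeds (a stand-in assumption for a generator with a small state space). Since each seed fixes the entire output sequence, 50 starting points can produce at most 50 distinct shuffles of a 5-item list, out of 5! = 120 possible ones.

```python
import random
from math import factorial

items = list(range(5))   # 5! = 120 possible orderings
num_seeds = 50           # pretend the generator can only start from 50 states

reachable = set()
for seed in range(num_seeds):
    rng = random.Random(seed)  # same seed -> same sequence -> same shuffle
    shuffled = items[:]
    rng.shuffle(shuffled)
    reachable.add(tuple(shuffled))

print(f"{len(reachable)} of {factorial(len(items))} orderings are reachable")
```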

So, you see, it's not exactly a matter of how long your list is, but of how many distinct permutations it has (though length does enter into the equation). Also, 2**19937-1 is a very large number. But, still, depending on your shuffling needs, you should bear all that in mind. In a simple case (and with a quick calculation), a list of 2081 distinct elements has 2081! permutations, which is more than 2**19937.
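The same threshold can be cross-checked without building any huge integers, using log2(n!) = lgamma(n + 1) / ln 2. The helper name log2_factorial is hypothetical; math.lgamma is the standard library's log-gamma function. Double precision is ample here because log2(2080!) and log2(2081!) sit several bits on either side of 19937.

```python
import math

PERIOD_BITS = 19937  # the period is 2**19937 - 1

def log2_factorial(n):
    """Hypothetical helper: log2(n!) via the log-gamma function, since Gamma(n + 1) == n!."""
    return math.lgamma(n + 1) / math.log(2)

n = 1
while log2_factorial(n) < PERIOD_BITS:
    n += 1

print(n)                                # first n with n! >= 2**19937
print(round(log2_factorial(n - 1), 1))  # just below 19937
print(round(log2_factorial(n), 1))      # just above 19937
```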

rbp
+1 for nicely explaining the topic and problem. IMHO this should be the accepted answer. Oh, and I'd move the TL;DR to the top, since most people who get scared by a body of text probably won't read that far down :-).
Joey
Thanks :) And good idea on TL;DR, I'll do it!
rbp
@Johannes: you needn't have deleted your answer :) Still, thanks!
rbp
@rbp: Well, it was kinda redundant now :-). You did a much better job at explaining it.
Joey