views:

177

answers:

4

Hello people.

I'd like to know if the absence of element ordering of the Python's built-in set structure is "random enough". For instance, taking the iterator of a set, can it be considered a shuffled view of its elements?

(If it matters, I'm running Python 2.6.5 on a Windows host.)

+17  A: 

No, it is not random. It is "arbitrarily ordered", which means that you cannot depend on it being either ordered or random.

Ignacio Vazquez-Abrams
It's important to understand the difference between "undefined" and "random".
Matti Virkkunen
Indeed, the order is predictable from the ID's of the various objects in the set. It's quite rigorously defined by the code. BUT -- bonus -- the details are none of your business, making them "arbitrary" and "implementation-specific" and "undependable for anything". And "undefined as far as you're allowed to care."
S.Lott
OK. The hash function will determine the order. For instance, for integer elements we'll get the natural order. So, I conclude that we'll have an "undefined", "arbitrary" and "repeatable" ordering for the same set of elements.
Chuim
It might only be repeatable under a single implementation of Python. If the spec says it's undefined, don't assume anything else about it (not even repeatability).
Matti Virkkunen
@Matti: got it!
Chuim
Undefined means "changeable without notice". So an upgrade from 2.6.1 to 2.6.2 i allowed to change things that are otherwise undefined.
S.Lott
+4  A: 

No, you can not rely on it for any real statistical purpose. The implementation of sets in Python is in terms of a hash table, and can cause the element distribution to display some very non-random properties. There's a large gap between "not having a guaranteed order" and "guaranteed to be unordered in a uniform-random manner".

Use random.shuffle to really shuffle elements of a sequence.

Eli Bendersky
The thing is `random.shuffle` may only be used for sequences, which a `set` is not. One may convert it to a `list` but for a big number of elements and performance sensitive code it may be an issue...
Chuim
+2  A: 

In a word, no:

>>> list(set(range(10000))) == list(range(10000))
True
Daniel Stutzbach
Agreed, just did similar tests myself and got these results.
Chuim
A: 

Arbitrariness is central when designing programs, each of these freedoms that you reserve is like a joker card that you can use when you implement, develop, or rewrite your program. The more of these free-cards you collect, the more efficiency can you deliver from your code (probably), since you have more freedom to change it.

It is not random, it's only freedom. If it's a better set that way, the order can be forwards on Wednesdays and "backwards" on Fridays.

kaizer.se
Top "Zen" answer! ;)
Chuim