Quick background: we have a large source base written in Python. It is a compiler for a domain specific language, and internally everything is represented as directed graphs. These digraphs are built up from sets, and so we use the builtin set type in Python.
The problem is that we didn't originally realise that Python actively uses the lack of ordering guarantee in a set object to use a faster non-deterministic implementation. So when you iterate over the objects in a set (as we do frequently) the order they are returned in is weakly random. It doesn't change on every execution, but it does change frequently. From what I've seen debugging our code it seems as if a hash of the source code acts as a seed for the random number generator. So changing the code, even in a path that is not executed, causes the set iterator to change the order that elements are generated in.
Our basic debugging strategy is to dump prints into where we think the error is, refining them based on the output until we find the bug. Not very elegant but for most things it largely works. The basic problems is that adding/changing any print statement triggers different behaviour and a wildly different execution path. We can't afford to print/log everything as it slows down the compiler's execution time from about 20s (managable) to about an hour (not so managable).
How would you diagnose a problem that only occurs infrequently and disappears when you start to look for it?
Edit for clarification: Several answers suggest ways to fix the ordering of the sets. As torkildr says below "The problem here isn't that sets behave strangely, the problem is that your program behaves as if it doesn't". This is exactly the problem, but the solution is not to use deterministic sets. This would simply mask the behaviour. The problem is to find why our program behaves this way and fix that behaviour. The algorithms that we use should work on graphs represented as unordered sets. They don't. I need to find out where these bugs and why they occur.
Problem solved:
It turns out that on the implementation of Python that I'm using (2.6 on OS-X) if the relationship between the __eq__
and __hash__
methods of the objects being stored in the set is not quite a valid ordering then the system exhibits the weakly random behaviour described. There must be some code in the C implementation of set.add() that uses something from the random module to build the representation. This causes a dependency on the system entropy pool which changes the ordering on disk writes.
No direct answers, but reading kriss' follow-up question caused the insight to solve this problem so he gets the vote.