Some have stated that Python dictionaries are thread safe. Does that mean I can or cannot modify the items in a dictionary while iterating over it?

+6  A: 

The two concepts are completely different. Thread safety means that two threads modifying the same object at the same time cannot leave the system in an inconsistent state.

That said, you cannot add or remove keys in a dictionary while iterating over it. See the documentation:

The dictionary p should not be mutated during iteration. It is safe (since Python 2.1) to modify the values of the keys as you iterate over the dictionary, but only so long as the set of keys does not change.

deinst
A: 

No. Recent versions of Python raise an exception if a dictionary changes size while you are iterating over it.

>>> d = {'one': 1, 'two': 2}
>>> for x in d:
...     d['three'] = 3
...     print(x)
...
one
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

Notice that you don't need to use threads to see this.
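
If the loop really does need to add or remove keys, a common workaround is to iterate over a snapshot of the keys rather than over the dictionary itself. A minimal sketch (the function name `prune_zero_values` and the sample data are made up for illustration):

```python
def prune_zero_values(d):
    """Remove every key whose value is 0, mutating d in place."""
    # list(d) copies the keys up front, so the loop walks the copy
    # and deleting from d does not disturb the iteration.
    for key in list(d):
        if d[key] == 0:
            del d[key]
    return d


counts = {'one': 1, 'two': 0, 'three': 3, 'four': 0}
prune_zero_values(counts)
print(counts)  # {'one': 1, 'three': 3}
```

The snapshot costs one extra list of keys, which is usually acceptable unless the dictionary is very large.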

gnibbler
+5  A: 

The other answers already correctly addressed what's apparently your actual question:

Does that mean I can or cannot modify the items in a dictionary while iterating over it?

by explaining that thread safety has nothing to do with the issue, and in any case, no, you cannot modify a dict while iterating over it.

However, the title of your question is about thread safety, and you start with:

Some have stated that Python dictionaries are thread safe

I don't know who the "some" are, but, if they did state that (rather than you misunderstanding what they did state;-) without heavy qualifications, they're wrong.

Some operations, those which don't alter the set of keys in the dict, happen to be thread-safe in current CPython implementations -- but you should not count on that, unless you strictly control the Python version under which your code will run, because such thread safety is not guaranteed by Python's language specification and therefore other implementations, including future versions of CPython, might not offer it.

If every thread is only "reading" the dict (indexing it, looping on it, etc), and no thread performs any assignment or deletion on it, then that situation is safe in current CPython implementations; in fact, if some thread assigns a new value to a key that was already present, that is also thread safe (other threads may see the previous value for that key, or the next one, depending on how the threads happen to be timed, but there will be no crash, no deadlock, and no appearance of crazy values out of nowhere, in current CPython implementations).

However, an operation such as d[k] += 1 (assuming k was previously present, and its value a number) is not, properly speaking, thread safe (any more than any other case of +=!) because it can be seen as d[k] = d[k] + 1. It might happen that two threads in a race condition both read the old value of d[k], then increment it by one, and store the same new value in the slot... so the overall effect is to increment it only by one, and not by two as would normally occur.
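
One way to make such a read-modify-write safe is to guard it with a lock. The sketch below assumes the standard `threading` module; the names `counts`, `counts_lock`, and `bump` are hypothetical, chosen only for this example:

```python
import threading

counts = {'hits': 0}
counts_lock = threading.Lock()  # guards every read-modify-write on counts


def bump(key, times):
    """Increment counts[key] `times` times, one locked step at a time."""
    for _ in range(times):
        # Holding the lock makes the read, the add, and the store
        # behave as a single step with respect to other threads.
        with counts_lock:
            counts[key] += 1


threads = [threading.Thread(target=bump, args=('hits', 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts['hits'])  # 40000
```

Without the lock, the total may come out lower on some runs, depending on the interpreter and on how the threads happen to be timed.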

Back to your other question... "only reading" the dict, and assigning new values to keys that already existed in the dict, are also the things you can do in the body of a loop that iterates on the dict -- you can't alter the set of keys in the dict (you can't add any key, nor remove any key), but the specific operation of setting a new value for an existing key is allowed. The allowed operations in this case do include the += that would be problematic in a threading situation. For example:

>>> d = dict.fromkeys(range(5), 0)
>>> for k in d: d[k] += 1
... 
>>> d
{0: 1, 1: 1, 2: 1, 3: 1, 4: 1}

and this behavior is guaranteed by Python's standardized semantics, so different implementations of the language should all preserve it.

Alex Martelli