views:

275

answers:

3

Is accessing/changing dictionary values thread-safe?

I have a global dictionary foo and multiple threads with ids id1, id2, ... , idn. Is it OK to access and change foo's values without allocating a lock for it if it's known that each thread will only work with its id-related value, say thread with id1 will only work with foo[id1]?

+3  A: 

The GIL takes care of that, if you happen to be using CPython.

global interpreter lock

The lock used by Python threads to assure that only one thread executes in the CPython virtual machine at a time. This simplifies the CPython implementation by assuring that no two processes can access the same memory at the same time. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines. Efforts have been made in the past to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity), but so far none have been successful because performance suffered in the common single-processor case.

See are-locks-unnecessary-in-multi-threaded-python-code-because-of-the-gil.

gimel
That only concerns CPython though.
Bastien Léonard
Unless he happens to be using Jython or IronPython.
voyager
@Bastien Léonard: Beat me to it :)
voyager
+5  A: 

Assuming CPython: Yes and no. It is actually safe to fetch/store values from a shared dictionary in the sense that multiple concurrent read/write requests won't corrupt the dictionary. This is due to the global interpreter lock ("GIL") maintained by the implementation. That is:

Thread A running:

a = global_dict["foo"]

Thread B running:

global_dict["bar"] = "hello"

Thread C running:

global_dict["baz"] = "world"

won't corrupt the dictionary, even if all three access attempts happen at the "same" time. The interpreter will serialize them in some undefined way.

However, the results of the following sequence is undefined:

Thread A:

if "foo" not in global_dict:
   global_dict["foo"] = 1

Thread B:

global_dict["foo"] = 2

as the test/set in thread A is not atomic ("time-of-check/time-of-use" race condition). So, it is generally best, if you lock things:

lock = RLock()

def thread_A():
    lock.acquire()
    try:
        if "foo" not in global_dict:
            global_dict["foo"] = 1
    finally:
        lock.release()

def thread_B():
    lock.acquire()
    try:
        global_dict["foo"] = 2
    finally:
        lock.release()
Dirk
+5  A: 

The best, safest, portable way to have each thread work with independent data is:

import threading
tloc = threading.local()

Now each thread works with a totally independent tloc object even though it's a global name. The thread can get and set attributes on tloc, use tloc.__dict__ if it specifically needs a dictionary, etc.

Thread-local storage for a thread goes away at end of thread; to have threads record their final results, have them put their results, before they terminate, into a common instance of Queue.Queue (which is intrinsically thread-safe). Similarly, initial values for data a thread is to work on could be arguments passed when the thread is started, or be taken from a Queue.

Other half-baked approaches, such as hoping that operations that look atomic are indeed atomic, may happen to work for specific cases in a given version and release of Python, but could easily get broken by upgrades or ports. There's no real reason to risk such issues when a proper, clean, safe architecture is so easy to arrange, portable, handy, and fast.

Alex Martelli