I am implementing a relatively simple thread pool with Python's Queue.Queue class. I have one producer class that contains the Queue instance and some convenience methods, and a consumer class that subclasses threading.Thread. I instantiate the consumer once for every thread I want in my pool ("worker threads," I think they're called); the number of threads is given by an integer.

Each worker thread takes a (flag, data) pair off the queue, processes the data using its own database connection, and places the GUID of the row onto a list so that the producer class knows when a job is done.

While I'm aware that other modules implement the functionality I'm coding, the reason I'm coding this is to gain a better understanding of how Python threading works. This brings me to my question.

If I store anything in a function's namespace or in the class's __dict__ object, will it be thread safe?

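import sqlite3
import threading

class Consumer(threading.Thread):
    def __init__(self, producer, db_filename):
        threading.Thread.__init__(self)  # required before the thread can be started
        self.producer = producer
        self.conn = sqlite3.connect(db_filename)  # Is this var thread safe?

    def run(self):
        flag, data = self.producer.queue.get()

        while flag != 'stop':
            # Do stuff with data; Is `data` thread safe?
            flag, data = self.producer.queue.get()  # fetch the next job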

I think both would be thread safe. Here's my rationale:

  • Each time a class is instantiated, a new __dict__ gets created. Under the scenario I outline above, I don't think any other object would have a reference to this object. (Now, perhaps the situation might get more complicated if I used join() functionality, but I'm not...)
  • Each time a function gets called, it creates its own namespace, which exists for the lifetime of the call. I'm not making any of my variables global, so I don't understand how any other object would have a reference to a function variable.

This post addresses my question somewhat, but is still a little abstract for me.

Thanks in advance for clearing this up for me.

+2  A: 

You are right; this is thread-safe. Local variables (the ones you call "function namespace") are always thread-safe, since only the thread executing the function can access them. Instance attributes are thread-safe as long as the instance is not shared across threads. As the consumer class inherits from Thread, its instances certainly won't be shared across threads.
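To make the distinction concrete, here is a minimal sketch (Python 3 syntax; the `Worker` class and its `n` and `result` attributes are hypothetical names, not taken from the question). Each thread computes with its own local variable and writes only to its own instance, so no locking is needed:

```python
import threading


class Worker(threading.Thread):
    """Hypothetical worker, used only for illustration."""

    def __init__(self, n):
        threading.Thread.__init__(self)
        self.n = n           # instance attribute: one per Worker object
        self.result = None

    def run(self):
        total = 0            # local variable: exists only inside this call
        for i in range(self.n):
            total += i
        self.result = total  # written only by this worker's own thread


workers = [Worker(n) for n in (10, 100, 1000)]
for w in workers:
    w.start()
for w in workers:
    w.join()                 # wait before reading results from the main thread

results = [w.result for w in workers]
print(results)
```

The main thread reads `result` only after `join()`, so at no point do two threads touch the same attribute at the same time.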

The only "risk" here is the value of the data object: in theory, the producer might hold onto the data object after putting it into the queue, and (if the data object itself is mutable - make sure you understand what "mutable" means) may change the object while the Consumer is using it. If the producer leaves the data object alone after putting it into the queue, this is thread-safe.
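The risk described above comes from the queue storing a reference to the object, not a copy. The following sketch shows this in a single thread so that the outcome is deterministic (Python 3 syntax, where the module is named `queue`; the `job` dictionary and its keys are made up for illustration):

```python
import queue  # this module is named Queue in Python 2

q = queue.Queue()

# The producer enqueues a mutable object and then keeps modifying it.
job = {"guid": "a1", "rows": [1, 2, 3]}  # hypothetical payload
q.put(("work", job))
job["rows"].append(4)

flag, data = q.get()
seen_rows = list(data["rows"])  # [1, 2, 3, 4]: the later change is visible

# Safer: enqueue an immutable snapshot that the producer no longer touches.
job = {"guid": "a2", "rows": [1, 2, 3]}
q.put(("work", (job["guid"], tuple(job["rows"]))))
job["rows"].append(4)

flag, (guid, rows) = q.get()    # rows is (1, 2, 3), unaffected by the append
```

With real threads the first pattern becomes a race: whether the consumer sees the extra element depends on timing.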

Martin v. Löwis
A: 

I think your assumptions are correct on the whole, and most probably correct in your case.

However, it is slightly more difficult to tell whether something is thread safe than you suggest.

Calls such as self.conn = sqlite3.connect(db_filename) may not be, as the sqlite3 module could be sharing some state, and calling the function may have side effects. However, I doubt that this is the case, and like you I would assume it produces a totally new object.
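One sqlite3 detail is worth checking here. By default (`check_same_thread=True`), a connection may only be used in the thread that created it. Code in `__init__` runs in the thread that instantiates the object, while `run()` executes in the new thread. A minimal sketch of the effect, and of opening the connection inside `run()` instead (Python 3 syntax; the `:memory:` database and the `value` attribute are illustrative assumptions):

```python
import sqlite3
import threading

errors = []


def use_connection(conn):
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(exc)  # connection was created in a different thread


main_conn = sqlite3.connect(":memory:")  # created in the main thread
t = threading.Thread(target=use_connection, args=(main_conn,))
t.start()
t.join()
main_conn.close()


class Consumer(threading.Thread):
    def __init__(self, db_filename):
        threading.Thread.__init__(self)
        self.db_filename = db_filename   # store only the filename here
        self.value = None

    def run(self):
        conn = sqlite3.connect(self.db_filename)  # created in the worker thread
        try:
            self.value = conn.execute("SELECT 1").fetchone()[0]
        finally:
            conn.close()


c = Consumer(":memory:")
c.start()
c.join()
```

After this runs, `errors` holds one `ProgrammingError` from the first attempt, and `c.value` is `1`.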

It is not just global variables that could be a problem; mutable objects obtained from outer scopes are also an issue.
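As a rough illustration of the outer-scope case, the sketch below has several threads updating one dictionary that lives in an enclosing function rather than in a global. The names `run_pool`, `counter` and `work` are hypothetical. Because the object is shared, the read-modify-write is guarded with a `threading.Lock`:

```python
import threading


def run_pool(num_threads, increments):
    counter = {"value": 0}     # mutable object in the enclosing scope
    lock = threading.Lock()

    def work():
        for _ in range(increments):
            with lock:         # without the lock, updates may be lost
                counter["value"] += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


total = run_pool(4, 10000)
print(total)
```

Nothing here is global, yet every thread reaches the same `counter` through the closure, which is why it needs protecting.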

So the data in

flag, data = self.producer.queue.get()

may or may not be thread safe, depending on where the data was originally produced. However, I assume that this data consists of independent (preferably immutable) information. If that's the case, then all should be thread safe.

David Raznick