views:

166

answers:

2

This is probably a rudimentary question, but I'm new to threaded programming in Python and am not entirely sure what the correct practice is.

Should I be creating a single lock object (either globally or being passed around) and using that everywhere that I need to do locking? Or, should I be creating multiple lock instances in each of the classes where I will be employing them. Take these 2 rudimentary code samples, which direction is best to go? The main difference being that a single lock instance is used in both class A and B in the second, while multiple instances are used in the first.

Sample 1

class A():
    def __init__(self, theList):
        self.theList = theList
        self.lock = threading.Lock()

    def poll(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.append(something)
            finally:
                self.lock.release()





class B(threading.Thread):
    def __init__(self,theList):
        self.theList = theList
        self.lock = threading.Lock()
        self.start()


    def run(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.remove(something)
            finally:
                self.lock.release()



if __name__ == "__main__":
    aList = []
    for x in range(10):
        B(aList)

    A(aList).poll()

Sample 2

class A():
    def __init__(self, theList,lock):
        self.theList = theList
        self.lock = lock

    def poll(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.append(something)
            finally:
                self.lock.release()



class B(threading.Thread):
    def __init__(self,theList,lock):
        self.theList = theList
        self.lock = lock
        self.start()


    def run(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.remove(something)
            finally:
                self.lock.release()



if __name__ == "__main__":
    lock = threading.Lock()
    aList = []
    for x in range(10):
        B(aList,lock)

    A(aList,lock).poll()
+4  A: 

In the general case, a single global lock is less efficient (more contention) but safer (no risk of deadlock) as long as it's a RLock (reentrant) rather than a plain Lock.

The potential problems come when a thread that's executing while holding a lock tries to acquire another (or the same) lock, for example by calling another method that contains the acquire call. If a thread that's already holding a lock tries to acquire it again, it will block forever if the lock's a plain Lock, but proceed smoothly if it's a slightly more complex RLock -- that's why the latter is called reentrant, because the thread holding it can "enter" (acquire the lock) again. Essentially, a RLock keeps track of which thread holds it, and how many time the thread has acquired the lock, while the simpler Lock does not keep such information around.

With multiple locks, the deadlock problem comes when one thread tries to acquire lock A then lock B, while another tries to acquire first lock B, then lock A. If that occurs, then sooner or later you'll be in a situation where the first lock holds A, the second one holds B, and each tries to acquire the lock that the other one is holding -- so both block forever.

One way to prevent multiple-lock deadlocks is to make sure that locks are always acquired in the same order, whatever thread is doing the acquiring. However, when each instance has its own lock, that's exceedingly difficult to organize with any clarity and simplicity.

Alex Martelli
offtopic Congrats for your 100k !!! ..
OscarRyz
@Oscar, thanks!-)
Alex Martelli
+4  A: 

If you use a separate lock object in each class then you run a risk of deadlocking, e.g. if one operation claims the lock for A and then claims the lock for B while a different operation claims B and then A.

If you use a single lock then you're forcing code to single thread when different operations could be run in parallel. That isn't always as serious in Python (which has a global lock in any case) as in other languages, but say you were to hold a global lock while writing to a file Python would release the GIL but you'd have blocked everything else.

So it's a tradeoff. I'd say go for little locks as that way you maximise the chance for parallel execution, but take care never to claim more than one lock at a time, and try not to hold onto a lock for any longer than you absolutely have to.

So far as your specific examples go, the first one is just plain broken. If you lock operations on theList then you must use the same lock every time or you aren't locking anything. That may not matter here as list.append and list.remove are effectively atomic anyway, but if you do need to lock access to the list you need to be sure to use the same lock every time. The best way to do that is to hold the list and a lock as attributes of a class and force all access to the list to go through methods of the containing class. Then pass the container class around not the list or the lock.

Duncan
Thanks! That's exactly the help I was looking for!
Matty