I need to use a thread pool in Python, and I want to know when at least 1 thread out of the "maximum threads allowed" has finished, so I can start a new one if there is still work to do.

I have been using something like this:

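import threading

def dostuff(dataforthread):
    pass  # placeholder for the real work

def doSomethingWith(dataforthread):
    global i
    dostuff(dataforthread)
    with lock:
        i = i-1 #thread has finished

lock = threading.Lock()  # protects the shared counter i
i = 0
poolSize = 5
threads = []
data = list(range(20)) #array of data (placeholder)
while len(data):
    while True:
        if i<poolSize and data: #if started threads is < poolSize start new thread
            dataforthread = data.pop(0)
            with lock:
                i = i+1
            thread = threading.Thread(target=doSomethingWith, args=(dataforthread,))
            thread.start()
            threads.append(thread)
        else:
            break
    for t in threads: #wait for ALL threads (I ONLY WANT TO WAIT FOR 1 [any])
        t.join()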

As I understand it, my code starts 5 threads and then waits for all of them to finish before starting new ones, until data is consumed. What I really want is to start a new thread as soon as any one of them finishes and the pool has an "available spot" for it.
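One way to get that behavior without polling a counter is a fixed set of poolSize worker threads that pull items from a shared queue.Queue: each worker starts on the next item the moment it finishes the previous one, so the threads are reused rather than restarted. A minimal sketch, assuming all the data is known up front; do_something_with and run_pool are hypothetical names, not part of any library.

```python
import queue
import threading


def do_something_with(item):
    # Hypothetical stand-in for the question's dostuff(); squares the item.
    return item * item


def run_pool(data, pool_size=5):
    # Enqueue all the work up front.
    tasks = queue.Queue()
    for item in data:
        tasks.put(item)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each worker takes the next item as soon as it finishes the previous
        # one, so up to pool_size items are in progress until the queue is empty.
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left: this thread exits
            result = do_something_with(item)
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # a single join at the very end, after all data is consumed
    return results


print(sorted(run_pool(range(10))))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because every item is queued before the workers start, an empty queue reliably means the work is done.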

I have been reading this, but I think it would have the same issue as my code (not sure; I'm new to Python, but looking at joinAll() it seems that way).

Does anyone have an example of what I am trying to achieve?

That is: detect as soon as i < poolSize, launch new threads until i == poolSize, and keep doing that until data is consumed.
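The counter-based idea can be kept almost literally by guarding the counter with a threading.Condition: each finishing thread decrements the counter and calls notify(), and the main loop sleeps in wait_for() until a slot is free instead of joining every thread. A minimal sketch under that assumption; run_with_counter and do_something_with are hypothetical names.

```python
import threading


def do_something_with(item):
    # Hypothetical stand-in for the question's dostuff().
    return item + 1


def run_with_counter(data, pool_size=5):
    data = list(data)
    results = []
    running = 0                  # plays the role of the question's counter i
    cond = threading.Condition()

    def task(item):
        nonlocal running
        try:
            result = do_something_with(item)
            with cond:
                results.append(result)
        finally:
            with cond:
                running -= 1     # thread has finished (the question's i = i-1)
                cond.notify()    # wake the main loop: a slot is now free

    threads = []
    while data:
        with cond:
            # Block until fewer than pool_size threads are running.
            cond.wait_for(lambda: running < pool_size)
            running += 1
        t = threading.Thread(target=task, args=(data.pop(0),))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()                 # only the last few threads can still be running
    return results


print(sorted(run_with_counter(range(12), pool_size=3)))
```

The try/finally makes sure a failing task still frees its slot; otherwise the main loop could wait forever.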

+2  A: 

As the article author mentions, and @getekha highlights, thread pools in Python don't accomplish exactly the same thing as they do in other languages (in CPython, the Global Interpreter Lock keeps threads from running Python code in parallel). If you need parallelism, you should look into the multiprocessing module. Among other things, it has the handy Queue and Pool classes. Also, there's an accepted PEP for "futures" (PEP 3148) that you'll probably want to monitor.
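The futures PEP (PEP 3148) shipped as the concurrent.futures module in Python 3.2. Its wait(..., return_when=FIRST_COMPLETED) blocks until at least one submitted task is done, which is the "tell me when any thread finishes" signal the question asks for. A minimal sketch; process_all and do_something_with are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

_NO_MORE = object()  # sentinel marking the end of the input


def do_something_with(item):
    # Hypothetical stand-in for the question's dostuff().
    return item * 2


def process_all(data, pool_size=5):
    items = iter(data)
    results = []
    pending = set()

    def fill(executor):
        # Submit tasks until pool_size are in flight or the input runs out.
        while len(pending) < pool_size:
            item = next(items, _NO_MORE)
            if item is _NO_MORE:
                return
            pending.add(executor.submit(do_something_with, item))

    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        fill(executor)
        while pending:
            # Returns as soon as at least one pending task has finished.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
            pending -= done
            fill(executor)  # refill the slots that just opened up
    return results


print(sorted(process_all(range(10))))
```

Since ThreadPoolExecutor never runs more than max_workers tasks at once and queues the rest, submitting everything up front (or calling executor.map) gives the same throughput; the explicit wait loop only matters when you want to react to each completion, for example to decide what to submit next.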

Hank Gay
I'll check it out and code a sample to see what I can accomplish with it, thanks!
jahmax
+1  A: 

The problem is that CPython has a Global Interpreter Lock (GIL), which a thread must hold to run any Python code. This means that only one thread can execute Python code at any time, so thread pools in Python are not the same as in other languages. The GIL exists mainly for arcane reasons known only to a select few (i.e. it's complicated).
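In CPython the GIL is released while a thread is blocked in I/O or time.sleep(), so threads still overlap for I/O-bound work; it is CPU-bound Python code that gets no parallel speedup. A minimal sketch that times five threads each sleeping 0.2 s, where the sleep stands in for hypothetical blocking I/O:

```python
import threading
import time


def io_bound_task():
    # time.sleep() releases the GIL, as blocking file or socket I/O does.
    time.sleep(0.2)


def elapsed_for_threads(n=5):
    threads = [threading.Thread(target=io_bound_task) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# The five sleeps overlap, so this is close to 0.2 s rather than 1.0 s.
print(f"{elapsed_for_threads(5):.2f} s")
```

Replacing the sleep with a pure-Python loop would show the opposite: total time grows with the number of threads.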

If you really want to run code in parallel, you should spawn new processes; the multiprocessing module has a Pool class which you could look into.
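A minimal sketch of multiprocessing.Pool applied to the question's loop, assuming the work is CPU-bound and the items and results are picklable; do_something_with and process_all are hypothetical names. Pool keeps processes=pool_size worker processes busy and hands each one a new item as soon as it finishes, and imap_unordered yields results in completion order rather than input order.

```python
import multiprocessing


def do_something_with(item):
    # Hypothetical CPU-bound stand-in for the question's dostuff().
    # Must be defined at module top level so worker processes can import it.
    return sum(n * n for n in range(item))


def process_all(data, pool_size=5):
    # Each of the pool_size workers picks up the next item as soon as it is free.
    with multiprocessing.Pool(processes=pool_size) as pool:
        return list(pool.imap_unordered(do_something_with, data))


if __name__ == "__main__":
    # The guard is required where workers start in a fresh interpreter
    # (the "spawn" start method, the default on Windows and macOS).
    print(sorted(process_all(range(20))))
```

The results are collected with list() inside the with block because leaving the block terminates the pool.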

katrielalex