Hey all-

I'm using ParallelPython to develop a performance-critical script. I'd like to share one value between the 8 processes running on the system. Please excuse the trivial example but this illustrates my question.

import pp

def findMin(listOfElements):
    global min          # the value I'd like to share across processes
    for el in listOfElements:
        if el < min:
            min = el

min = float("inf")
myList = range(100000)
job_server = pp.Server()
# pp's submit expects the job's arguments as a tuple;
# each job's result would later be collected by calling f1(), f2(), ...
f1 = job_server.submit(findMin, (myList[0:25000],))
f2 = job_server.submit(findMin, (myList[25000:50000],))
f3 = job_server.submit(findMin, (myList[50000:75000],))
f4 = job_server.submit(findMin, (myList[75000:100000],))

The pp docs don't seem to describe a way to share data across processes. Is it possible?

If so, is there a standard locking mechanism (like in the threading module) to ensure that only one update is done at a time?

l = Lock()
if el < min:
    l.acquire()
    if el < min:   # re-check after acquiring, in case another process updated min
        min = el
    l.release()

I understand I could keep a local min in each job and compare the 4 results in the main thread once they return, but by sharing the value I can prune my BFS over a binary tree more aggressively and potentially save a lot of loop iterations.

Thanks-

Jonathan

A: 

I'm not sure about the PP module but you could always store the lowest value in a scratch file. My only worry would be that you would spend the majority of your time acquiring and releasing the lock. The only exception would be if your el < min operation is time-consuming.

I would actually say that your "merging" technique is probably the way to go.

And btw, I understand you're giving a simple example of your code for brevity, but don't use min as a variable name: it shadows the built-in min() function and will cause you a lot of headaches while debugging.

JudoWill
+1  A: 

Actually, there is an example at http://www.parallelpython.com/content/view/17/31/#CALLBACK and they simply use the locks from the thread module.

As JudoWill pointed out, make sure to experiment with how often you sync the global min in your jobs. If you do it on every element, you may end up effectively serializing your whole calculation.
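To illustrate the sync-frequency point, here is a minimal sketch of periodic syncing. It uses plain threads and a hypothetical `shared` dict as a stand-in for pp jobs (the lock-around-the-merge pattern is the same as in the linked pp example); the names `SYNC_EVERY`, `shared`, and `find_min` are my own, not from pp:

```python
import threading

SYNC_EVERY = 1024                      # how often each worker touches the lock

shared = {"min": float("inf")}         # the shared value, guarded by `lock`
lock = threading.Lock()

def find_min(chunk):
    local_min = float("inf")
    for i, el in enumerate(chunk):
        if el < local_min:
            local_min = el
        if i % SYNC_EVERY == 0:        # sync only occasionally, not per element
            with lock:
                if local_min < shared["min"]:
                    shared["min"] = local_min
                elif shared["min"] < local_min:
                    local_min = shared["min"]   # adopt the tighter global bound
    with lock:                         # final merge of this worker's result
        if local_min < shared["min"]:
            shared["min"] = local_min

data = list(range(100000))
threads = [threading.Thread(target=find_min, args=(data[i:i + 25000],))
           for i in range(0, 100000, 25000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["min"])  # 0
```

Adopting the global bound does nothing for a pure minimum scan, but for a search like your BFS it is exactly what lets you prune without paying for the lock on every iteration.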

clacke
A: 

You won't save any iterations by sharing the value: you still have to read every element of the list at least once. Moreover, it will be slower, since you need to lock every time you use the shared value.

In your case, if you want more performance, you should compute a minimum for each part separately and then compare these results in the main thread.
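A sketch of that split-then-merge approach, using the stdlib ThreadPoolExecutor as a stand-in for pp.Server (with pp it would be job_server.submit(find_min, (chunk,)) and calling each job to collect its result); the chunking and the `find_min` helper are my own illustration:

```python
from concurrent.futures import ThreadPoolExecutor  # stand-in for pp.Server

def find_min(chunk):
    return min(chunk)          # each job returns only its local minimum

my_list = list(range(100000))
chunks = [my_list[i:i + 25000] for i in range(0, 100000, 25000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(find_min, c) for c in chunks]
    overall = min(f.result() for f in futures)  # merge in the main thread

print(overall)  # 0
```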

On the other hand, passing the list to other processes might be more resource-consuming than finding the minimum value of the list in a single pass.

Chris
+1  A: 

Parallel Python runs the sub-functions in different processes, so there is no shared memory, which means you should not be using a shared value. The callback example mentioned by clacke takes the result of each function and combines them in a callback function that runs in the original process. To use it properly you should do something similar: in the example given, you would calculate local minimums and use a callback function to find the minimum of all the sub-results. Hopefully you can do something similar in your real case.
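The shape of that callback pattern, sketched with the stdlib ThreadPoolExecutor as a stand-in (with pp it would be job_server.submit(find_min, (chunk,), callback=collect_min), per the linked example); `best`, `find_min`, and `collect_min` are illustrative names of my own:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

best = {"min": float("inf")}   # lives in the original process only
lock = threading.Lock()        # callbacks may fire concurrently, so guard the merge

def find_min(chunk):
    return min(chunk)          # runs in a worker; returns its local minimum

def collect_min(future):
    # Runs back on the submitting side: merge one sub-result into the best.
    with lock:
        if future.result() < best["min"]:
            best["min"] = future.result()

my_list = list(range(100000))
with ThreadPoolExecutor(max_workers=4) as pool:
    for i in range(0, 100000, 25000):
        pool.submit(find_min, my_list[i:i + 25000]).add_done_callback(collect_min)

print(best["min"])  # 0
```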

Kathy Van Stone