Hello,
I have written the following code to put my lazy second CPU core to work. What the code basically does is first find the desired "sea" files in the directory hierarchy and then execute a set of external scripts to process these binary "sea" files, producing 50 to 100 text and binary files per input. As the title of the question suggests, this should happen in a parallel fashion to increase the processing speed.
This question originates from the long discussion we have been having on the IPython users list, titled "Cannot start ipcluster", which began with my experimentation with IPython's parallel processing functionality.
The issue is that I can't get this code to run correctly. If the folders containing the "sea" files house nothing but "sea" files, the script finishes without fully performing the external script runs. (Say I have 30-50 external scripts to run; my multiprocessing-enabled script gives up after executing only the first script in the chain.) Interestingly, if I run the script on an already processed folder (one whose "sea" files were processed beforehand, so the output files are already present), then it does run, but this time I get speed-ups of about 2.4x to 2.7x relative to the linear processing timings. That is more than I would expect, since I only have a Core 2 Duo 2.5 GHz CPU in my laptop. (I do have a CUDA-powered GPU, but it has nothing to do with my current parallel computing struggle :)
What do you think might be the source of this issue?
Thank you in advance for all comments and suggestions.
#!/usr/bin/env python

from multiprocessing import Pool
from subprocess import call
import os


def find_sea_files():
    # Walk the directory tree and collect every *.sea file together
    # with the absolute path of the directory it lives in.
    file_list, path_list = [], []
    init = os.getcwd()
    for root, dirs, files in os.walk('.'):
        dirs.sort()
        for file in files:
            if file.endswith('.sea'):
                file_list.append(file)
                os.chdir(root)
                path_list.append(os.getcwd())
                os.chdir(init)
    return file_list, path_list


def process_all(pf):
    # pf is a [path, filename] pair: cd into the directory and run
    # the external post-processing script on the file.
    os.chdir(pf[0])
    call(['postprocessing_saudi', pf[1]])


if __name__ == '__main__':
    pool = Pool(processes=2)   # start 2 worker processes
    files, paths = find_sea_files()
    pathfile = [[paths[i], files[i]] for i in range(len(files))]
    pool.map(process_all, pathfile)
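
For what it's worth, one variant I have been considering (but have not verified fixes the behaviour above) avoids os.chdir() entirely by handing the target directory to the child process via the cwd argument of subprocess.call(). This is only a sketch; process_one and the reworked find_sea_files below are just my renamed helpers, and postprocessing_saudi is the same external script as above:

#!/usr/bin/env python

from multiprocessing import Pool
from subprocess import call
import os


def find_sea_files():
    # Collect (directory, filename) pairs for every *.sea file,
    # without ever changing the working directory.
    pairs = []
    for root, dirs, files in os.walk('.'):
        dirs.sort()
        for name in files:
            if name.endswith('.sea'):
                pairs.append((os.path.abspath(root), name))
    return pairs


def process_one(pair):
    path, name = pair
    # Run the external script with its working directory set to the
    # folder containing the .sea file; call() blocks until it exits.
    call(['postprocessing_saudi', name], cwd=path)


if __name__ == '__main__':
    pool = Pool(processes=2)
    pool.map(process_one, find_sea_files())

Since each worker is a separate process with its own working directory, I am not sure the chdir calls are actually the culprit, but removing that shared-state assumption at least makes the script easier to reason about.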