Hello,
I have written the following code to put my lazy second CPU core to work. What the code basically does is first find the desired "sea" files in the directory hierarchy and then execute a set of external scripts to process these binary "sea" files, producing 50 to 100 text and binary files per input. As the title of the question suggests, this should happen in a parallel fashion to increase the processing speed.
This question originates from the long discussion we have been having on the IPython users list, titled "Cannot start ipcluster", which began with my experimentation with IPython's parallel processing functionality.
The issue is that I can't get this code to run correctly. If the folders containing the "sea" files house nothing but "sea" files, the script finishes without fully performing the external script runs. (Say I have 30-50 external scripts to run; my multiprocessing-enabled script gives up after executing only the first script in the chain.) Interestingly, if I run the script on an already processed folder (one whose "sea" files were processed beforehand, so the output files are already present), then it does run, but this time I get speed-ups of about 2.4x to 2.7x relative to the linear processing timings. That is more than I would expect, since I only have a Core 2 Duo 2.5 GHz CPU in my laptop. (I do have a CUDA-powered GPU, but it has nothing to do with my current parallel computing struggle :)
What do you think might be the source of this issue?
Thank you in advance for all comments and suggestions.
#!/usr/bin/env python

from multiprocessing import Pool
from subprocess import call
import os


def find_sea_files():
    # Walk the directory tree and collect every *.sea file together
    # with the absolute path of the directory it lives in.
    file_list, path_list = [], []
    init = os.getcwd()
    for root, dirs, files in os.walk('.'):
        dirs.sort()
        for file in files:
            if file.endswith('.sea'):
                file_list.append(file)
                os.chdir(root)
                path_list.append(os.getcwd())
                os.chdir(init)
    return file_list, path_list


def process_all(pf):
    # pf is a [path, filename] pair: cd into the directory and run
    # the external post-processing script on the file.
    os.chdir(pf[0])
    call(['postprocessing_saudi', pf[1]])


if __name__ == '__main__':
    pool = Pool(processes=2)   # start 2 worker processes
    files, paths = find_sea_files()
    pathfile = [[paths[i], files[i]] for i in range(len(files))]
    pool.map(process_all, pathfile)
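
For what it's worth, one variant I have been considering (but have not verified fixes the behaviour above) avoids os.chdir() entirely by handing the target directory to the child process via the cwd argument of subprocess.call(). This is only a sketch; process_one and the reworked find_sea_files below are just my renamed helpers, and postprocessing_saudi is the same external script as above:

#!/usr/bin/env python

from multiprocessing import Pool
from subprocess import call
import os


def find_sea_files():
    # Collect (directory, filename) pairs for every *.sea file,
    # without ever changing the working directory.
    pairs = []
    for root, dirs, files in os.walk('.'):
        dirs.sort()
        for name in files:
            if name.endswith('.sea'):
                pairs.append((os.path.abspath(root), name))
    return pairs


def process_one(pair):
    path, name = pair
    # Run the external script with its working directory set to the
    # folder containing the .sea file; call() blocks until it exits.
    call(['postprocessing_saudi', name], cwd=path)


if __name__ == '__main__':
    pool = Pool(processes=2)
    pool.map(process_one, find_sea_files())

Since each worker is a separate process with its own working directory, I am not sure the chdir calls are actually the culprit, but removing that shared-state assumption at least makes the script easier to reason about.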