I have a numpy array of 630 images, each 480x640 pixels, so the full array is 630x480x640. I want to generate an average image, as well as compute the standard deviation for each pixel across all 630 images.
This is easily accomplished with:
avg_image = numpy.mean(img_array, axis=0)
std_image = numpy.std(img_array, axis=0)
However, since I'm running this for 50 or so such arrays, and have an 8-core/16-thread workstation, I figured I'd get greedy and parallelize things with multiprocessing.Pool.
So I did the following:
def chunk_avg_map(chunk):
    # do the processing
    sig_avg = numpy.mean(chunk, axis=0)
    sig_std = numpy.std(chunk, axis=0)
    return [sig_avg, sig_std]

def chunk_avg(img_data):
    # take each row of the image
    chunks = [img_data[:, i, :] for i in range(len(img_data[0]))]
    pool = multiprocessing.Pool()
    result = pool.map(chunk_avg_map, chunks)
    pool.close()
    pool.join()
    return result
However, I saw only a small speedup. By putting print statements in chunk_avg_map, I was able to determine that only one or two processes are running at a time, rather than the 16 I would expect.
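To illustrate the kind of check I mean, here is a minimal standalone sketch (not my actual code; which_worker is a stand-in for chunk_avg_map) that reports which pool process handled each item, so counting the distinct names shows how many workers actually did work:

```python
import multiprocessing

def which_worker(x):
    # Return the name of the process that handled this item,
    # e.g. "ForkPoolWorker-3" when run inside a Pool.
    return multiprocessing.current_process().name

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    names = pool.map(which_worker, range(100))
    pool.close()
    pool.join()
    # The number of distinct names is the number of workers that ran.
    print(sorted(set(names)))
```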
I then ran my code through cProfile in IPython:
%prun current_image_anal.main()
The result indicated that by far the most time was spent in calls to acquire:
ncalls tottime percall cumtime percall filename:lineno(function)
1527 309.755 0.203 309.755 0.203 {built-in method acquire}
I understand this to have something to do with locking, but I don't understand why my code would be doing so much of it. Does anyone have any ideas?
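(For reference, the same kind of table can be produced outside IPython with the standard-library cProfile and pstats modules; a minimal sketch on a toy stand-in workload:)

```python
import cProfile
import io
import pstats

import numpy

def work():
    # Toy stand-in for the real per-chunk computation.
    data = numpy.random.rand(50, 100, 100)
    numpy.mean(data, axis=0)
    numpy.std(data, axis=0)

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Print the five most expensive calls by total time, like %prun does.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(5)
print(stream.getvalue())
```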
[EDIT] As requested, here is a runnable script which demonstrates the problem. You can profile it by whatever means you like, but when I did I found that the lion's share of the time was taken up by calls to acquire, rather than mean or std as I would have expected.
#!/usr/bin/python
import numpy
import multiprocessing

def main():
    fake_images = numpy.random.randint(0, 2**14, (630, 480, 640))
    chunk_avg(fake_images)

def chunk_avg_map(chunk):
    # do the processing
    sig_avg = numpy.mean(chunk, axis=0)
    sig_std = numpy.std(chunk, axis=0)
    return [sig_avg, sig_std]

def chunk_avg(img_data):
    # take each row of the image
    chunks = [img_data[:, i, :] for i in range(len(img_data[0]))]
    pool = multiprocessing.Pool()
    result = pool.map(chunk_avg_map, chunks)
    pool.close()
    pool.join()
    return result

if __name__ == "__main__":
    main()
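For context, once pool.map returns I rebuild the full average and std images by stacking the per-row results. A sketch with stand-in data of the same shapes (in the real code, result comes from chunk_avg):

```python
import numpy

# Stand-in for chunk_avg's return value: one [row_avg, row_std] pair
# per image row, each a length-640 vector.
result = [[numpy.zeros(640), numpy.ones(640)] for _ in range(480)]

# Stack the 480 row results back into full 480x640 images.
avg_image = numpy.stack([avg for avg, std in result], axis=0)
std_image = numpy.stack([std for avg, std in result], axis=0)
print(avg_image.shape, std_image.shape)
```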