I am trying to use Python's multiprocessing library to gain some performance; specifically, I am using its map function. When I swap it out for its single-process counterpart, I don't get any memory leaks over time, but the multiprocessing version of map sends my memory usage through the roof. For the record, I am doing something that can easily hog loads of memory, but what difference between the two would cause such a stark contrast?

+3  A: 

You realize that multiprocessing does not use threads, yes? I say this because you mention a "single threaded counterpart".

Are you sending a lot of data through multiprocessing's map? A likely cause is the serialization multiprocessing has to do in many cases. multiprocessing uses pickle, which typically takes up more memory than the data it's pickling. (In some cases, specifically on systems with fork() where new processes are created when you call the map method, it can avoid the serialization, but whenever it needs to send new data to an existing process it cannot do so.)

Since with multiprocessing all of the actual work is done in separate processes, the memory of your main process should not be affected by the actual operations you perform. The total use of memory does go up by quite a bit, however, because each worker process has a copy of the data you sent across. This is sometimes copy-on-write memory (in the same cases as not serializing) on systems that have CoW, but Python's use of memory (for example, updating an object's reference count whenever it is touched) means those pages are quickly written to, and thus copied.
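To get a rough feel for how much data map would have to serialize per work item, you can pickle one item yourself and look at the size. This is only a sketch: `pickled_size` and the `item` payload are hypothetical names made up for illustration, and the numbers will vary with your data and Python version.

```python
import pickle
import sys


def pickled_size(obj):
    """Return the number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical payload: one work item holding 100,000 small integers.
item = list(range(100_000))

payload_bytes = pickled_size(item)
# sys.getsizeof reports the list container only, not the integers it holds.
container_bytes = sys.getsizeof(item)

print(f"pickled payload: {payload_bytes} bytes")
print(f"list container:  {container_bytes} bytes")
```

While an item is being sent, the pickled bytes exist in addition to the original object, and the receiving worker then builds its own copy from them, so large items are paid for more than once.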

Thomas Wouters
Right, sorry about that. I do know that multiprocessing doesn't in fact use threads (hence the name). So sending the information over the pipe is what is killing it. That makes a lot of sense. Do you know of any solutions to the problem I'm facing?
Sandro
Send over less data. Or, send it over in smaller chunks. Or, if you're on a system with fork(), make it so the serialization doesn't happen: make sure multiprocessing will start new processes.
Thomas Wouters
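One way to read the "send over less data" suggestion is to keep the large object in the module and pass only small index ranges through the pool. The following is a minimal sketch under assumptions: `DATA`, `sum_slice`, and `make_bounds` are hypothetical names, and the summing is a stand-in for the real work. With the fork start method the workers inherit `DATA` without pickling it; with other start methods each worker rebuilds it when the module is imported, so the saving is only in what travels over the pipe.

```python
import multiprocessing as mp

# Hypothetical large dataset, created at module level so that forked
# workers inherit it instead of receiving it through pickle.
DATA = list(range(1_000_000))


def sum_slice(bounds):
    """Process one slice of DATA, identified only by its (start, stop) pair."""
    start, stop = bounds
    return sum(DATA[start:stop])


def make_bounds(n, step):
    """Split range(n) into consecutive (start, stop) pairs of at most step items."""
    return [(i, min(i + step, n)) for i in range(0, n, step)]


if __name__ == "__main__":
    bounds = make_bounds(len(DATA), 100_000)
    with mp.Pool(processes=4) as pool:
        # Only the small tuples and the integer results cross the pipe.
        total = sum(pool.imap_unordered(sum_slice, bounds))
    print(total)
```

The step size passed to `make_bounds` is the knob for "smaller chunks": each task touches less data at a time, at the cost of more round trips to the pool.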