I am trying to use Python's multiprocessing library to gain some performance. Specifically, I am using its map function. For some reason, when I swap it out for its single-process counterpart I don't see any memory leaks over time, but the multiprocessing version of map sends my memory usage through the roof. For the record, I am doing something that can easily hog loads of memory, but what difference between the two would cause such a stark contrast?
You realize that multiprocessing does not use threads, yes? I say this because you mention a "single threaded counterpart".
Are you sending a lot of data through multiprocessing's map? A likely cause is the serialization multiprocessing has to do in many cases. multiprocessing uses pickle, which typically takes up more memory than the data it's pickling. (In some cases, specifically on systems with fork() where new processes are created when you call the map method, it can avoid the serialization, but whenever it needs to send new data to an existing process it cannot do so.)
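To get a rough feel for the serialization cost, you can pickle one work item yourself and look at the size of the result. This is only a sketch: `item` is a made-up payload and `pickled_size` is a hypothetical helper, not part of multiprocessing. The point is that the pickled bytes exist in addition to the original object, on the sending side and again on the receiving side once they are unpickled.

```python
import pickle
import sys


def pickled_size(obj):
    """Return the number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical payload standing in for one item passed to map.
item = list(range(100_000))

if __name__ == "__main__":
    # sys.getsizeof reports only the list container, not the ints it holds.
    print("list container:", sys.getsizeof(item), "bytes")
    # These bytes are allocated on top of the original object.
    print("pickled form:  ", pickled_size(item), "bytes")
```

If the pickled size of your real items is large, multiplying it by the number of items in flight gives a lower bound on the extra memory the transfer itself needs.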
Since with multiprocessing all of the actual work is done in separate processes, the memory of your main process should not be affected by the actual operations you perform. The total use of memory does go up by quite a bit, however, because each worker process has a copy of the data you sent across. This is sometimes copy-on-write memory (in the same cases as not serializing) on systems that have CoW, but Python's use of memory is such that this quickly becomes written to, and thus copied: CPython stores each object's reference count inside the object itself, so even reading an object writes to the page it lives on.
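The separation between the main process and the workers can be seen in a small experiment. The sketch below assumes a POSIX system where the "fork" start method is available; `DATA`, `mutate_and_report` and `run_demo` are hypothetical names chosen for illustration. Each worker changes its own copy of a module-level list, and the parent's list stays as it was.

```python
import multiprocessing as mp
import os

# Hypothetical module-level data. With the "fork" start method each worker
# starts out with its own (initially copy-on-write) view of this list.
DATA = [0] * 5


def mutate_and_report(i):
    """Runs in a worker: modify the worker's copy of DATA and report it."""
    DATA[i] = i + 1
    return os.getpid(), list(DATA)


def run_demo():
    # Assumption: "fork" is available (it is not on Windows).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        return pool.map(mutate_and_report, range(len(DATA)))


if __name__ == "__main__":
    for pid, worker_view in run_demo():
        print("worker", pid, worker_view)
    # The parent's list was never touched by the workers.
    print("parent", os.getpid(), DATA)
```

The same isolation that keeps the parent's list unchanged is what makes total memory grow: every worker that touches the data ends up holding its own physical copy.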