Hi

I'm using Python's multiprocessing module and wait for all processes to finish with this code:

...
            results = []
            for i in range(num_extract):
                url = queue.get(timeout=5)
                try:
                    print "START PROCESS!"
                    result = pool.apply_async(process, [host,url],callback=callback)
                    results.append(result)
                except Exception,e:

                    continue


            for r in results:
                r.get(timeout=7)
...

I tried to use pool.join() but got this error:

Traceback (most recent call last):
  File "C:\workspace\sdl\lxchg\walker4.py", line 163, in <module>
    pool.join()
  File "C:\Python25\Lib\site-packages\multiprocessing\pool.py", line 338, in join
    assert self._state in (CLOSE, TERMINATE)
AssertionError

Why doesn't join work? And what is a good way to wait for all processes?

My second question is: how can I restart a particular process in the pool? I need this because of a memory leak. Right now I rebuild the whole pool after all processes have finished their tasks (I create a new Pool object to get the processes restarted).

What I need: for example, I have 4 processes in the pool. A process gets its task, and once the task is done I need to kill that process and start a new one (to clear out the leaked memory).

+1  A: 

You are getting the error because you need to call pool.close() before calling pool.join().
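A minimal sketch of that ordering, assuming a stand-in `process` function and a hypothetical `run_all` helper (neither is the asker's real code). The assertion in the traceback checks that close() or terminate() was called first, which is why join() only works after one of them:

```python
import multiprocessing


def process(host, url):
    # Stand-in for the real worker (assumption); it only builds a string.
    return "%s%s" % (host, url)


def run_all(host, urls, num_workers=4):
    # Hypothetical helper: submit every task, then wait for all of them.
    pool = multiprocessing.Pool(processes=num_workers)
    try:
        results = [pool.apply_async(process, [host, url]) for url in urls]
        # close() tells the pool that no more tasks will be submitted.
        pool.close()
        # join() blocks until every worker process has exited.
        pool.join()
    except Exception:
        # If something goes wrong, stop the workers before re-raising.
        pool.terminate()
        pool.join()
        raise
    # All tasks have finished, so get() returns without waiting.
    # It re-raises any exception that happened inside a worker.
    return [r.get() for r in results]


if __name__ == "__main__":
    print(run_all("http://example.com", ["/a", "/b", "/c"]))
```

The `if __name__ == "__main__":` guard matters on Windows, where the child processes import the main module again.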

I don't know of a good way to shut down a process started with apply_async, but see whether properly shutting down the pool makes your memory leak go away.

The reason I think this is that the Pool class has a bunch of attributes that are threads running in daemon mode. All of these threads get cleaned up by the join method. The code you have now won't clean them up so if you create a new Pool, you'll still have all those threads running from the last one.
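On the restart question, one option worth checking: the multiprocessing module in the standard library (Python 2.7 and 3.2 onwards) accepts a `maxtasksperchild` argument to Pool, which makes each worker exit after that many tasks and be replaced by a fresh one. The backport for Python 2.5 shown in the traceback may not support it, so this is only a sketch assuming a newer Python. The `process` worker and the `run_with_fresh_workers` helper are hypothetical names for illustration:

```python
import multiprocessing
import os


def process(url):
    # Hypothetical worker: reports which process handled the task.
    return os.getpid(), url


def run_with_fresh_workers(urls, num_workers=4):
    # maxtasksperchild=1 means a worker exits after a single task and
    # the pool starts a new process in its place, so memory held by the
    # old process is returned to the operating system.
    pool = multiprocessing.Pool(processes=num_workers, maxtasksperchild=1)
    try:
        results = [pool.apply_async(process, [url]) for url in urls]
        pool.close()   # no more tasks will be submitted
        pool.join()    # wait for all workers to finish and exit
    except Exception:
        pool.terminate()
        pool.join()
        raise
    return [r.get() for r in results]


if __name__ == "__main__":
    for pid, url in run_with_fresh_workers(["/a", "/b", "/c", "/d", "/e"]):
        print("%s handled by pid %d" % (url, pid))
```

Starting a new process per task has a cost, so a larger value (say, a few hundred tasks per worker) may be a better trade-off if the growth is slow.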

aaronasterling
Thanks, with close() it works.
Evg
About the pool: it seems that the pool uses the processes created at startup until the end. My script runs for a long time, and over time all the pool's processes grow in memory. I want to reset each process's memory usage from time to time (on each new task), and restarting the process is one way I can do it, I think.
Evg
Sorry, I don't mean a memory leak; I mean the process simply growing in memory, and controlling that growth by restarting the process.
Evg
@Evg. If the new process could accomplish the same task as the old process but with less memory, then it seems like you have a memory leak in the process itself rather than in the pool. That's a separate issue. Briefly though, you would want to check to see if you are creating any cycles in the processes. If so, be sure that you `del` the members of the cycle when you are through with them so that the garbage collector can reclaim the space.
aaronasterling
Yes, it is in the process function passed to apply_async. It fetches data from the web and needs a different amount of memory for each task, which is why I need to create a new process (recreate the process in the pool) for each task, to free the memory consumed by the previous task. I don't think gc can help here; I tried using it one way and it didn't help. I need something more global, such as recreating all the task processes.
Evg
In other words, my pool starts with 4 processes, each consuming about 2 MB of RAM at start. After each process has done thousands of tasks, it grows to 100 MB. That's why I need a restart for each new task.
Evg
@Evg. What you are describing is a _textbook example_ of a memory leak. Restarting the processes will _hide_ it but not _fix_ it. _Fix_ the memory leak in the processes and you will not have a problem.
aaronasterling
I use some frameworks and the leak may be in them, and I have no time at the moment to explore the code. I need a fast solution. What do you mean by "hide", not "fix"? When I restart a process I free all its memory, which guarantees free memory without leaks, doesn't it?
Evg