I am using SUSE 10 Linux on a machine with 16 GB of RAM and two quad-core CPUs. There are 8 processes doing CPU-intensive and network-I/O work. Four of them have a memory leak (these are test conditions, so leaks are acceptable here). Together the processes occupy about 15.4 GB, leaving only about 200 MB free. Things run fine for some hours, but after that malloc() hangs in a process that does not have a memory leak. It stays stuck for more than 4 minutes; CPU usage is not at 100%, but I/O goes up significantly. The hung process itself is fine afterwards (it has not corrupted any memory). What is malloc() doing? Is it trying to defragment, or building up swap space?

Any pointers?

+1  A: 

It might be annoying, but I would recommend running Valgrind on the process that blocks. There might be errors you didn't detect before, and at least you might get an idea of what is happening. However, under Valgrind those few hours might become days :/

PierreBdR
Already tried. Valgrind makes my process run at least 100 times slower, and this simulation would take years to complete.
Rahul
To use Valgrind to find allocation errors, you can easily run your application on a smaller data set, or divide it into smaller chunks that are tested individually.
Jens Gustedt
+3  A: 

If malloc() simply takes a long time, you're probably traversing a fragmented free list, many of whose entries have been swapped out. That is consistent with low CPU, high IO, and limited free RAM.

For more information on malloc() implementations (including understanding fragmented free lists), the Wikipedia article is good: http://en.wikipedia.org/wiki/Malloc#Implementations

Oh, and memory leaks aren't acceptable, even in a test environment. As you can see, they're interfering with programs that (as far as you know) don't have leaks, and costing you time.

Anon
+1  A: 

Before, your machine was just short on live RAM. Now your allocations push past the 16 GB of physical memory in the machine, and the system starts swapping. But checking your application, as PierreBdR suggested, is certainly still a good idea.

Jens Gustedt