My application sometimes segfaults, mainly in malloc() and malloc_consolidate(), according to the backtrace in gdb.

I verified that the machine has enough memory available; it didn't even start swapping. I checked the ulimits for data segment and max memory size, and both are set to 'unlimited'. I also ran the application under valgrind and didn't find any memory errors.

Now I'm out of ideas about what else might be causing these segfaults. Any ideas?

Update: Since I'm not finding anything with valgrind (or ptrcheck), could another application be trashing libc's memory structures, or is there a separate structure for each process?

+3  A: 

Most likely, you're trashing the heap -- i.e., you're writing beyond the limits of a piece of memory you allocated, and this is overwriting the data structures that malloc() uses to manage the heap. This causes malloc() to access an invalid address, and your application crashes.
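To make the failure mode concrete, here is a minimal sketch of a guard-byte wrapper around malloc(). All names in it (guarded_alloc, guarded_check, guarded_free, GUARD_BYTES, GUARD_FILL) are hypothetical and made up for illustration, and the layout is an assumption, not how glibc actually arranges its heap. It only shows the idea: bookkeeping data sits right next to your buffer, so a write one byte past the end lands on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this does not mirror glibc's real layout. */
#define GUARD_BYTES 8
#define GUARD_FILL  0xA5

/* Header placed in front of the user region. The max_align_t member
 * keeps the pointer handed to the caller suitably aligned. */
typedef union {
    size_t size;        /* size the caller asked for */
    max_align_t align;
} guard_header;

/* Allocate size bytes followed by GUARD_BYTES of a known fill pattern. */
void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(guard_header) - GUARD_BYTES)
        return NULL;    /* total size would overflow */

    guard_header *h = malloc(sizeof(guard_header) + size + GUARD_BYTES);
    if (h == NULL)
        return NULL;

    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, GUARD_FILL, GUARD_BYTES);
    return user;
}

/* Return 1 if the guard bytes after the block are intact, 0 otherwise. */
int guarded_check(const void *ptr)
{
    const guard_header *h = (const guard_header *)ptr - 1;
    const unsigned char *guard = (const unsigned char *)ptr + h->size;

    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (guard[i] != GUARD_FILL)
            return 0;
    }
    return 1;
}

/* Release a block obtained from guarded_alloc(). */
void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((guard_header *)ptr - 1);
}
```

If you call guarded_alloc(8), fill all 8 bytes and call guarded_check(), it reports the guard as intact. Write to index 8 and the check fails. With the real allocator there is no check at that point: the damage sits there until a later malloc() or free() walks over the corrupted bookkeeping, which is why the crash shows up inside malloc() rather than at the faulty write.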

Running out of memory would not cause malloc() to crash -- it would simply return NULL. That might cause your code to crash if you're not checking for NULL, but the crash site would not be in malloc().
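For comparison, this is roughly what handling the out-of-memory case looks like. It is a sketch only: checked_malloc is a hypothetical helper name, and exiting on failure is just one possible policy.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate or terminate with a message.
 * malloc(0) may legitimately return NULL, so a zero-size request
 * is not treated as a failure. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory requesting %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

With a wrapper like this, a genuine out-of-memory condition produces a clear message at the call site instead of a NULL dereference somewhere later. Either way, the failure would show up in your own code, not inside malloc() itself.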

It's slightly strange that Valgrind is not reporting any errors -- but there are some errors that the default "Memcheck" tool can miss. Try running Valgrind with the "Ptrcheck" tool instead.

Martin B
But shouldn't this have shown up under Valgrind? (Assuming my test coverage was good enough.)
Gene Vincent
Your comment seems to have overlapped with my edit -- as suggested there, try running Valgrind with the "Ptrcheck" tool. If malloc() crashes, it's almost certain you're trashing the heap in some way.
Martin B
+1  A: 

"Another possibility to check for and guard against bugs in the use of malloc, realloc and free is to set the environment variable MALLOC_CHECK_. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down."

http://www.gnu.org/s/libc/manual/html_node/Heap-Consistency-Checking.html#Heap-Consistency-Checking

BillTorpey