tags:

views:

379

answers:

8
A: 

Have you tried using lint, FlexeLint, or cppcheck? These may help identify the problem.

If you know which area of memory is being corrupted, have you tried marking that memory as protected? This may mask your problem and not help at all, but if it still crashes, the point at which the memory is modified will help you resolve the problem.
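On a POSIX system, one way to mark memory as protected is `mprotect`. The following is only a minimal sketch, assuming Linux or a similar POSIX system and a region that starts on a page boundary (here obtained from `mmap`). The names `guarded_alloc`, `guarded_lock`, `guarded_unlock`, and `write_faults` are hypothetical helpers made up for illustration; `write_faults` exists only so the effect can be observed without crashing.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helpers for illustration; not part of any library. */

/* Allocate len bytes starting on a page boundary (mprotect works on whole pages). */
void *guarded_alloc(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make the region read-only: a later write raises a fault at the culprit. */
int guarded_lock(void *p, size_t len) {
    return mprotect(p, len, PROT_READ);
}

/* Make the region writable again for legitimate updates. */
int guarded_unlock(void *p, size_t len) {
    return mprotect(p, len, PROT_READ | PROT_WRITE);
}

/* Demo only: report whether a write to p faults, instead of crashing. */
static sigjmp_buf fault_jmp;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

int write_faults(volatile char *p) {
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus); /* some systems report SIGBUS instead */

    if (sigsetjmp(fault_jmp, 1) == 0)
        *p = 'x';    /* attempt the write */
    else
        faulted = 1; /* we got here from the signal handler */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

In a real debugging session you would skip `write_faults`, lock the region right after the last legitimate write, and let the program crash under a debugger or produce a core dump; the faulting instruction is then the one doing the corrupting.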

Dave
How can you protect the memory? I never did this before...
bbazso
+3  A: 

In my experience, Coverity and Purify have found errors of this kind that valgrind didn't (in fact, each tool found problems that the others missed).

But sometimes no tool gives a hint and you have to dig deeper: add instrumentation, play with breakpoints on "modify memory at address", try to simplify the failing test case, and so on, until you find the root cause. That can be very painful.

AProgrammer
I agree with the debugger comment. Sometimes it's just best to set a breakpoint on the memory that gets blown away and wait.
caspin
With gdb, how can I set a breakpoint on memory to see when it's written to?
bbazso
the `watch` command, potentially along with a condition on what you consider invalid
Greg Rogers
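To make the `watch` suggestion concrete, here is a small hypothetical test program (the names `important`, `update_stats`, and `run_demo` are invented for illustration). It only gives gdb something to stop on: the global is overwritten through an alias, so searching the source for assignments to `important` would not find the culprit.

```c
/* Hypothetical demo target for a gdb watchpoint; compile with -g. */

int important = 42; /* the value that mysteriously gets "blown away" */

/* The culprit: it writes through a pointer, so the code never
   mentions `important` by name at the point of the write. */
void update_stats(int *slot, int value) {
    *slot = value;
}

int run_demo(void) {
    int scratch = 0;
    update_stats(&scratch, 1);    /* harmless write to a local */
    update_stats(&important, -1); /* the write a watchpoint would stop on */
    return important;
}
```

Assuming you call `run_demo()` from `main` and run the program under gdb, `watch important` stops whenever the value changes, and `watch important if important != 42` stops only once it holds a value you consider invalid. If you only know the address, `watch *(int *)ADDRESS` (with the real address substituted) or `watch -l some_expression` watches that location directly.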
+1  A: 

You didn't specify the platform, but I can recommend Gimpel PC-lint as an excellent static analysis tool (don't be fooled by the name!). They also offer FlexeLint for other platforms, but I have no personal experience of that product.

Max
Gimpel Software is a great outfit. +1
Norman Ramsey
+2  A: 

I assume you're using valgrind's memcheck tool, which is what valgrind is best known for. Since you are already using valgrind, you might also try running your program through `valgrind --tool=exp-ptrcheck`, an experimental tool designed to catch certain types of errors that memcheck will miss, including access checks for stack and global arrays, and use of pointers that happen to point to a valid object but not the object that was intended. It does this with a completely different mechanism: it essentially tracks each pointer into memory rather than the memory itself, and it relies on heuristics.

Be aware that the tool is definitely experimental, and I've found that it currently reports a number of false positives, but you may find that it catches something significant. Also only Linux is supported so far (no Mac OS X yet), and you should ensure that you are using the latest valgrind version (currently 3.5.0).

mark4o
I tried it out, but I keep getting: `sysno == 233` and `exp-ptrcheck: the 'impossible' happened: unhandled syscall`. Any ideas?
bbazso
Ok, syscall 233 is `llistxattr` on i386 or `epoll_ctl` on x86_64, which it looks like is not handled :(. Since these don't return pointers you could just add those to `exp-ptrcheck/h_main.c` `setup_post_syscall_table`: `ADD(0, __NR_llistxattr);` `ADD(0, __NR_epoll_ctl);`, although you may then run into others.
mark4o
Tried it out and then it moved down one to 232. :) Where did you find the mapping between syscall no and what it is?
bbazso
`/usr/include/asm-i386/unistd.h` (or `asm-x86_64` for x86_64). I guess you can see why it is still experimental...
mark4o
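If those header paths don't exist on your system, the same numbers are usually available as `SYS_*` macros from `<sys/syscall.h>`. A minimal sketch, assuming Linux and that you compile it for the same architecture as the program you are debugging; `syscall_name` is a hypothetical helper that only knows the few syscalls that come up in this thread.

```c
#include <sys/syscall.h> /* Linux: SYS_* numbers for the architecture being compiled */

/* Hypothetical helper: map a syscall number to a name, for a handful of
   syscalls only. Each entry is guarded because not every architecture
   defines every syscall. */
const char *syscall_name(long n) {
#ifdef SYS_listxattr
    if (n == SYS_listxattr) return "listxattr";
#endif
#ifdef SYS_llistxattr
    if (n == SYS_llistxattr) return "llistxattr";
#endif
#ifdef SYS_epoll_wait
    if (n == SYS_epoll_wait) return "epoll_wait";
#endif
#ifdef SYS_epoll_ctl
    if (n == SYS_epoll_ctl) return "epoll_ctl";
#endif
    return "unknown";
}
```

Calling `syscall_name(233)` and `syscall_name(232)` from a small `main` and printing the results tells you which of these, if any, the tool is complaining about on your platform.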
+1  A: 

Is it possible some stack corruption is occurring? If so, try enabling stack canaries with the -fstack-protector-all option, assuming you are using g++.

Other than that, have you cranked up warning flags to help identify suspicious code?

Void
I tried the flag -fstack-protector-all and I also tried libsafe, and both came up empty-handed. :(
bbazso
A: 

If valgrind can identify the bad pointer being passed to free(), you could try running the program under DDD, which can set a hardware watchpoint on the memory location and halt the program when it gets a bad value. If the pointer is changed a lot, you may have to write some code around malloc and free to keep track of which values are good and bad.
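As a rough idea of what "code around malloc and free" could look like, here is a minimal sketch. The names `tracked_malloc` and `tracked_free` and the fixed-size table are assumptions made for illustration; it is not thread-safe and does not cover `calloc` or `realloc`.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_LIVE 1024 /* arbitrary limit for this sketch */

static void *live[MAX_LIVE]; /* pointers handed out and not yet freed */

/* Allocate and remember the pointer as "good". */
void *tracked_malloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            live[i] = p;
            return p;
        }
    }
    free(p); /* table full: refuse rather than hand out an untracked block */
    return NULL;
}

/* Returns 0 if p was a live block (now freed), -1 if p is a bad pointer. */
int tracked_free(void *p) {
    if (p == NULL)
        return 0; /* free(NULL) is a no-op */
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i] == p) {
            live[i] = NULL;
            free(p);
            return 0;
        }
    }
    return -1; /* never allocated here, or already freed: break or log here */
}
```

Putting a breakpoint (or an `abort()`) on the `return -1` path stops the program at the first bad `free`, with the offending caller on the stack.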

Norman Ramsey
+2  A: 

My experience is that often this sort of problem is caused by a heap overflow. Electric Fence is a relatively simple allocation debugging tool I like to use. Its main use is as a dynamic analysis tool to check for heap overflows, a complement to "-fstack-protector-all" which checks for stack overflows.
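Electric Fence itself is normally used by linking the program with `-lefence`, so no code changes are needed. To show the general idea behind this kind of tool, here is a simplified sketch, not Electric Fence's actual implementation: each block is placed directly in front of an inaccessible page, so the first byte written past the end faults immediately. It assumes Linux or a similar POSIX system; `fence_alloc` and `write_faults` are hypothetical names, and the returned pointer may be unaligned, so the sketch is only suitable for byte buffers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical allocator: puts the block flush against a guard page.
   Freeing is omitted to keep the sketch short. */
char *fence_alloc(size_t size) {
    if (size == 0)
        return NULL;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = (size + page - 1) / page * page; /* round up to whole pages */

    char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    /* Make the page after the data inaccessible. */
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }
    return base + data - size; /* the last valid byte sits just before the guard */
}

/* Demo only: report whether a write to p faults, instead of crashing. */
static sigjmp_buf fault_jmp;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

int write_faults(volatile char *p) {
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus); /* some systems report SIGBUS instead */

    if (sigsetjmp(fault_jmp, 1) == 0)
        *p = 'x';
    else
        faulted = 1;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

With a 10-byte block, writing to index 9 succeeds and writing to index 10 faults on the spot, which is what lets a debugger or core dump point at the overflowing code rather than at a later, unrelated crash.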

More links to efence stuff.

Managu
+1  A: 

In my opinion, using a debugger with "reverse debugging" capabilities could help. You would be able to step back in time and hopefully find out what the real source of the problem was.

Here are a couple of links:

http://www.gnu.org/software/gdb/news/reversible.html

http://undo-software.com/ (which apparently is free for non-commercial applications)

Hugo