Hello,

I'm looking at a core file from a process running on Unix. Usually I can work my way around and dig into the backtrace to try to identify a memory issue. In this case, I'm not sure how to proceed.

First, the backtrace gives only 3 frames where I would expect a lot more. For those frames, all the function parameters shown appear to be completely invalid. They are not what I would expect.

Some pointer parameters are shown with the message "Cannot access memory at address".

Would this suggest some kind of complete stack corruption? I ran the process with libumem and all the buffers were reported as being clean.

umem_status reported nothing either.

So basically I'm stumped. What are the likely causes? What should I look for in the code, since libumem appears to have reported no errors?

Any suggestions on how I can debug further? Are there any extra features in mdb I should consider?

Thank you.

+3  A: 

You could check whether running under Valgrind or Electric Fence makes the program fail a little earlier, closer to the actual cause.

epatel
+4  A: 

Stack corruption does sound like a possibility. Some things to try:

  • Turn on all compiler warnings that you can!
  • Run lint!
  • If possible, try building & testing your program on OpenBSD which has a lot of memory corruption detection built-in.
  • If possible, use some tools like ProPolice, StackGuard, et al.
  • If you can reproduce this problem easily, it's worth playing around in the debugger. Narrow it down as much as possible and then step through.
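Tools such as ProPolice and StackGuard place a guard value (a "canary") next to stack buffers and check it before the function returns. The following is only a hand-rolled sketch of that idea, not how those tools are implemented. `struct guarded_buf`, `CANARY_VALUE`, and the function names are all hypothetical, and the write is kept inside one object so the example stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guard value; real stack protectors typically pick a random one. */
#define CANARY_VALUE 0xDEADC0DEUL

/* A buffer followed by a guard word, loosely mimicking a protected stack frame. */
struct guarded_buf {
    char buf[16];
    unsigned long canary;
};

/* Clear the buffer and arm the canary. */
void guarded_init(struct guarded_buf *g)
{
    memset(g->buf, 0, sizeof g->buf);
    g->canary = CANARY_VALUE;
}

/* Deliberately careless copy: it does not check n against sizeof g->buf,
   so a large n runs past buf and into the canary. The assert only keeps
   the write inside the struct so the demonstration remains defined. */
void careless_copy(struct guarded_buf *g, const void *src, size_t n)
{
    assert(n <= sizeof *g);
    memcpy(g, src, n);
}

/* Returns 1 if the canary is untouched, 0 if something overwrote it. */
int guarded_intact(const struct guarded_buf *g)
{
    return g->canary == CANARY_VALUE;
}
```

Copying a short string leaves `guarded_intact` returning 1, while copying `sizeof(struct guarded_buf)` bytes clobbers the canary and makes it return 0. An overrun of a real stack buffer damages saved registers and return addresses in the same way, which would fit a truncated backtrace. Since libumem audits heap allocations, a stack overrun of this kind would probably not appear in its reports.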
dwc
A: 

Shouldn't libumem report the same overrun as Electric Fence?

I can't reproduce it easily in the test environment, but in the commercial environment under Unix/Solaris the core occurs and libumem shows nothing bad.

A: 

Your code? When this happens to me, I always find the same thing: A null pointer. Looks horrible when it crashes, but the cause is ultimately simple.
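As a minimal sketch of the kind of check that catches this early, assuming a made-up `struct record` and helper name (neither comes from the question), every pointer can be validated before it is dereferenced and an error code returned instead of crashing:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    const char *name;
};

/* Stores the length of r->name in *out and returns 0,
   or returns -1 without touching *out if any pointer is NULL. */
int record_name_length(const struct record *r, size_t *out)
{
    if (r == NULL || r->name == NULL || out == NULL)
        return -1;
    *out = strlen(r->name);
    return 0;
}
```

Checking the return value at each call site turns a confusing crash far from the cause into an error reported at the point where the bad pointer first appears.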

gbarry
A: 

I ran into a similar issue. The backtrace from GDB was not helpful. Valgrind came to my rescue.

Run your application through Valgrind. Identify all the errors, such as invalid writes. Analyze the relevant code and see whether they can be fixed.

In my case I was attempting an invalid write (which at times might write NULL) whose effect showed up not at that point but elsewhere.
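One common pattern that produces this kind of delayed failure is an off-by-one heap write. It is shown here only as an illustration and is not necessarily the bug described above. The sketch gives the corrected form of a hypothetical `dup_name` helper, with the buggy variant described in a comment. Valgrind's Memcheck reports the buggy variant as an "Invalid write of size 1" at the line that performs the write, even though the program may only crash much later.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if s is NULL or
   allocation fails. The caller is responsible for free(). */
char *dup_name(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL;

    len = strlen(s);

    /* Buggy variant: malloc(len) followed by strcpy(copy, s).
       strcpy would write the terminating '\0' one byte past the
       block, silently corrupting whatever lies next to it. */
    copy = malloc(len + 1);          /* +1 leaves room for the terminator */
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len + 1);        /* copies the terminator as well */
    return copy;
}
```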

Prabhu. S