ansaurus

Question

How do I debug a segmentation fault when the gdb stack trace is full of '??'?

Answer 1

A:

Try running it under the Valgrind memory debugger.

Tronic 2010-03-10 17:34:13

Valgrind does not output much. There are only 2 invalid writes, and they have lived for quite a long time in code that is old enough... You will probably suggest that I fix them anyway, and you would be right, of course.

yves Baumes 2010-03-10 18:10:13

Answer 2

A:

To confirm: was your executable compiled in release mode, i.e. with no debug symbols? That could explain the ??. Try recompiling with the -g switch, which includes debugging information and embeds it in the executable. Other than that, I am out of ideas as to why you have '??'.

tommieb75 2010-03-10 17:36:10

Even if you compile with -g, -O2 will sometimes cause ?? to show up (certainly not for ALL levels of the stack, though).

Mark B 2010-03-10 17:47:44

It will appear for some levels when using library calls to stripped libraries, at least.

Tronic 2010-03-10 18:06:33

Hi, nm --demangle shows many methods from my program. We compile it in release mode (with the -O2 option at least), but the stack trace shows only one line for each thread. For instance:
#10 process 11460 0x44bf7a2 in ?? ()
..
#2 process 11471 0x004bf7a2 in ?? ()
#1 process 11465 0x00243959 in ?? ()

yves Baumes 2010-03-10 18:07:25

Answer 3

A:

I assume that since you say "My executable contains symbol table" that you compiled and linked with -g, and that your binary wasn't stripped.

You can confirm this with: strings -a your_binary | grep function_name_you_know_should_exist

Also try using pstack on the core and see if it does a better job of picking up the call stack. If it does, it sounds like your gdb is out of date compared to your gcc/g++ version.

Mark B 2010-03-10 17:57:11

Answer 4

A:

Not really. Sure you can dig around in memory and look at things. But without a stack trace you don't know how you got to where you are or what the parameter values were.

However, the very fact that your stack is corrupt tells you that you need to look for code that writes into the stack.

- Overwriting a stack array. This can be done the obvious way or by calling a function or system call with bad size arguments or pointers of the wrong type.
- Using a pointer or reference to a function's local stack variables after that function has returned.
- Casting a pointer to a stack value to a pointer of the wrong size and using it.

If you have a Unix system, "valgrind" is a good tool for finding some of these problems.
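As a minimal sketch of guarding against the first kind of bug listed above (a copy into a stack array with a bad size), the function below checks the length before it touches the array. The names store_name and NAME_BUF_SIZE are hypothetical and only serve the illustration; the point is the size check, not the particular API.

```c
#include <stddef.h>
#include <string.h>

enum { NAME_BUF_SIZE = 16 };   /* hypothetical size of the stack array */

/* Copies src through a fixed-size stack buffer into out.
 * Returns 0 on success, -1 if the copy would overrun either buffer. */
int store_name(const char *src, char *out, size_t out_size)
{
    char local[NAME_BUF_SIZE];          /* the stack array at risk */
    size_t len = strlen(src);

    /* An unchecked strcpy(local, src) here is exactly the kind of write
     * that can clobber the saved return address. */
    if (len >= sizeof local || len >= out_size)
        return -1;

    memcpy(local, src, len + 1);        /* len + 1 copies the terminating NUL */
    memcpy(out, local, len + 1);
    return 0;
}
```

Calling store_name with a string of 16 characters or more returns -1 instead of writing past the end of local.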

Zan Lynx 2010-03-10 18:19:21

Answer 5

+1 A:

I am a C++ programmer for a living and I have encountered this issue more times than I like to admit. Your application is smashing a HUGE part of the stack. Chances are the function that is corrupting the stack is also crashing on return. That is because the return address has been overwritten, which is also why GDB's stack trace is messed up.

This is how I debug this issue:

1) Step through the application until it crashes. (Look for a function that is crashing on return.)

2) Once you have identified the function, declare a variable at the VERY FIRST LINE of the function:

int canary=0;

(The reason why it must be the first line is that this value must be at the very top of the stack. This "canary" will be overwritten before the function's return address.)

3) Put a variable watch on canary and step through the function; when canary!=0, you have found your buffer overflow! Another possibility is to set a variable breakpoint for when canary!=0 and just run the program normally. This is a little easier, but not all IDEs support variable breakpoints.
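The canary idea in the steps above can be sketched in a self-contained way. Where a compiler places a local variable relative to a buffer is not guaranteed, so this sketch models the frame with a struct (a hypothetical struct frame) in which canary sits directly after buf. The overflow is simulated by copying into the struct as a whole, which keeps the demo itself well defined. All names here are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 8 };

/* Model of a stack frame: the canary sits right after the buffer, so a
 * write that runs past buf reaches the canary first. */
struct frame {
    char buf[BUF_SIZE];
    int  canary;
};

/* Copies up to n bytes from src, starting at buf, and reports whether the
 * canary changed. Returns 1 if the canary was overwritten, 0 otherwise. */
int canary_tripped(const unsigned char *src, size_t n)
{
    struct frame f;
    size_t room = sizeof f - offsetof(struct frame, buf);

    memset(&f, 0, sizeof f);            /* canary = 0, as in step 2 */
    if (n > room)
        n = room;                       /* keep the demo inside the object */

    /* Addressing through the whole struct lets the copy run past buf
     * without leaving the object. */
    memcpy((unsigned char *)&f + offsetof(struct frame, buf), src, n);

    return f.canary != 0;               /* the condition watched in step 3 */
}
```

With a source of non-zero bytes, copying 8 bytes leaves the canary intact, while copying 9 or more trips it. In a real session under gdb, the equivalent of the variable watch is a watchpoint set with `watch canary`.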

EDIT: I have talked to a senior programmer at my office and in order to understand the core dump you need to resolve the memory addresses it has. One way to figure out these addresses is to look at the MAP file for the binary, which is human readable. Here is an example of generating a MAP file using gcc:

gcc -o foo -Wl,-Map,foo.map foo.c

This is a piece of the puzzle, but it will still be very difficult to obtain the address of the function that is crashing. If you are running this application on a modern platform, then ASLR will probably make the addresses in the core dump useless. Some implementations of ASLR randomize the function addresses of your binary, which makes the core dump absolutely worthless.
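Resolving an address against a MAP file comes down to finding the last symbol whose start address is not greater than the address in question. A minimal sketch follows, assuming the symbols have already been extracted into a table sorted by address. The struct symbol type and the resolve function are hypothetical, and parsing the MAP file itself is not shown. If ASLR has relocated the binary, the load base would have to be subtracted from the address first.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry extracted from a MAP file (hypothetical layout). */
struct symbol {
    uintptr_t   addr;   /* start address of the function */
    const char *name;
};

/* Binary search over a table sorted by ascending address.
 * Returns the last symbol with addr <= pc, or NULL if pc lies below
 * the first symbol. */
const struct symbol *resolve(const struct symbol *table, size_t count,
                             uintptr_t pc)
{
    const struct symbol *best = NULL;
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= pc) {
            best = &table[mid];         /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return best;
}
```

The result only says which function's address range contains pc. It cannot tell whether pc is a genuine return address or garbage left behind by the overflow.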

Rook 2010-03-10 19:17:07

Thank you for your reply, Rook; that was my first guess too. Unfortunately, the core dump occurred in the production environment, and a full day of trying to reproduce it in the test environment with full debug information gave no result. It is really a frustrating situation. My bosses are quite disappointed. I would like to find a tool that helps to inspect the heap memory. Maybe it could give some clues?

yves Baumes 2010-03-10 19:31:05

@yves It's a stack overflow for sure, because the return addresses are being corrupted and those only exist on the stack. Debugging this from a core dump is going to be a cast iron bitch. Another trick you could use is to change the value of variables while the program is at a breakpoint. You might be able to get these values from the core dump, but it's a slim chance. You will probably need more information than a dump of corrupted memory that doesn't even have a valid stack trace.

Rook 2010-03-10 19:36:05

GDB now has reverse debugging. It was introduced in September and most IDEs still don't support it. It could be useful for debugging this issue if it happens again: you could attach the debugger, and when the program crashes you can "step back" to see how it happened.

Rook 2010-03-10 19:44:08

Answer 6

A:

1. You have to use some debugger to detect the problem; valgrind is OK.
2. When compiling your code, make sure you add the -Wall option, which makes the compiler warn you about likely mistakes (make sure you don't have any warnings in your code).

ex: gcc -Wall -g -c -o oke.o oke.c

3. Make sure you also have the -g option to produce debugging information. You can also print location information yourself using some predefined macros. The following are very useful for me:

__LINE__ : tells you the line

__FILE__ : tells you the source file

__func__ : tells you the function
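A minimal sketch of how these three can be combined into a location string follows. The WHERE macro and the checkpoint function are hypothetical names used only for illustration. Note that __func__ is strictly a predefined identifier rather than a macro, but it is used the same way.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "file:line (function)" for the place where
 * the macro is expanded. */
#define WHERE(buf, size) \
    snprintf((buf), (size), "%s:%d (%s)", __FILE__, __LINE__, __func__)

/* Example caller: records its own location into buf and returns the
 * number of characters snprintf wanted to write. */
int checkpoint(char *buf, size_t size)
{
    return WHERE(buf, size);
}
```

Logging such a string at function entry and exit leaves a trail that survives even when the stack itself is too corrupted for gdb to unwind.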

Using the debugger alone is not enough, I think; you should also get used to making the most of the compiler's abilities.

Hope this helps.

deddihp 2010-03-11 07:28:40

Answer 7

A:

Sounds like the glibc version on your machine is not identical to the one in use on production when the core file was produced. Get the files listed by "ldd ./appname" on the production machine and copy them onto your machine, then tell gdb where to look: "set solib-absolute-prefix /path/to/libs" :)

sbester 2010-05-31 14:44:34
