This may be a very simple question. I am attempting to debug an application which generates the following segfault error in kern.log:

kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]

Here are my questions:

  1. Is there any documentation on what the different segfault error numbers mean? In this instance it is error 6, but I've also seen error 4 and 5.

  2. What is the meaning of the information at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]?

So far I have been able to compile with symbols, and when I do x 0x8048000+24000 it returns a symbol. Is that the correct way of doing it? My assumptions so far are the following:

  • sp = stack pointer?
  • ip = instruction pointer
  • at = ????
  • myapp[8048000+24000] = address of symbol?
+4  A: 

Based on my limited knowledge, your assumptions are correct.

  • sp = stack pointer
  • ip = instruction pointer
  • myapp[8048000+24000] = address

If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under (or attach) GDB.

The error code is just the architectural error code for page faults and seems to be architecture specific. They are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:

  • bit 0 == 0 means no page found, 1 means protection fault
  • bit 1 == 0 means read, 1 means write
  • bit 2 == 0 means kernel, 1 means user-mode

My copy of Linux/arch/x86_64/mm/fault.c adds the following:

  • bit 3 == 1 means fault was an instruction fetch
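
To make the bit list above concrete, here is a minimal sketch that decodes an error number into those meanings. The function name decode_page_fault_error is made up for illustration, and the bit layout is only the x86 one quoted above; other architectures may differ.

```python
def decode_page_fault_error(code):
    """Describe an x86 page-fault error code using the bit meanings listed above."""
    parts = [
        # bit 0: 0 = no page found, 1 = protection fault
        "protection fault" if code & 0x1 else "no page found",
        # bit 1: 0 = read, 1 = write
        "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel, 1 = user-mode
        "user-mode" if code & 0x4 else "kernel",
    ]
    # bit 3 (listed for x86_64): the fault was an instruction fetch
    if code & 0x8:
        parts.append("instruction fetch")
    return parts


# The error numbers mentioned in the question.
for code in (4, 5, 6):
    print(code, decode_page_fault_error(code))
```

Read this way, error 6 (binary 110) is a user-mode write to an address with no page mapped, error 4 is a user-mode read of an unmapped page, and error 5 is a user-mode read that hit a protection fault.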
jschmier
Beat me to it :)
David Titarenco
The issue I have is that: 1) The application is segfaulting in a production environment, where symbols are stripped, so all I have is the logs. 2) I'm trying to find that memory location in the development environment, so at least I can see where it is crashing.
Sullenx
If you have the pre-stripped binary, try running it through nm or objdump.
jschmier
nm is pretty helpful, at least I have an idea where the crash happened. One last thing, what is an error 6? ... is there any table out there?
Sullenx
I updated my answer to include the error code.
jschmier
segfault at 794ef0 ... sp 794ef0 - stack is obviously corrupted.
Nikolai N Fetissov
Thank you, this is very helpful
Sullenx
A: 

1) Compile your application with -g -ggdb -rdynamic
2) Run gdb <binary>
3) Type run
4) Get your program to segfault and type bt
5) ???
6) Profit

However, if you insist on interpreting the kern.log file: the ip (instruction pointer) address can be looked up in a .map file, which you can produce at link time by passing -Wl,-Map,out.map to the compiler driver. Things can get more confusing depending on how you're linking libraries (static or dynamic).

David Titarenco
The application is dynamically linked. We set the ulimit to dump a core file, but it's not doing so. Maybe because it is an error 6?
Sullenx
@Sullenx, more likely because disk space is not available for the core
Nikolai N Fetissov
+2  A: 

When the report points to a program, not a shared library

Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.
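
As a rough sketch of pulling the fields out of such a line before handing ip to addr2line, the snippet below parses the kern.log format quoted in the question. The names SEGFAULT_RE and parse_segfault are made up for illustration. It assumes every number is hexadecimal and that the bracketed part is the start address and length of the mapping that contains ip; other kernel versions may format the line differently. The sample line reuses the values from item 2 of the question.

```python
import re

# Pattern for the log format quoted in the question (an assumption about
# the exact layout; adjust it if your kernel prints the line differently).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<name>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the fields of a segfault log line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    # All numeric fields are treated as hexadecimal (assumption).
    fields = {k: int(m.group(k), 16)
              for k in ("addr", "ip", "sp", "error", "base", "size")}
    fields["name"] = m.group("name")
    # ip belongs to the named mapping when base <= ip < base + size.
    end = fields["base"] + fields["size"]
    fields["ip_in_mapping"] = fields["base"] <= fields["ip"] < end
    # Offset of the faulting instruction from the start of the mapping.
    fields["ip_offset"] = fields["ip"] - fields["base"]
    return fields


line = ("kernel: myapp[15514]: segfault at bf794ef0 ip 0805130b "
        "sp bf794ef0 error 6 in myapp[8048000+24000]")
info = parse_segfault(line)
print(info["name"], hex(info["ip"]), info["ip_in_mapping"], hex(info["ip_offset"]))
```

If ip_in_mapping comes out false, the logged values are probably truncated or were copied incorrectly, and the address is not worth passing to addr2line as is.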

If it's a shared library

You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact. Reproduce the problem under gdb.

What the error means

Here's the breakdown of the fields:

  • address - the location in memory the code is trying to access (small values such as 10 or 11 are likely offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
  • ip - instruction pointer, i.e. where the code which is trying to do this lives
  • sp - stack pointer
  • error - the page-fault error code, not errno; on x86 it is a bit mask in which bit 0 distinguishes a missing page from a protection fault, bit 1 a read from a write, and bit 2 kernel mode from user mode, as defined in arch/*/mm/fault.c in the kernel source.
Charles Duffy