Hi all

In an embedded Linux environment (customized 2.4.25 on PowerPC) I get the following kernel panic after some hours:

Oops: kernel access of bad area, sig: 11
NIP: C9471C7C XER: 20000000 LR: C0018C74 SP: C0198E20 REGS: c0198d70 TRAP: 0800    Not tainted
MSR: 00009030 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DEAR: C9876FFF, ESR: 00000000
TASK = c0197020[0] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
PLB0: bear= 0x48041040 acr=   0x00000000 besr=  0x00000000
PLB0 to OPB: bear= 0x00cc1000 besr0= 0x00000000 besr1= 0x00000000

GPR00: 00000000 C0198E20 C0197020 00000000 C016E494 000000C2 C01D0000 00000000
GPR08: C98701F0 C9876FFF 00008000 C768AE0F 24004022 1001B120 07FC9500 00000000
GPR16: 00000001 00000001 FFFFFFFF 007FFE00 00001032 00198EE0 00000000 C0004780
GPR24: C01D2F68 C01E0000 C0170000 C0170000 C01B0000 C9473870 00000000 C9473864
Call backtrace:
00000001 C0018C74 C0018A1C C0005E14 C0004780 C0022724 C0005D4C
C0005D60 C0002430 C01AE5BC C0002328
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
 <0>Rebooting in 1 seconds...

cat /proc/modules:

CustomModule1          10556   4
CustomModule2           5488   0
CustomModule3          10240   1
fuse                   35576   4
usb-storage            28468   0 (unused)
keybdev                 3076   0 (unused)
mousedev                6116   0 (unused)
hid                    17968   0 (unused)
input                   6192   0 [keybdev mouse

ksyms -m:

Address   Symbol                 Defined by
c9471000  (11k)                  [CustomModule1]
c9471b74  functionA              [CustomModule1]
c947358c  functionB              [CustomModule1]
c9473580  functionC              [CustomModule1]
...

I googled for help but could not find anything useful. I also wanted to 'decode' the backtrace, but I don't understand how: the addresses do not correspond to the addresses in System.map. Can anyone explain how to track down the error?

Thanks, chris

+1  A: 

Is the configuration option CONFIG_KALLSYMS available on this kernel? If it is and you can recompile your kernel, you should get an oops with symbolic information.

As pointed out in the comments, Linux 2.4 does not have kallsyms, so you should enable CONFIG_DEBUG_KERNEL and CONFIG_FRAME_POINTER instead. The addresses in the backtrace and in System.map are both virtual addresses, so they should match. They might not match exactly, but for each address you can find the nearest symbol at or below it.

For example, in the backtrace output, C0018C74 and C0018A1C look like kernel code addresses, but the C9xxxxxx range does not look like a kernel address to me. Is that where kernel modules are loaded?

Please post some lines from the end of your System.map.
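The question about the C9xxxxxx range can be checked against the data already posted. The following is only a rough sketch: it takes the CustomModule1 base address from the `ksyms -m` output and the size in bytes from `/proc/modules`, and assumes that the size is counted from that base address. The helper name `owning_module` is made up for illustration, and only CustomModule1 is listed because the other module base addresses are not shown in the question.

```python
# Base address from `ksyms -m`, size in bytes from /proc/modules (both taken
# from the question). Assumption: the size is counted from the base address.
MODULES = {"CustomModule1": (0xC9471000, 10556)}


def owning_module(addr, modules=MODULES):
    """Return the name of the module whose range contains addr, or None."""
    for name, (base, size) in modules.items():
        if base <= addr < base + size:
            return name
    return None


# Register values copied from the oops output.
for label, addr in [("NIP", 0xC9471C7C), ("LR", 0xC0018C74), ("DEAR", 0xC9876FFF)]:
    where = owning_module(addr) or "not in a listed module"
    print(f"{label} {addr:08X}: {where}")
```

Under that assumption, NIP falls inside CustomModule1, while LR and the faulting data address (DEAR) do not.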

Edit: From your ksyms output, it seems that the fault occurs in functionA of your custom module, because NIP C9471C7C lies shortly after c9471b74, and:

  • NIP stands for Next Instruction Pointer.
  • c9471b74 is the start address of functionA according to your ksyms output.

LR is the link register, i.e. usually the register where the return address is stored.

shodanex
Thanks for your answer... I will try that. Will take some time...
Chris
I guess it is not available. It does not appear when running `make config`. I also entered it manually in the config file, but this did not change the behaviour; the Oops message still looks the same...
Chris
You can decode by hand if you have the System.map available.
stsquad
Linux 2.4.x doesn't have KALLSYMS. Check that your .config file contains CONFIG_DEBUG_KERNEL=y and CONFIG_FRAME_POINTER=y. If not, enable these in the 'Kernel hacking' section of menuconfig.
ctuffli
@stsquad: How? The addresses do not correspond... how do I find out the offset?
Chris
@ctuffli: CONFIG_DEBUG_KERNEL is available and was not enabled, the other one is not present. What will be different when I enable this one?
Chris
CONFIG_FRAME_POINTER is useless on PowerPC, I think.
shodanex
CONFIG_DEBUG_KERNEL should add debug symbols to the kernel allowing the oops to print out more human friendly information. This will also allow debugging tools such as gdb to help.
ctuffli
+1  A: 

NIP is the Next Instruction Pointer, or more generically the Program Counter (a.k.a. PC), and indicates where the kernel oops'd. According to the output of ksyms, the contents of NIP (0xC9471C7C) appear to be in functionA. You should be able to use objdump -S on the module object that contains functionA (CustomModule1) and figure out what instruction is at functionA+0x108.
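The functionA+0x108 figure comes from picking the highest symbol address that is not above NIP and subtracting. A minimal sketch of that arithmetic, using only the three symbols visible in the `ksyms -m` output (the list in the question is truncated, and ksyms only shows exported symbols, so a non-exported function located between functionA and the fault address would not appear here). The helper name `nearest_symbol` is made up for illustration.

```python
# Symbol start addresses copied from the `ksyms -m` output in the question.
KSYMS = {
    "functionA": 0xC9471B74,
    "functionC": 0xC9473580,
    "functionB": 0xC947358C,
}


def nearest_symbol(addr, symbols=KSYMS):
    """Return (name, offset) for the closest symbol at or below addr."""
    candidates = [(start, name) for name, start in symbols.items() if start <= addr]
    if not candidates:
        return None
    start, name = max(candidates)
    return name, addr - start


name, offset = nearest_symbol(0xC9471C7C)  # NIP from the oops
print(f"{name}+{offset:#x}")  # prints functionA+0x108
```

The offset within the function is what you then look for in the disassembly, since the absolute addresses in the unloaded module object differ from the load address.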

The Link Register (LR) holds the return address of the current function and indicates the caller of functionA. You can either look in the System.map file to find the function containing this address or use the GNU binutils program addr2line on your vmlinux image to get the same information. From there, you should be able to get a better idea of what caused the oops.
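Looking up an address in System.map by hand works the same way: find the last entry whose address is not greater than the one you are resolving. The sketch below assumes the usual three-column `address type name` layout. The sample entries are entirely hypothetical placeholders, since the real System.map was not posted, and the helper names are made up for illustration.

```python
import bisect


def parse_system_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _kind, name = parts
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def lookup(symbols, addr):
    """Return (name, offset) of the closest symbol at or below addr, or None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


# HYPOTHETICAL entries, for illustration only; replace with your real System.map,
# e.g. parse_system_map(open("System.map").read()).
SAMPLE = """\
c0018900 T hypothetical_func_a
c0018c00 T hypothetical_func_b
c0018d00 T hypothetical_func_c
"""

symbols = parse_system_map(SAMPLE)
for addr in (0xC0018C74, 0xC0018A1C):  # LR and one backtrace entry from the oops
    print(f"{addr:08X} -> {lookup(symbols, addr)}")
```

With the real file, the names printed for LR and for each backtrace entry give the chain of kernel functions that led into the module.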

See here and here for more information on the PPC registers and assembly.

ctuffli
Thank you for this helpful answer. For the time being, this is what I wanted to know, so I will mark this answer as accepted.
Chris