What can cause SIGBUS (bus error) on a generic x86 userland application in Linux? All of the discussion I've been able to find online is regarding memory alignment errors, which from what I understand doesn't really apply to x86.

(My code is running on a Geode, in case there are any relevant processor-specific quirks there.)

A: 

A common cause of a bus error on x86 Linux is attempting to dereference something that is not really a pointer, or is a wild pointer. For example, dereferencing a pointer that was never initialized, or one that was assigned an arbitrary integer, will normally produce either a segmentation fault or a bus error.

Alignment does apply to x86. Even though memory on an x86 is byte-addressable (so you can have a char pointer to any address), if you have, for example, a pointer to a 4-byte integer, that pointer must be aligned.

You should run your program in gdb and determine which pointer access is generating the bus error to diagnose the issue.

Tyler McHenry
Unaligned access of integers works on x86.
Joshua
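The two comments above disagree about alignment. As a rough illustration, assuming GCC or Clang inline assembly on x86 Linux: a misaligned 4-byte load normally just works, but once user code sets the alignment-check flag (AC, bit 18 of EFLAGS) the same load traps and the kernel delivers SIGBUS. This relies on the kernel having alignment checking enabled in CR0, which stock Linux x86 kernels do. The names `storage`, `misaligned_load` and `misaligned_load_outcome` are hypothetical, and the load runs in a forked child so the expected crash does not kill the caller.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
/* Set EFLAGS.AC (bit 18). Step over the 128-byte red zone before pushing. */
#  define SET_AC "sub $128, %%rsp\n\tpushfq\n\torl $0x40000, (%%rsp)\n\tpopfq\n\tadd $128, %%rsp\n\t"
#elif defined(__i386__)
/* 32-bit x86 (e.g. a Geode): no red zone, so push/pop directly. */
#  define SET_AC "pushfl\n\torl $0x40000, (%%esp)\n\tpopfl\n\t"
#endif

#ifdef SET_AC
/* 8-byte-aligned storage, so (bytes + 1) is misaligned for a 4-byte load. */
static uint64_t storage[2];

/* One misaligned 4-byte load, optionally preceded by turning on AC. Both
 * steps sit in a single asm statement so no compiler code runs in between. */
static uint32_t misaligned_load(int enable_ac)
{
    const unsigned char *p = (const unsigned char *)storage + 1;
    uint32_t v;
    if (enable_ac)
        __asm__ volatile(SET_AC "movl (%1), %0" : "=r"(v) : "r"(p) : "memory", "cc");
    else
        __asm__ volatile("movl (%1), %0" : "=r"(v) : "r"(p) : "memory");
    return v;
}
#endif

/* Runs the load in a child process. Returns 0 if the load completed, the
 * signal that killed the child (SIGBUS expected when enable_ac is set), or
 * -1 on non-x86 builds or if fork/waitpid fails. */
int misaligned_load_outcome(int enable_ac)
{
#ifdef SET_AC
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        prctl(PR_SET_DUMPABLE, 0, 0, 0, 0); /* no core file for the expected crash */
        (void)misaligned_load(enable_ac);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
#else
    (void)enable_ac;
    return -1;
#endif
}
```

On an x86 machine, `misaligned_load_outcome(0)` should return 0 and `misaligned_load_outcome(1)` should return `SIGBUS`; the `__i386__` branch is the one that would apply to a 32-bit part such as the Geode.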
A:

You can get a SIGBUS from an unaligned access if you turn on the alignment-check trap (the AC flag in EFLAGS), but normally that's off on x86. You can also get it from accessing a memory-mapped device if there's an error of some kind.

Your best bet is using a debugger to identify the faulting instruction (SIGBUS is synchronous), and trying to see what it was trying to do.

Chris Dodd
The debugger showed that the SIGBUS occurred immediately upon entering the function. Maybe I have some memory corruption, or maybe one of the function parameters is bad? I'll have to check the disassembly in the debugger for more details if the error occurs again.
Josh Kelley
@Josh -- check to see what the actual failing instruction is -- if it's a push or pop, then your stack pointer is corrupted. If it's something else, then the address in the instruction is the issue.
Chris Dodd
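Along the lines of the debugger advice above, a SIGBUS handler installed with `SA_SIGINFO` can log `si_addr` (the faulting data address) and `si_code` before the process dies, which helps when the error is intermittent and the program is not running under a debugger. This is a minimal sketch: `install_sigbus_reporter`, `sigbus_reporter`, `format_hex` and `put` are hypothetical helpers, and the handler sticks to async-signal-safe calls (`write`, `strlen`, `raise`) instead of `printf`. It does not decode the faulting instruction pointer, which lives in the `ucontext` argument.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Best-effort write to stderr; short writes are ignored on purpose. */
static void put(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}
#define PUT_LIT(s) put((s), sizeof(s) - 1)

/* Formats value as "0x" plus 2*sizeof(uintptr_t) hex digits (no NUL).
 * Hand-rolled because snprintf is not async-signal-safe. */
static size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    size_t n = 2 * sizeof value;
    buf[0] = '0';
    buf[1] = 'x';
    for (size_t i = 0; i < n; i++)
        buf[2 + i] = digits[(value >> (4 * (n - 1 - i))) & 0xf];
    return n + 2;
}

static const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "BUS_ADRALN (invalid address alignment)";
    case BUS_ADRERR: return "BUS_ADRERR (nonexistent physical address)";
    case BUS_OBJERR: return "BUS_OBJERR (object-specific hardware error)";
    default:         return "other (e.g. sent with kill or raise)";
    }
}

void sigbus_reporter(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    char addr[2 + 2 * sizeof(uintptr_t)];
    size_t len = format_hex((uintptr_t)info->si_addr, addr);
    const char *name = bus_code_name(info->si_code);

    PUT_LIT("SIGBUS: si_addr=");
    put(addr, len);
    PUT_LIT(" si_code=");
    put(name, strlen(name));
    PUT_LIT("\n");

    /* SA_RESETHAND restored the default action on entry, so re-raising
     * makes the process terminate with SIGBUS as it would have anyway. */
    raise(sig);
}

int install_sigbus_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_reporter;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

On Linux, touching a file-backed page that lies wholly past end of file is typically reported with `si_code == BUS_ADRERR`, while an alignment-check trap reports `BUS_ADRALN`, so the logged code narrows down which kind of cause to look for.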
A:

SIGBUS can happen in Linux for quite a few reasons other than memory alignment faults - for example, if you attempt to access an mmap region beyond the end of the mapped file.

Are you using anything like mmap, shared memory regions, or similar?

caf
Yes, we're using shared memory regions. I'll investigate that possibility the next time this error comes up. Thanks.
Josh Kelley
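The mmap cause described above can be reproduced in a few lines: map more pages than the underlying file covers and touch a page that lies entirely past end of file. The same thing happens with a POSIX shared memory object from `shm_open(3)` that is mapped before `ftruncate(2)` gives it a size. This sketch uses a temporary file instead of `shm_open` so it builds without `-lrt` on older glibc; `probe_mapped_file` is a hypothetical helper that does the access in a forked child and reports how the child ended.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the byte at touch_offset could be read, the signal that
 * killed the child (SIGBUS expected past EOF), or -1 on setup errors. */
int probe_mapped_file(off_t file_bytes, size_t touch_offset)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* file persists until the descriptor and mappings are gone */
    if (ftruncate(fd, file_bytes) != 0) {
        close(fd);
        return -1;
    }

    /* Always map two pages, however small the file is. */
    size_t map_len = 2 * (size_t)sysconf(_SC_PAGESIZE);
    if (touch_offset >= map_len) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        prctl(PR_SET_DUMPABLE, 0, 0, 0, 0); /* no core file for the expected crash */
        void *map = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
            _exit(2);
        volatile unsigned char *p = map;
        unsigned char c = p[touch_offset]; /* SIGBUS if this page is wholly past EOF */
        (void)c;
        _exit(0);
    }

    close(fd);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

With a 1-byte file, offset 0 reads fine (the rest of that first page reads as zeros), while an offset one page in kills the child with SIGBUS; sizing the file to two pages first makes the same access succeed, which is the analogue of calling `ftruncate` on a shared memory object before using it.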
A:

Oh yes, there's one more weird way to get SIGBUS.

If the kernel fails to page in a code page, either because of memory pressure (the OOM killer must be disabled for this to happen) or because of a failed I/O request, the process gets SIGBUS.

Joshua