Why does the linux kernel generate a segfault on stack overflow? This can make debugging very awkward when alloca in c or fortran creation of temporary arrays overflows. Surely it mjust be possible for the runtime to produce a more helpful error.
The "kernel" (it's actually not the kernel running your code, it's the CPU) doesn't know how your code is referencing the memory it's not supposed to be touching. It only knows that you tried to do it.
The code:
char *x = alloca(100);
char y = x[150];
can't really be evaluated by the CPU as you trying to access beyond the bounds of x.
You may hit the exact same address with:
char y = *((char*)(0xdeadbeef));
BTW, I would discourage the use of alloca since stack tends to be much more limited than heap (use malloc instead).
A stack overflow is a segmentation fault. As in you've broken the given bounds of memory that the you were initially allocated. The stack of of finite size, and you have exceeded it. You can read more about it at wikipedia
Additionally, one thing I've done for projects in the past is write my own signal handler to segfault (look at man page signal (2)). I usually caught the signal and wrote out "Fatal error has occured" to the console. I did some further stuff with checkpoint flags, and debugging.
In order to debug segfaults you can run a program in GDB. For example, the following C program will segfault: #segfault.c #include #include
int main()
{
printf("Starting\n");
void *foo=malloc(1000);
memcpy(foo, 0, 100); //this line will segfault
exit(0);
}
If I compile it like so:
gcc -g -o segfault segfault.c
and then run it like so:
$ gdb ./segfault
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /tmp/segfault
Starting
Program received signal SIGSEGV, Segmentation fault.
0x4ea43cbc in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x4ea43cbc in memcpy () from /lib/libc.so.6
#1 0x080484cb in main () at segfault.c:8
(gdb)
I find out from GDB that there was a segmentation fault on line 8. Of course there are more complex ways of handling stack overflows and other memory errors, but this will suffice.
You can actually catch the condition for a stack overflow using signal handlers.
To do this, you must do two things:
Setup a signal handler for SIGSEGV (the segfault) using sigaction, to do this set the SO_ONSTACK flag. This instructs the kernel to use an alternative stack when delivering the signal.
Call sigaltstack() to setup the alternate stack that the handler for SIGSEGV will use.
Then when you overflow the stack, the kernel will switch to your alternate stack before delivering the signal. Once in your signal handler, you can examine the address that caused the fault and determine if it was a stack overflow, or a regular fault.
A stack overflow does not necessarily yield a crash. It may silently trash data of your program but continue to execute.
I wouldn't use SIGSEGV handler kludges but instead fix the original problem.
If you want automated help, you can use gcc's -Wstack-protector option, which will spot some overflows at runtime and abort the program.
valgrind is good for dynamic memory allocation bugs, but not for stack errors.
Some of the comments are helpful, but the problem is not of memory allocation errors. That is there is no mistake in the code. It's quite a nuisance in fortran where the runtime allocates temporary values on the stack. Thus a command such as write(fp)x,y,z can trigger are segfault with no warning. The technical support for the intel Fortran compiler say that there is no way that the runtime library can print a more helpful message. However if Miguel is right than this should be possible as he suggests. So thanks a lot. The remaining question then is how do I firstly find the address of the seg fault and the figure out if it came from a stack overflow or some other problem.
For others who find this problem there is a compiler flag which puts temporary varibles above a certain size on the heap.