tags:
views: 3394
answers: 8

Why does C differentiate between these cases of array index out of bounds?

#include <stdio.h>
int main()
{
    int a[10];
    a[3]=4;
    a[11]=3;    // does not give a segmentation fault
    a[25]=4;    // does not give a segmentation fault
    a[20000]=3; // gives a segmentation fault
    return 0;
}

I understand that it's still accessing memory allocated to the process or thread in the case of a[11] or a[25], and going out of the stack bounds in the case of a[20000], but why doesn't the compiler or linker give an error? Aren't they aware of the array size? If not, then how does sizeof(a) work correctly?

+21  A: 

The problem is that C/C++ doesn't actually do any boundary checking with regard to arrays. It depends on the OS to ensure that you are accessing valid memory.

In this particular case, you are declaring a stack-based array. Depending upon the particular implementation, accessing outside the bounds of the array will simply access another part of the already allocated stack space (most OSes and threads reserve a certain portion of memory for the stack). As long as you just happen to be playing around in the pre-allocated stack space, everything will not crash (note I did not say work).

What's happening on the last line is that you have now accessed beyond the part of memory that is allocated for the stack. As a result, you are indexing into a part of memory that is not allocated to your process or is allocated in a read-only fashion. The OS sees this and sends a segfault to the process.

This is one of the reasons that C/C++ is so dangerous when it comes to boundary checking.

JaredPar
but why doesn't the compiler or the linker give an error? Aren't they aware of the array size? If not, then how does sizeof(a) work correctly?
Kazoom
@Kazoom, C can know if a very specific subset of array accesses are legal. But those are far outweighed by the number of cases that cannot be detected. My guess is the feature is not implemented because it is expensive to do so and is only useful in a subset of scenarios.
JaredPar
As an example of the above, imagine a simple case of "a[b]=1;" - array bounds checking would have to be done at runtime, and this would cost additional CPU cycles for every (or most) array operations.
jcinacio
how does sizeof(a) work?
Kazoom
@Kazoom, the compiler knows that the length of a is 10 and the size of a single int is 4 (for example), so it simply uses the value 40.
paxdiablo
@Kazoom: Yes, in this simple case, the compiler *could* detect the problem and throw an error at you. But that's not required by the standard (C is supposed to be easy to implement, after all - no fancy features).
jalf
@Jared, it's only dangerous to those that don't know how to use it properly. I've always likened languages to power tools. If you don't know how to use a chainsaw, you've got no business using it. If you cut your leg off, it's your own fault :-)
paxdiablo
@Pax, definitely agree. But C/C++ is much more like a slow-acting poison than a chainsaw. Sawing off your leg produces an immediate and noticeable effect. Poisons, though, come in great variety, can be slow- to fast-acting, and have varying symptoms.
JaredPar
The *real* problem is that C and C++ _implementations_ typically do not check bounds (neither at compile nor at runtime). They're fully allowed to do so. Don't blame the language for that.
MSalters
+3  A: 

You generally only get a segmentation fault if you try to access memory your process doesn't own.

What you're seeing in the case of a[11] (and a[10], by the way) is memory that your process does own but that doesn't belong to the a[] array. a[20000] is so far from a[] that it's probably outside your process's memory altogether.

Changing a[11] is far more insidious, as it silently affects a different variable (or the stack frame, which may cause a different segmentation fault when your function returns).

paxdiablo
+2  A: 

C isn't doing this. The OS's virtual memory subsystem is.

In the case where you are only slightly out of bounds, you are addressing memory that is allocated for your program (on the call stack in this case). In the case where you are far out of bounds, you are addressing memory not given over to your program, and the OS throws a segmentation fault.

On some systems there is also an OS-enforced concept of "writeable" memory, and you might be trying to write to memory that you own but that is marked unwriteable.

dmckee
A: 

That's not a C issue; it's an operating system issue. Your program has been granted a certain memory space, and anything you do inside of that is fine. The segmentation fault only happens when you access memory outside of your process space.

Not all operating systems have separate address spaces for each process, in which case you can corrupt the state of another process or of the operating system with no warning.

zimbu668
+11  A: 

The segfault is not an intended action of your C program that would tell you that an index is out of bounds. Rather, it is an unintended consequence of undefined behavior.

In C and C++, if you declare an array like

type name[size];

you are only allowed to access elements with indexes from 0 up to size-1. Anything outside that range causes undefined behavior. If the index is near the range, most probably you read your own program's memory. If the index is far out of range, your program will most probably be killed by the operating system. But you can't know; anything can happen.

Why does C allow that? Well, the basic gist of C and C++ is to not provide features if they cost performance. C and C++ have been used for ages for highly performance-critical systems. C has been used as an implementation language for kernels and programs where access out of array bounds can be useful to get fast access to objects that lie adjacent in memory. Having the compiler forbid this would be for naught.

Why doesn't it warn about that? Well, you can raise the warning levels and hope for the compiler's mercy. This is called quality of implementation (QoI): if some compiler uses open behavior (such as undefined behavior) to do something good, it has good quality of implementation in that regard.

[js@HOST2 cpp]$ gcc -Wall -O2 main.c
main.c: In function 'main':
main.c:3: warning: array subscript is above array bounds
[js@HOST2 cpp]$

If it instead formatted your hard disk upon seeing the array accessed out of bounds - which would be legal for it - its quality of implementation would be rather bad. I enjoyed reading about that stuff in the ANSI C Rationale document.

Johannes Schaub - litb
I've deleted my own post; you were earlier and provided the most expanded answer :)
bb
The best explanation of undefined behavior was "the world might end and monkeys might fly out of your derrière" (derrière was replaced with a somewhat more vulgar word). In fact, this is also quite legal for implementation-defined behavior as long as the docs say it will happen :-)
paxdiablo
+2  A: 

Just to add to what other people are saying: you cannot rely on the program simply crashing in these cases; there is no guarantee of what will happen if you attempt to access a memory location beyond the bounds of the array. It's just the same as if you did something like:

int *p;
p = (int *)135; /* arbitrary address; the cast is needed for this to compile */

*p = 14;        /* undefined behavior: may crash, may silently corrupt memory */

That is just random; it might work. It might not. Don't do it. Code to prevent these sorts of problems.

BobbyShaftoe
Not the same. Dereferencing an uninitialized pointer should be assumed to be a random pointer. Accessing one item past the end of an array is far more likely to not crash because systems typically allocate a full page of memory (4KB or more) at a time, leaving some space after the end of the array.
Andrew Medico
It is the same. C gives you no such guarantee. If one system works that way, then that's fine, but so what? Also, I think you should reread what I wrote, as you completely missed the point. I don't know why you responded with this; I am perplexed.
BobbyShaftoe
+1  A: 

As litb mentioned, some compilers can detect some out-of-bounds array accesses at compile time. But bounds checking at compile time won't catch everything:

int a[10];
int i = some_complicated_function();
printf("%d\n", a[i]);

To detect this, runtime checks would have to be used, and they're avoided in C because of their performance impact. Even with knowledge of a's size at compile time (i.e., sizeof(a)), the compiler can't protect against that without inserting a runtime check.

Tung Nguyen
+1  A: 

As I understand the question and comments, you understand why bad things can happen when you access memory out of bounds, but you're wondering why your particular compiler didn't warn you.

Compilers are allowed to warn you, and many do at the highest warning levels. However, the standard is written to allow people to build compilers for all sorts of devices, and compilers with all sorts of features, so it requires the least it can while guaranteeing people can do useful work.

There are a few times the standard requires that a certain coding style generate a diagnostic. There are several other times when the standard does not require a diagnostic. Even when a diagnostic is required, I'm not aware of any place where the standard says what the exact wording should be.

But you're not completely out in the cold here. If your compiler doesn't warn you, Lint may. Additionally, there are a number of tools to detect such problems (at run time) for arrays on the heap, one of the more famous being Electric Fence (or DUMA). But even Electric Fence doesn't guarantee it will catch all overrun errors.

Max Lybbert