views:

145

answers:

5

In C and C++, array accesses are not bounds-checked. One way to work around this is to package the array in a struct:

struct array_of_foo{
  int length;
  foo *arr; //array with variable length.
};

Then, it can be initialized:

array_of_foo *ar(int length){
  array_of_foo *out = (array_of_foo*) malloc(sizeof(array_of_foo));
  out->arr = (foo*) malloc(length*sizeof(foo));
  return out;
}

And then accessed:

foo I(array_of_foo *ar, int ix){ //may need to be foo* I(...
   if(ix>ar->length-1){printf("out of range!\n");} //error
   return ar->arr[ix];
}

And finally freed:

void freeFoo(array_of_foo *ar){ //is it necessary to free both ar->arr and ar?
  free(ar->arr); free(ar);
}

This way it can warn programmers about out-of-bounds accesses. But will this packaging slow down the performance substantially?

A: 

I agree on the std::vector recommendation. Additionally you might try the boost::array library, which includes a complete (and tested) implementation of a fixed-size array container:

http://svn.boost.org/svn/boost/trunk/boost/array.hpp

sukru
Heads-up: The question was retagged from [c++] to [c].
James McNellis
@James I'm a little unclear why though; the question mentions C++ several times
Michael Mrozek
@James: Yes, but that wasn't done by Kevin. (Although I agree that there's nothing C++ about that code.)
sbi
@sbi, Michael: Oh, sorry; I didn't see that someone else retagged the question. I meant this more as a "heads-up" so that people didn't start downvoting it because it didn't match the (current) question tags.
James McNellis
@James @sbi The question was retagged by an idiot, apparently - I've rolled it back.
anon
A: 

In C++, there's no need to come up with your own incomplete version of vector. (To get bounds checking on vector, use .at() instead of []. It'll throw an exception if you get out of bounds.)

In C, this isn't necessarily a bad idea, but I'd drop the pointer in your initialization function and just return the struct. It holds only an int and a pointer, so it's cheap to copy, typically no more than twice the size of a pointer. You probably don't want random printfs in your access functions either: if you do go out of bounds, you'll get messages that won't be very helpful even if you look for them.

David Thornley
A: 

Most likely the major performance hit will come from checking the index on every access, thus breaking pipelining in the processor, rather than from the extra indirection. It seems unlikely to me that an optimizer would find a way to remove the check even where it's provably unnecessary.
For example, this will be very noticeable in long loops that traverse the entire array, which is a relatively common pattern.

And just for the sake of it:
- You should initialize the length field too in ar()
- You should check for ix < 0 in I()

adamk
`static inline int check(int i, int *l) { return i > *l; } int main() { int a = 5; for (int i = 0; i < 5; ++i) { if (check(i, }`. gcc with -O3 optimizes away the entire loop, so there is some hope for the optimizer to remove bounds checks in some cases. Whether a C compiler will replace multiple bounds checks with a single check, like JITs do, I don't know.
Steve Jessop
A good compiler should be able to optimize common cases if the program is clear enough (the compiler must be able to keep track of all the aliases of the variables holding the length and the index). As for the processor pipelining, I'd want to see some benchmarks; processors also have optimistic branch execution, precisely to make better use of pipelining.
Gilles
gcc doesn't do the loop hoist I was hoping for. `check` as before, then `int main(int argc, char **argv) { int a = atoi(argv[1]); int b = 1; for (int i = 0; i < 7; ++i) { if (check(i, } printf("%d\n", b);}`. The resulting code calls atoi, then compares the result against 0 .. 5 in turn. With the loop unrolled.
Steve Jessop
In long loops traversing the entire array, the branch should be perfectly predicted.
caf
@Steve, @caf: As each test is done against an arbitrary memory location (and not a constant, as in your example), which could be set with an unpredictable value (e.g. result of a system call), loop unrolling becomes irrelevant and optimization is not trivial.
adamk
@adamk: that's kind of the point of my second test, which is against memory location containing a value not predictable at compile time. I think caf's probably right about the branch prediction, though: in a well-written program the bounds will never be exceeded, but even if they are it will generally only be once. So the branch will go the same way every time (or almost every time) it's tested. AFAIK (which isn't that far), this should be an ideal case for branch prediction on CPUs which do that. On CPUs which don't but which allow manual prediction, check the assembly...
Steve Jessop
... like Gilles says, though, one bit of non-inlined code or a potential aliased pointer, and suddenly the compiler doesn't know whether the value in the memory location will change each time round the loop. This rules out the hoist (not that gcc did it anyway in my test), but not the branch prediction.
Steve Jessop
A: 

I don't have any formal studies to cite, but echoes I've had from languages where array bounds checking is optional are that turning it off rarely speeds up a program perceptibly.

If you have C code that you'd like to make safer, you may be interested in Cyclone.

Gilles
"turning it *on*" perhaps?
dmckee
@dmckee: thanks. Safety first though!
Gilles
A: 

You can test it yourself, but on certain machines you may see serious performance issues in some scenarios. If you are looping over millions of elements, checking the bounds every time adds an extra read of the length field and a branch to a hot loop. How much of an impact that has depends on what your code is doing. Again, you can test this pretty quickly.

Jess