So, as I learned from Michael Burr's comments to this answer, the C standard doesn't support subtracting an integer from a pointer when the result would point before the first element of an array (which I suppose includes any allocated memory).

From section 6.5.6 of the combined C99 + TC1 + TC2 (pdf):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

I love pointer arithmetic, but this has never been something I've worried about before. I've always assumed that, given:

 int a[1];
 int * b = a - 3; /* undefined: b points before the start of a */
 int * c = b + 3; /* undefined: arithmetic on an invalid pointer */

c == a would hold.

So while I believe I've done that sort of thing before and not gotten bitten, it must have been due to the kindness of the various compilers I've worked with: they've gone above and beyond what the standard requires to make pointer arithmetic work the way I thought it did.

So my question is: how common is that? Are there commonly used compilers that don't do me that kindness? Is well-behaved pointer arithmetic beyond the bounds of an array a de facto standard?

A: 

Arbitrary pointer arithmetic works fine on all common platforms; when going past array bounds bites you, it is usually through bugs other than the arithmetic itself failing.

Lars Wirzenius
Oh, I agree that dereferencing past the array bounds can cause some nasty errors.
rampion
What is interesting about x86 segmented memory is that just loading an invalid pointer into a register could cause a crash. That is the actual origin of the error message "segmentation violation", common on x86 platforms where others would usually say "access violation".
RBerteig
x86 segmented memory is, luckily, a thing of the past as far as common platforms are concerned (with the exception of people writing operating-system bootstrapping code). As far as I know, "segmentation violation" comes originally from Multics, which Unix's principal fathers worked on before they started Unix.
Lars Wirzenius
As they used to say, "Unix is Multics with its balls cut off"... A little googling around seems to indicate that my memory is faulty on the history there. It does look like SIGSEGV and the phrase "segmentation violation" predate the existence of the 8086 processor. My excuse is that I'm paging in memory that hasn't been referenced in a long time, and my long-term storage is suffering from bit rot ;-)
RBerteig
You didn't answer the question (which compilers have problems?) or give the safe advice (don't do this).
Matthew Flaschen
+4  A: 

This is not "implementation defined" by the Standard; it is "undefined" by the Standard. That means you can't count on a compiler supporting it, and you can't say, "well, this code is safe on compiler X". By invoking undefined behavior, your whole program becomes undefined.

The practical answer isn't "how (where, when, on what compiler) can I get away with this"; the practical answer is "don't do this".
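
As an illustration (a sketch of my own, not from the original answer, using a hypothetical helper wraps_past_end): because out-of-bounds pointer arithmetic is undefined, an optimizer is entitled to assume it never happens, so a "did we wrap past the end?" test like the one below may be folded to 0 and the branch deleted entirely.

 #include <stdio.h>

 /* The comparison can only be true by invoking undefined behavior,
    so a compiler may compile this to "return 0" regardless of the
    arguments. */
 static int wraps_past_end(int *a, int n) {
     return a + n < a;
 }

 int main(void) {
     int buf[4];
     printf("%d\n", wraps_past_end(buf, 1000000)); /* may print 0 anyway */
     return 0;
 }

This is exactly the sense in which "it worked when I tried it" offers no guarantee: the result can change with the compiler or the optimization level.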

tpdi
I think the OP is wondering about *why* this is true, as much as anything. If you've never experienced the "joy" of developing an application for Windows 3.0 it is understandable that you might not appreciate how easy we have it today ;-)
RBerteig
In fact I did program for Windows 3.0. Back then, the File Manager only allowed a file type to be associated with one program. I wrote a handler that allowed the user to add multiple programs per file type; the user then associated files with that program, which on right-click allowed the user to choose from his custom list of programs for that file type.
tpdi
I just remember having to drop back to DOS to compile (and run a decent editor) because the MSC compiler couldn't be run in a DOS box reliably. Besides, after any bug at all, the chances were that exiting Windows required a three-finger salute and the DOS prompt was the first stop after that... The real joy was designing dialog box layouts on paper and in a text editor and having to sit through a compile to see what they looked like...
RBerteig
Incidentally tpdi, I didn't mean my comment to be dismissive of you in particular. Writing about the segment registers just brought up some latent tendency towards feeling like a cranky old geezer trotting out the "walked 5 miles uphill both ways in the snow" stories tonight. ;-)
RBerteig
Not taken that way at all, and sorry if my response seemed too "oh yes I did!"; in fact I'd literally forgotten I ever wrote that thing, until your comment prompted me to recall it. Even now, I can't recall /when/ I wrote it. Presumably before '95? I know I learned 8086 asm (to write silly TSRs) long before I learned C. So my comment was more out of self-surprise than anything else. And yeah, I remember worrying about segmented memory and offsets. Things /were/ more complicated then.
tpdi
IIRC, Win95 actually shipped in 1995 because they accidentally stuck to their schedule. I remember playing with the early betas of NT and running NT 3.1 on my home PC before '95 shipped. It was so liberating to get back to a flat 32-bit address space, a self-hosted compiler, and applications that couldn't crash each other or the kernel. So I naturally started writing NT drivers for the sense of danger....
RBerteig
+6  A: 

MSDOS FAR pointers had problems like this, which were usually papered over by "clever" use of the overlap between the segment register and the offset register in real mode. The effect there was that the 16-bit segment was shifted left 4 bits and added to the 16-bit offset, giving a 20-bit physical address that could address 1MB, which was plenty because everyone knew that no one would ever need as much as 640KB of RAM. ;-)
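
A quick sketch of that translation (my illustration, not part of the original answer), which also shows how distinct seg:off pairs alias the same physical byte:

 #include <stdio.h>

 /* Real-mode address translation: segment shifted left 4 bits plus
    offset yields a 20-bit physical address. */
 static unsigned long real_mode_linear(unsigned short seg, unsigned short off) {
     return ((unsigned long)seg << 4) + (unsigned long)off;
 }

 int main(void) {
     printf("%05lX\n", real_mode_linear(0x1234, 0x0005)); /* 12345 */
     printf("%05lX\n", real_mode_linear(0x1230, 0x0045)); /* 12345 */
     return 0;
 }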

In protected mode, the segment register was actually an index into a table of memory descriptors. A typical DOS-extender runtime would usually arrange things so that many segments could be treated just as they would have been in real mode, which made porting code from real mode easy. But the scheme had some defects. Primarily, the segment before an allocation was not part of the allocation, so its descriptor might not even be valid.

On the 80286 in protected mode, merely loading a segment register with a value that selected an invalid descriptor would raise an exception, whether or not the descriptor was actually used to refer to memory.

A similar issue potentially occurs at one byte past the allocation. The last ++ on the pointer might have carried over to the segment register, causing it to load a new descriptor. In this case, it is reasonable to expect that the memory allocator could arrange for one safe descriptor past the end of the allocated range, but it would be unreasonable to expect it to arrange for any more than that.
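
To make that carry concrete, here is a hypothetical model (my sketch; real runtimes differ, and in protected mode the segment value is a descriptor index rather than a shiftable base) of a normalized "huge" pointer, where bumping the offset past a paragraph boundary changes the segment:

 #include <stdio.h>

 typedef struct { unsigned short seg; unsigned short off; } far_ptr;

 /* far_add is a hypothetical helper modeling real-mode-style huge
    pointer normalization: the offset is kept in 0..15, so any carry
    folds into the segment. */
 static far_ptr far_add(far_ptr p, unsigned long bytes) {
     unsigned long linear = ((unsigned long)p.seg << 4) + p.off + bytes;
     p.seg = (unsigned short)(linear >> 4);
     p.off = (unsigned short)(linear & 0xF);
     return p;
 }

 int main(void) {
     far_ptr p = { 0x1234, 0x000F };
     p = far_add(p, 1); /* one past: the segment itself changes */
     printf("%04X:%04X\n", p.seg, p.off); /* 1235:0000 */
     return 0;
 }

Under a protected-mode descriptor table, loading that new segment value is exactly the operation that can fault, which is why the allocator needs one safe descriptor past the end of the range but no more than that.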

RBerteig
http://www.faktoider.nu/640kb_eng.html :)
unwind
Thanks for that... you will notice that I didn't quote the phrase or blame Bill for it. It may not really have been said by anyone in seriousness, but the attitude was not unfamiliar to anyone who had to *pay* for a large block of memory....
RBerteig
A: 

ZETA-C for the TI Explorer: pointers are implemented as array-and-index pairs or displaced arrays, IIRC, so your example probably wouldn't work. Start from zcprim>pointer-subtract in zcprim.lisp to figure out what the behavior would be. No idea whether this was correct per the standard, but I get the impression that it was.

Julian Squires
+1  A: 

Another reason is that there are optional conservative garbage collectors (like the Boehm-Demers-Weiser GC) that assume a pointer always points within its allocation's bounds; if it doesn't, they are allowed to free the memory at any time.
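
A minimal sketch of why (my illustration, using the Boehm GC's public gc.h API; the exact reclamation timing is up to the collector): if the only pointer you keep is out of bounds, the collector doesn't have to treat it as a reference to the block.

 #include <stdio.h>
 #include <gc.h>

 int main(void) {
     GC_INIT();
     int *p = GC_MALLOC(16 * sizeof *p);
     int *q = p - 3; /* undefined, and invisible to a conservative GC */
     p = NULL;       /* drop the only in-bounds reference */
     GC_gcollect(); /* the block may now be reclaimed */
     q += 3;         /* "recovering" p here is already too late */
     printf("%p\n", (void *)q);
     return 0;
 }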

There is one popular, commercial-quality, widely used library that does break this assumption: the Judy array library from HP, which uses pointer tricks to implement a very complex hash-like structure.

Lothar