I read this question and its answer in a book. But I didn't understand the book's justification.

Have a look:

Question:

Will the following code compile?

int main()
{
   char str[5] = "fast enough";
   return 0;
}

And the answer was:

Yes. The compiler never detects the error if bounds of an array are exceeded.

I couldn't get it.

Can anybody please explain this?

+5  A: 
Mark Rushakoff
"Will it compile" is generally used to mean "will it compile without errors", not "will it compile without warnings".
Brian Schroth
"The compiler never detects the error if bounds of an array are exceeded." In this case it does.
Mark Rushakoff
-1: This does not even start to help answer the question.
Kyle Rozendo
@Brian: that's why there's `-Werror`
Christoph
It simply does not require a compiler error according to the language specification. The compiler can warn about it, and even treat it as an error if you ask it to treat warnings as errors (-Werror). In case of array initializations it should be pretty trivial for the compiler to check the number of initializers (and it has to do it anyway, because it has to fill the rest of the array with zeros, should there be too *few* initializers). Basically C is a thin abstraction over assembly, and it is supposed to allow you to make all the "mistakes" you could do at a lower level...
UncleBens
Yes it does; this shows that some compilers *do* detect that the constant string was too long, and they truncate the string.
Mark Rushakoff
As to what actually happens, I tried it out. It may happen that there is a null after the array. But when you declare two such arrays that happen to be contiguous in memory, there will be no null between them.
UncleBens
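The point about adjacent arrays can be sketched with a literal that fits exactly, which C permits (unlike the overlong literal in the question). This is only an illustration: the `pair` struct, the `sample` object and the `has_terminator` helper are hypothetical names, not anything from the thread, and the struct is used merely to place two arrays next to each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: two 5-char arrays declared back to back. */
struct pair {
    char first[5];
    char second[5];
};

/* Each literal has exactly 5 characters, so in C the terminating '\0'
   is dropped from both arrays (C++ would reject these initializers). */
const struct pair sample = { "abcde", "fghij" };

/* Returns true if buf[0..size-1] contains a '\0'. */
bool has_terminator(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

Since `has_terminator(sample.first, sizeof sample.first)` is false, passing `sample.first` to `printf("%s", ...)` or `strlen` would read past the end of the array. Printing with an explicit precision such as `"%.5s"` avoids that.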
@Brian Schroth: The separation between errors and warnings is not defined by the language. The language knows *diagnostic messages* only. The rest is up to the compiler.
AndreyT
@UncleBens: Wrong. It does require a compile error according to the language specification. The language explicitly prohibits supplying more initializers than there are objects to be initialized. The language only allows dropping the terminating `0` from the string literal. Dropping anything else is explicitly prohibited. The code in the OP must generate a diagnostic message, because it is a constraint violation.
AndreyT
Look at sizeof(str); that will tell you that it stops at 5.
Jonathan Leffler
As AndreyT says, every compiler is required to diagnose this problem. However, I am surprised that GCC only issues a warning; it seems to me that this should be treated as an error when compiling C, just as GCC treats it when compiling C++ (it is an error when GCC compiles as C++).
Michael Burr
A: 

Array-bound checking happens at runtime, not compile time. The compiler has no way of doing the static analysis of the above code that would be necessary to prevent the error.

UPDATE: Apparently the above statement is true for some compilers and not others. If your book says it will compile, it must be referring to a compiler that doesn't do the checking.

Dave Swersky
Sure it can - this is an initializer known to the compiler at compile time. It's just not required to error out.
bdonlan
Well, in C, array bounds checking doesn't happen at all. And obviously the compiler can do the static analysis, since gcc would give you a warning (as pointed out by another poster)
Eric Petroelje
C doesn't do array bounds checking, hence this compiles. The compiler could do static checking but it's still valid C.
Timo Geusch
A: 

Because "fast enough" is simply a pointer to a null-terminated string. It's too much work for the compiler to figure out if every assignment to a char* or char [] is going to go beyond the bounds of the array.

theycallmemorty
This can't be the reason. The compiler has to check the number of initializers anyway to allow things like `char s[] = "Hello"` and `int array[10] = {2}`
UncleBens
There is a difference between "Hello" and {'H','e','l','l','o','\0'}. One is an array and one is simply a pointer. The pointer could point to anywhere but the array is constant at compile time.
theycallmemorty
@theycallmemorty: Huh? Which one is a pointer?
AndreyT
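A small sketch of what this exchange is about: a string literal is itself an array, and in an initializer the compiler counts its elements just as it counts a brace-enclosed list. The names `from_literal`, `from_braces`, `partial` and `same_bytes` are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Both forms produce a 6-element array: 5 letters plus '\0'. */
const char from_literal[] = "Hello";
const char from_braces[]  = { 'H', 'e', 'l', 'l', 'o', '\0' };

/* Fewer initializers than elements: the remaining 9 are zero-filled. */
const int partial[10] = { 2 };

/* Returns true if the two buffers have the same size and the same bytes. */
bool same_bytes(const void *a, size_t na, const void *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    return na == nb && memcmp(a, b, na) == 0;
}
```

`sizeof "Hello"` is 6 rather than the size of a pointer, and `same_bytes(from_literal, sizeof from_literal, from_braces, sizeof from_braces)` holds, so neither initializer form involves a pointer.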
A: 

What's happening is you're trying to initialize a character array with more characters than the array has room for. Here's how it breaks down:

char str[5];

Declares a character array with five characters.

char str[5] = "fast enough";

The second part '= "fast enough";' then attempts to initialize that array with the value "fast enough". This will not work, because "fast enough" is longer than the array is.

It will, however, compile. C and C++ compilers can't generally perform bounds checking on arrays for you, and overrunning an array is one of the most common reasons for segmentation faults. (Edit: as Mark Rushakoff pointed out, apparently the newer ones do throw warnings in some cases.) This may segfault when you try to run it, but more likely, I think, the array will simply be initialized to "fast ".

Daniel Bingham
No, it won't "compile". It is a constraint violation, which requires a diagnostic message. Comeau, a very pedantic compiler, will refuse to compile this with an error message. GCC, on the other hand, opts for a mere warning. From the language point of view this is a constraint violation, i.e. in simple terms it is an *error*.
AndreyT
Harsh, Andrey. Obviously it will compile - SOME OF THE TIME. It does in GCC, it does in the original poster's compiler. It will in MANY compilers. And if he doesn't have -Wall turned on it won't even give a warning. Maybe it is a violation and maybe some compilers will catch it. But my answer is not wrong in saying that it will generally compile. And my assertion that in most cases compilers can't and don't perform bounds checking is also true. Initialization is one of the few cases where they can, and as mentioned in the edit, some do.
Daniel Bingham
@Alcon: No, actually, it seems to fail with an error in most (if not virtually *all*) compilers. In fact, GCC is the only exception from that rule that I know of so far. And no, you don't need `-Wall` in GCC to activate this warning. This warning is issued by default. Additionally, this has nothing to do with bounds checking. This is just a matter of supplying too many initializers in aggregate initialization. I'm sure all compilers without a single exception will issue an error if you do `int a[1] = { 1, 2 }`. The string literal is not really different from that.
AndreyT
Actually, even `int a[1] = { 1, 2 }` is just a warning in GCC. Even if you do that with a `struct` it is just a warning in GCC. There must be a reason for this, most likely a historical one (they might want to support some crappy legacy code). But the fact that this is still a warning even in `-pedantic-errors` mode really means that this is a bug in GCC. GCC needs to fix it; the current behavior is hardly acceptable, especially in `-pedantic-errors` mode.
AndreyT
+2  A: 

The answer to the question that you quoted is incorrect. The correct answer is "No. The code will not compile", assuming a formally correct C compiler (as opposed to quirks of some specific compiler).

The C language does not allow using an excessively long string literal to initialize a character array of a specific size. The only flexibility allowed by the language here concerns the terminating \0 character. If the array is too short to accommodate the terminating \0, the terminating \0 is silently dropped. But the actual literal string characters cannot be dropped. If the literal is too long, it is a constraint violation and the compiler must issue a diagnostic message.

char s1[5] = "abc"; /* OK */
char s2[5] = "abcd"; /* OK */
char s3[5] = "abcde"; /* OK, zero at the end is dropped (ERROR in C++) */
char s4[5] = "abcdef"; /* ERROR, initializer is too long (ERROR in C++ as well) */

Whoever wrote your "book" did not know what they were talking about (at least on this specific subject). What they state in the answer is flat out incorrect.

Note: Supplying excessively long string initializers is illegal in C89/90, C99 and C++. However C++ is even more restrictive in this regard. C++ prohibits dropping the terminating \0 character, while C allows dropping it, as described above.
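
As a rough check of the rules above, the three valid initializers can be inspected at run time. The helper `bounded_length` is a hypothetical name introduced only for this sketch; the overlong case is left commented out because it is exactly the one a conforming compiler must diagnose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

char s1[5] = "abc";   /* 'a' 'b' 'c' '\0' '\0' : remainder is zero-filled     */
char s2[5] = "abcd";  /* 'a' 'b' 'c' 'd' '\0'  : the terminator just fits     */
char s3[5] = "abcde"; /* 'a' 'b' 'c' 'd' 'e'   : terminator dropped (C only)  */
/* char s4[5] = "abcdef";  constraint violation: a diagnostic is required */

/* Length of the text in buf, never reading more than size bytes.
   Returns size when buf holds no '\0' at all. */
size_t bounded_length(const char *buf, size_t size)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', size);
    return end ? (size_t)(end - buf) : size;
}
```

Here `bounded_length(s3, sizeof s3)` equals `sizeof s3`, which signals that `s3` is not a null-terminated string and must not be handed to `strlen` or printed with a plain `%s`.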

AndreyT
Where is that constraint mentioned? I don't see it in the section on initializers.
John Bode
In C89/90 it is in 6.5.7: There shall be no more initializers in an initializer list than there are objects to be initialized. The exception for the terminating 0 is given further in the text. I'm still looking for an equivalent in C99...
AndreyT
In C99 it is in the very same place: 6.7.8/2. No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
AndreyT
No, it *is* a constraint violation in C, both C89/90 and C99. I quoted the relevant portions of both standards. The difference with C++ exists, but it's only with regard to the terminating `0`. C++ does not allow dropping the `0`.
AndreyT
Note that a diagnostic does not mean that it "will not compile" as you say - the compiler is free to give a warning instead (but the behavior of the resulting program is undefined)
bdonlan
@bdonlan: Well, that would mean that the very question of "Will it compile?" made no sense at all. And it doesn't, from the pedantic point of view. In practice though, it is usually meant to mean "Is this code well-formed?" (borrowing the C++ term) or "Are there any constraint violations in this code?". I'm sure that's what was meant originally.
AndreyT
+4  A: 

In the C++ standard, 8.5.2/2 Character arrays says:

There shall not be more initializers than there are array elements.

In the C99 standard, 6.7.8/2 Initialization says:

No initializer shall attempt to provide a value for an object not contained within the entity being initialized

C90 6.5.7 Initializers says something similar.

However, note that for C (both C90 and C99) the '\0' terminating character will be put in the array if there is room. It's not an error if the terminator will not fit (C99 6.7.8/14: "Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array").

On the other hand, the C++ standard has an example that indicates an error should be diagnosed if there's not room for the terminating character.

In either case, this should be diagnosed as an error in all compilers:

char str[5] = "fast enough";

Maybe pre-ANSI compilers weren't so strict, but any reasonably modern compiler should diagnose this.

Michael Burr
GCC 3.4.5 (compiling as C) only gives a warning (I think it should diagnose this as an error). GCC compiling as C++, MSVC 8, Digital Mars 8.5 and Comeau 4.3.10.1 all produce an error.
Michael Burr