ansaurus

Question

Why doesn't the compiler detect out-of-bounds in string constant initialization?

Answer 1

+5 A:

Mark Rushakoff 2009-11-04 17:45:37

"Will it compile" is generally used to mean "will it compile without errors", not "will it compile without warnings".

Brian Schroth 2009-11-04 17:46:52

"The compiler never detects the error if bounds of an array are exceeded." In this case it does.

Mark Rushakoff 2009-11-04 17:47:45

-1: This does not even start to help answer the question.

Kyle Rozendo 2009-11-04 17:52:02

@Brian: that's why there's `-Werror`

Christoph 2009-11-04 17:57:49

It simply does not require a compiler error according to the language specification. The compiler can warn about it, and even treat it as an error if you ask it to treat warnings as errors (-Werror). In case of array initializations it should be pretty trivial for the compiler to check the number of initializers (and it has to do it anyway, because it has to fill the rest of the array with zeros, should there be too *few* initializers). Basically C is a thin abstraction over assembly, and it is supposed to allow you to make all the "mistakes" you could do at a lower level...

UncleBens 2009-11-04 17:58:20
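To illustrate the point about too *few* initializers, here is a minimal sketch in C (the array and function names are hypothetical): every element that has no initializer is zero-filled, so the compiler necessarily counts the initializers it was given.

```c
#include <stddef.h>

/* Only the first element is given explicitly; the other nine are
   zero-initialized, so the compiler must know how many initializers it saw. */
int demo_ints[10] = {2};

/* Same rule for a string literal shorter than the array:
   'a', 'b', 'c', the literal's '\0', then zeros for the remaining elements. */
char demo_chars[8] = "abc";

/* Hypothetical helper: counts zero-valued elements in an int array. */
size_t count_zeros(const int *a, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == 0) {
            zeros++;
        }
    }
    return zeros;
}
```

Calling `count_zeros(demo_ints, 10)` reports 9 zero elements, since only `demo_ints[0]` was initialized explicitly.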

Yes it does; this shows that some compilers *do* detect that the constant string was too long, and they truncate the string.

Mark Rushakoff 2009-11-04 17:58:40

As to what actually happens, I tried it out. It may happen that there is a null after the array. But when you declare two such arrays that happen to be contiguous in memory, there will be no null between them.

UncleBens 2009-11-04 18:14:55

@Brian Schroth: The separation between errors and warnings is not defined by the language. The language knows *diagnostic messages* only. The rest is up to the compiler.

AndreyT 2009-11-04 20:00:51

@UncleBens: Wrong. It does require a compile error according to the language specification. The language explicitly prohibits supplying more initializers than there are objects to be initialized. The language only allows dropping the terminating `0` from the string literal. Dropping anything else is explicitly prohibited. The code in the OP must generate a diagnostic message, because it is a constraint violation.

AndreyT 2009-11-04 20:07:37

Look at sizeof(str); that will tell you that it stops at 5.

Jonathan Leffler 2009-11-04 20:07:46
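The sizeof observation can be checked with code that is legal C. The question's own initializer is a constraint violation, so this sketch substitutes the five-character literal "fast " as a stand-in; the variable names are illustrative.

```c
#include <stddef.h>

/* Legal C: "fast " has exactly five characters, so it fills the array and
   the terminating '\0' is dropped for lack of room. */
char str[5] = "fast ";

/* sizeof on an array gives its declared size, independent of the initializer. */
size_t array_size = sizeof str;

/* A string literal is itself an array of (length + 1) chars, so the literal
   from the question needs 12 bytes, far more than str[5] provides. */
size_t literal_size = sizeof "fast enough";
```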

As AndreyT says, every compiler is required to diagnose this problem. However, I am surprised that GCC only issues a warning; it seems to me that GCC should treat this as much of an error when compiling C as it does when compiling C++ (where it is an error).

Michael Burr 2009-11-04 20:57:56

Answer 2

A:

Array-bound checking happens at runtime, not compile time. The compiler has no way of doing the static analysis of the above code that would be necessary to prevent the error.

UPDATE: Apparently the above statement is true for some compilers and not others. If your book says it will compile, it must be referring to a compiler that doesn't do the checking.

Dave Swersky 2009-11-04 17:46:06

Sure it can - this is an initializer known to the compiler at compile time. It's just not required to error out.

bdonlan 2009-11-04 17:47:25

Well, in C, array bounds checking doesn't happen at all. And obviously the compiler can do the static analysis, since gcc would give you a warning (as pointed out by another poster)

Eric Petroelje 2009-11-04 17:47:53

C doesn't do array bounds checking, hence this compiles. The compiler could do static checking but it's still valid C.

Timo Geusch 2009-11-04 17:48:22

Answer 3

A:

Because "fast enough" is simply a pointer to a null-terminated string. It's too much work for the compiler to figure out whether every assignment to a char* or char[] is going to go beyond the bounds of the array.

theycallmemorty 2009-11-04 17:46:11

This can't be the reason. The compiler has to check the number of initializers anyway to allow things like `char s[] = "Hello"` and `int array[10] = {2}`

UncleBens 2009-11-04 18:02:20

There is a difference between "Hello" and {'H','e','l','l','o','\0'}. One is an array and one is simply a pointer. The pointer could point anywhere, but the array is constant at compile time.

theycallmemorty 2009-11-04 19:25:42

@theycallmemorty: Huh? Which one is a pointer?

AndreyT 2009-11-04 22:24:48
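A short sketch (names are illustrative) showing that the string literal and the brace-enclosed list both initialize a 6-element array, and that a pointer only appears when one is declared explicitly:

```c
#include <stddef.h>
#include <string.h>

/* Both of these are arrays of 6 chars; the string literal form is shorthand
   for the brace-enclosed list including the trailing '\0'. */
char from_literal[] = "Hello";
char from_list[] = {'H', 'e', 'l', 'l', 'o', '\0'};

/* Only this one is a pointer: it stores the address of a literal that lives
   elsewhere, so sizeof reports the size of a pointer, not of the string. */
const char *as_pointer = "Hello";

/* Returns nonzero when the two arrays have the same size and bytes. */
int same_contents(void)
{
    return sizeof from_literal == sizeof from_list
        && memcmp(from_literal, from_list, sizeof from_list) == 0;
}
```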

Answer 4

A:

What's happening is you're trying to initialize a character array with more characters than the array has room for. Here's how it breaks down:

char str[5];

Declares a character array with five characters.

char str[5] = "fast enough";

The second part '= "fast enough";' then attempts to initialize that array with the value "fast enough". This will not work, because "fast enough" is longer than the array is.

It will, however, compile. C and C++ compilers can't generally perform bounds checking on arrays for you, and overrunning an array is one of the most common causes of segmentation faults. [edit]As Mark Rushakoff pointed out, newer compilers apparently do issue warnings in some cases.[/edit] This may segfault when you try to run it, but more likely, I think, the array will simply be initialized to "fast ".

Daniel Bingham 2009-11-04 17:47:57
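Two legal alternatives to the question's declaration, sketched under the assumption that you want either the whole string or an explicitly truncated copy. `copy_truncated` is a hypothetical wrapper around the standard C99 `snprintf`:

```c
#include <stdio.h>
#include <string.h>

/* Option 1: omit the size and let the compiler count the characters
   (11 characters plus the terminating '\0', so 12 elements). */
char whole[] = "fast enough";

/* Option 2: keep a 5-element buffer and truncate explicitly at run time.
   snprintf writes at most size - 1 characters plus a '\0', so a 5-byte
   buffer ends up holding "fast", leaving room for the terminator.
   The return value is the length the full string would have had. */
int copy_truncated(char *buf, size_t size, const char *src)
{
    return snprintf(buf, size, "%s", src);
}
```

With a `char buf[5]`, `copy_truncated(buf, sizeof buf, "fast enough")` returns 11 and leaves `"fast"` in `buf`; a return value of `size` or more signals that truncation happened.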

No, it won't "compile". It is a constraint violation, which requires a diagnostic message. Comeau, a very pedantic compiler, will refuse to compile this with an error message. GCC, on the other hand, opts for a mere warning. From the language point of view this is a constraint violation, i.e. in simple terms it is an *error*.

AndreyT 2009-11-04 20:03:00

Harsh, Andrey. Obviously it will compile - SOME OF THE TIME. It does in GCC, and it does in the original poster's compiler. It will in MANY compilers. And if he doesn't have -Wall turned on, it won't even give a warning. Maybe it is a violation and maybe some compilers will catch it. But my answer is not wrong in saying that it will generally compile. And my assertion that in most cases compilers can't and don't perform bounds checking is also true. Initialization is one of the few cases where they can, and as mentioned in the edit, some do.

Daniel Bingham 2009-11-04 20:29:18

@Alcon: No, actually, it seems to fail with an error in most (if not virtually *all*) compilers. In fact, GCC is the only exception to that rule that I know of so far. And no, you don't need `-Wall` in GCC to activate this warning. This warning is issued by default. Additionally, this has nothing to do with bounds checking. This is just a matter of supplying too many initializers in aggregate initialization. I'm sure all compilers without a single exception will issue an error if you do `int a[1] = { 1, 2 }`. The string literal is not really different from that.

AndreyT 2009-11-04 22:23:43

Actually, even `int a[1] = { 1, 2 }` is just a warning in GCC. Even if you do that with a `struct` it is just a warning in GCC. There must be a reason for this, most likely a historical one (they might want to support some crappy legacy code). But the fact that this is still a warning even in `-pedantic-errors` mode really means that this is a bug in GCC. GCC needs to fix it; the current behavior is hardly acceptable, especially in `-pedantic-errors` mode.

AndreyT 2009-11-04 22:30:55
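If you want a hard error no matter how a particular compiler classifies excess initializers, one option is a compile-time size check. This sketch assumes a C11 compiler (`_Static_assert` postdates this discussion), and the `ARRAY_LEN` macro and array names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Keep the literal in its own array so its length is known at compile time. */
static const char greeting[] = "fast";
char target[5];

/* C11: compilation stops with this message if the literal (including its
   '\0') does not fit, instead of relying on the compiler's choice between a
   warning and an error. */
_Static_assert(ARRAY_LEN(greeting) <= ARRAY_LEN(target),
               "initializer does not fit in target");
```

Changing `greeting` to `"fast enough"` makes the assertion fail at compile time, because that literal needs 12 bytes and `target` has 5.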

Answer 5

+2 A:

The answer to the question that you quoted is incorrect. The correct answer is "No. The code will not compile", assuming a formally correct C compiler (as opposed to quirks of some specific compiler).

C language does not allow using an excessively long string literal to initialize a character array of specific size. The only flexibility allowed by the language here is the terminating \0 character. If the array is too short to accommodate the terminating \0, the terminating \0 is silently dropped. But the actual literal string characters cannot be dropped. If the literal is too long, it is a constraint violation and the compiler must issue a diagnostic message.

char s1[5] = "abc"; /* OK */
char s2[5] = "abcd"; /* OK */
char s3[5] = "abcde"; /* OK, zero at the end is dropped (ERROR in C++) */
char s4[5] = "abcdef"; /* ERROR, initializer is too long (ERROR in C++ as well) */

Whoever wrote your "book" didn't know what they were talking about (at least on this specific subject). What they state in the answer is flat-out incorrect.

Note: Supplying excessively long string initializers is illegal in C89/90, C99 and C++. However C++ is even more restrictive in this regard. C++ prohibits dropping the terminating \0 character, while C allows dropping it, as described above.

AndreyT 2009-11-04 19:59:34
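The three legal declarations from this answer can be checked directly (compile as C, since `s3` is an error in C++). `bounded_len` is a hypothetical helper built on the standard `memchr`, used because `strlen` would read past the end of `s3`, which has no terminator:

```c
#include <stddef.h>
#include <string.h>

/* s4 ("abcdef") is left out: it is a constraint violation that a
   conforming compiler must diagnose. */
char s1[5] = "abc";   /* 'a' 'b' 'c' '\0' '\0' */
char s2[5] = "abcd";  /* 'a' 'b' 'c' 'd' '\0' */
char s3[5] = "abcde"; /* 'a' 'b' 'c' 'd' 'e', no terminator */

/* Hypothetical helper: length of a char array that may lack a terminator,
   never reading past n elements. */
size_t bounded_len(const char *s, size_t n)
{
    const char *nul = memchr(s, '\0', n);
    return nul ? (size_t)(nul - s) : n;
}
```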

Where is that constraint mentioned? I don't see it in the section on initializers.

John Bode 2009-11-04 20:05:33

In C89/90 it is in 6.5.7: "There shall be no more initializers in an initializer than there are objects to be initialized." The exception for the terminating 0 is given further in the text. I'm still looking for an equivalent in C99...

AndreyT 2009-11-04 20:12:16

In C99 it is in the very same place: 6.7.8/2. No initializer shall attempt to provide a value for an object not contained within the entity being initialized.

AndreyT 2009-11-04 20:14:17

No, it *is* a constraint violation in C, both C89/90 and C99. I quoted the relevant portions of both standards. The difference with C++ exists, but only with regard to the terminating `0`. C++ does not allow dropping the `0`.

AndreyT 2009-11-04 20:16:26

Note that a diagnostic does not mean that it "will not compile" as you say - the compiler is free to give a warning instead (but the behavior of the resulting program is undefined)

bdonlan 2009-11-04 22:32:03

@bdonlan: Well, that would mean that the very question "Will it compile?" made no sense at all. And it doesn't, from the pedantic point of view. In practice, though, it is usually meant as "Is this code well-formed?" (borrowing the C++ term) or "Are there any constraint violations in this code?". I'm sure that's what was meant originally.

AndreyT 2009-11-04 22:53:26

Answer 6

+4 A:

In the C++ standard, 8.5.2/2 Character arrays says:

There shall not be more initializers than there are array elements.

In the C99 standard, 6.7.8/2 Initialization says:

No initializer shall attempt to provide a value for an object not contained within the entity being initialized

C90 6.5.7 Initializers says similar.

However, note that for C (both C90 and C99) the '\0' terminating character will be put in the array if there is room. It's not an error if the terminator will not fit (C99 6.7.8/14: "Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array").

On the other hand, the C++ standard has an example that indicates an error should be diagnosed if there's not room for the terminating character.

In either case, this should be diagnosed as an error by all compilers:

char str[5] = "fast enough";

Maybe pre-ANSI compilers weren't so strict, but any reasonably modern compiler should diagnose this.

Michael Burr 2009-11-04 20:16:03

GCC 3.4.5 (compiling as C) only gives a warning (I think it should diagnose this as an error). GCC compiling as C++, MSVC 8, Digital Mars 8.5 and Comeau 4.3.10.1 all produce an error.

Michael Burr 2009-11-04 21:04:52
