int main(void)
{
    char four[4] = "four";
    return 0;
}

When compiled as a C++ program, G++ reports

xxx.cpp: In function int main():

xxx.cpp:3: error: initializer-string for array of chars is too long

When compiled as a C program, GCC reports no error.

It appears to me that the initialization is correctly copying all 4 bytes into the variable, as I expected.

So my question boils down to.....

Is the observed behavior in C correct, am I invoking undefined behavior somewhere, or is it something else altogether?

+21  A: 

Short answer: your code is valid C, but not valid C++.

Long answer:

"four" is actually 5 characters long - there is a \0 added there for you. In section 6.7.8 Initialization, paragraph 14, the C standard (C99) says:

An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

So the \0 is just ignored in your program when it is compiled as C. C++ is treating it differently. In fact, this particular case is called out explicitly in the C++ spec (Section 8.5.2 Character arrays, paragraph 2):

There shall not be more initializers than there are array elements. [ Example:

char cv[4] = "asdf";  // error

is ill-formed since there is no space for the implied trailing ’\0’. — end example ]
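A minimal sketch of the C side of this, assuming a C99 or later compiler (a C++ compiler will reject the array definition, as quoted above). The names `four` and `check_four` are made up for illustration. It checks that the array has exactly four elements and compares its contents with `memcmp`, which takes an explicit length and so never looks for a terminator:

```c
#include <assert.h>
#include <string.h>

/* Valid C, ill-formed C++: there is no room for the literal's
   terminating '\0', so only the four letters are stored. */
static const char four[4] = "four";

/* Hypothetical self-check; call it from main() to try it out. */
void check_four(void)
{
    assert(sizeof "four" == 5);   /* the literal includes the '\0' */
    assert(sizeof four == 4);     /* the array does not */
    assert(four[3] == 'r');       /* last element is a letter, not '\0' */

    /* Length-bounded comparison: safe on an unterminated array. */
    assert(memcmp(four, "four", sizeof four) == 0);
}
```

Because `four` is not null-terminated, passing it to `strlen`, `strcpy`, or `printf("%s", ...)` would read past the end of the array; length-bounded calls such as `memcmp` or `printf("%.4s", four)` avoid that.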

Carl Norum
In C this is valid, but it should give you a warning at some level.
Joel
@Joel, I don't think there should be a warning, the standard seems to indicate that it's completely safe and well-defined.
Carl Norum
While sizeof("four") is 5 bytes, only 4 bytes get copied to the variable.
EvilTeach
@Carl, can you post the section you are referring to as an answer?
EvilTeach
@EvilTeach, sure thing.
Carl Norum
@Carl: It's perfectly legal, but it's frequently an error and can lead to problems (like `strlen(four)`). The Standard isn't in the business of deciding what's completely safe, just well-defined. I'd like to see a warning.
David Thornley
@David, I agree on that front; I should have rephrased my comment. "I wouldn't be surprised if there were not a warning." A quick check with `-Wall` shows that there is in fact not a warning, at least from my version of gcc.
Carl Norum
It certainly is an eye-opening surprise to encounter it when upgrading some C source to C++.
EvilTeach
@EvilTeach, Well I certainly wouldn't call that an upgrade.
Carl Norum
Ya. I think as a rewrite char four[4] = {'f', 'o', 'u', 'r'} would be the most reasonable thing. There is no question of intent that way.
EvilTeach
@EvilTeach, great solution.
Carl Norum
+2  A: 

The string "four" actually contains five bytes: the four letters plus a zero byte (\0) as a string terminator. It's been a while since I've written C or C++, but I would guess the C compiler is silently ignoring it for whatever reason.

fizban
+2  A: 

Better would be

char four[] = "four";
Jeff Walker
Which gives a five-char array in both C and C++, and works great.
David Thornley
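A small sketch of the point made in this exchange, with hypothetical names (`four`, `check_deduced_size`): when the size is left out, the compiler deduces it from the literal, so the array gets five elements and is a proper C string. This form compiles as both C and C++.

```c
#include <assert.h>
#include <string.h>

/* Size deduced from the literal: four letters plus the terminating '\0'. */
static const char four[] = "four";

/* Hypothetical self-check; call it from main() to try it out. */
void check_deduced_size(void)
{
    assert(sizeof four == 5);    /* the terminator is counted */
    assert(four[4] == '\0');     /* and actually stored */
    assert(strlen(four) == 4);   /* safe, because the array is terminated */
}
```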
@David, only if you *want* a five-character array. But if you don't care, this way is certainly more maintainable.
Carl Norum
Right, I would contend that you would almost never want char four[4] = "four".
Jeff Walker
@Jeff, I've seen things like that pretty commonly. For example, if you're dealing with filesystem structures or executable formats, there are often ASCII markers at various places in the file. The structure you use to match against the on-disk data has to have the same layout, so those ASCII markers might need non-null-terminated arrays to make everything make sense. In the past, people used multicharacter literals like `'four'` to handle these situations, but the compiler warns about that these days - using an array seems like a suitable substitute.
Carl Norum
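To illustrate the file-format case from the comment above, here is a hedged sketch. The structure `demo_header`, the marker `"DEMO"`, and both function names are invented for this example and do not come from any real format. The marker field is exactly four bytes with no terminator, and it is matched with `memcmp`. The brace initializer from the earlier comment is used so the code is also acceptable to a C++ compiler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical on-disk header: a 4-byte ASCII marker, then a version byte. */
struct demo_header {
    char magic[4];           /* exactly 4 bytes, no '\0' terminator */
    unsigned char version;
};

/* Returns nonzero if the header starts with the marker "DEMO". */
int has_demo_magic(const struct demo_header *h)
{
    static const char expected[4] = {'D', 'E', 'M', 'O'};
    return memcmp(h->magic, expected, sizeof expected) == 0;
}

/* Hypothetical self-check; call it from main() to try it out. */
void demo_magic_selftest(void)
{
    struct demo_header h = { {'D', 'E', 'M', 'O'}, 1 };
    assert(has_demo_magic(&h));
    h.magic[0] = 'X';        /* corrupt the marker */
    assert(!has_demo_magic(&h));
}
```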
+1  A: 

What you're seeing is a difference between C and C++. C allows you to have extra initializers, which are ignored. C++ prohibits this -- if you specify a size for a string (or array) it must be large enough to accommodate all the initializers (including the NUL terminator, in the case of a string), or the code is ill-formed (standardese for "it's not allowed -- expect the compiler to reject it").

Jerry Coffin
No. I think the extra NUL is treated as a special case. If you make it `char four[4] = "fiveX";` you get an error in C.
EvilTeach
@EvilTeach - I get a warning, not an error, with the `"fiveX"` case.
Carl Norum
@EvilTeach (and Carl): This (largely) comes back to one difficulty with the standards: they require a "diagnostic" for incorrect code (but the compiler can still accept the code if it so chooses), but it's up to the compiler to define what is (or isn't) a diagnostic. It's also typical that specific flags are needed for conformance, so you may not get even that much by default.
Jerry Coffin
@Jerry. It seems to work on my platforms without any specific flag. Can you cite me an example where the code doesn't work, or the code doesn't work unless a specific flag is set?
EvilTeach
@EvilTeach: it's not so much that the code won't work without specific flags, as that without the right flags, many compilers will (for example) allow extensions that *should* really be flagged as errors.
Jerry Coffin