I was helping a friend with some C++ homework. I warned said friend that the kind of programming I do (PHP, Perl, Python) is pretty different from C++, and there were no guarantees I wouldn't tell horrible lies.

I was able to answer his questions, but not without stumbling over my own dynamic background. While I was reacquainting myself with C++ array semantics, I did something stupid like this (simplified example to make my question clearer):

 #include <iostream>
 #include <cstring>
 using namespace std;
 int main()
 {
   char easy_as_one_two_three[] = {'A','B','C'};  
   int an_int = 1;

   //I want an array that has a length of the value 
   //that's currently in an_int (1)
   //This clearly (to a c++ programmer) doesn't do that.
   //but what is it doing?
   char breaking_things[an_int];

   cout << easy_as_one_two_three << endl;
   return 1;
 }

When I compile and run this program, it produces the following output

 ABC????

However, if I comment out my bogus array declaration

 #include <iostream>
 #include <cstring>
 using namespace std;
 int main()
 {
   char easy_as_one_two_three[] = {'A','B','C'};  
   int an_int = 1;

   //I want an array that has a length of the value 
   //that's currently in an_int (1)
   //This clearly (to a c programmer) doesn't do that.
   //but what is it doing?
   //char breaking_things[an_int];

   cout << easy_as_one_two_three << endl;
   return 1;
 }

I get the output I expect:

 ABC

So, what exactly is happening here? I understand (vaguely) that when you create an array, you're pointing to a specific memory address, and when you give an array a length, you're telling the computer "reserve the next X blocks for me".

What I don't understand is, when I use a variable in an array declaration, what am I telling the computer to do, and why does it have an effect on a completely separate array?

Compiler is g++, version string is

 science:c++ alanstorm$ g++ -v
 Using built-in specs.
 Target: i686-apple-darwin9
 Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
 Thread model: posix
 gcc version 4.0.1 (Apple Inc. build 5493)
+11  A: 

Update: Neil pointed out in his comment on the question that you will get an error if you compile this with the -Wall and -pedantic flags in g++.

 error: ISO C++ forbids variable-size array

You are getting ABC???? because it prints the contents of the array (ABC) and continues to print until it encounters a \0.

Had the array been {'A','B','C', '\0'};, the output would have been just ABC, as expected.
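If the array really has to stay without a terminator, one option is to pass its length explicitly instead of relying on a '\0'. The following is only a minimal sketch of that idea; the names to_string_exact and demo_unterminated are made up for illustration and are not from the question.

```cpp
#include <cstddef>
#include <string>

// Build a std::string from exactly `len` chars. No '\0' is required,
// because the length is given explicitly instead of being searched for.
std::string to_string_exact(const char* data, std::size_t len) {
    return std::string(data, len);
}

// Hypothetical helper: copies the three chars of an unterminated array.
std::string demo_unterminated() {
    const char letters[] = {'A', 'B', 'C'};   // 3 bytes, no terminator
    return to_string_exact(letters, sizeof letters);
}
```

Printing the returned std::string with cout stops after the three characters, because the string carries its own length.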

Variable-length arrays were introduced in C99 - this doesn't seem to apply to C++ though.


It is undefined behavior. Even if you comment out the bogus declaration, the printed output is not always what you expect (ABC). Try giving ASCII values of some printable character (something between 32 and 126) to an_int instead of 1 and you will see the difference.

an_int            output
------------------------
 40                ABC(
 65                ABCA
 66                ABCB
 67                ABCC
 296               ABC(
 552               ABC(
 1064              ABC(
 1024*1024 + 40    ABC(

See the pattern here? Apparently it interprets the last byte (LSB) of an_int as a char, prints it, somehow finds a null char afterwards and stops printing. I think the "somehow" has something to do with the MSB portion of an_int being filled with zeros, but I'm not sure (and couldn't get any results to support this argument either).

UPDATE: It is about the MSB being filled with zeros. I got the following results.

ABC( for 40 - (3 zero bytes and a 40),
ABC(( for 10280 (which is (40 << 8) + 40) - (2 zero bytes and two 40s),
ABC((( for 2631720 (which is (10280 << 8) + 40) - (1 zero byte and three 40s),
ABC((((°¿® for 673720360 (which is (2631720 << 8) + 40) - no zero bytes and hence prints random chars until a zero byte is found.
ABCDCBA0á´¿á´¿® for (((((65 << 8) + 66) << 8) + 67) << 8) + 68;

These results were obtained on a little endian processor with 8-bit atomic element size and 1-byte address increment, where 32 bit integer 40 (0x28 in hex) is represented as 0x28-0x00-0x00-0x00 (LSB at the lowest address). Results might vary from compiler to compiler and platform to platform.
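To see the byte layout described above without relying on undefined behavior, the bytes of an integer can be copied out with memcpy and printed in memory order. This is a rough sketch, assuming a 32-bit unsigned value; the function name bytes_in_memory_order is an invention for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Render the bytes of `value` as two-digit hex, in the order they sit in
// memory (lowest address first), separated by spaces.
std::string bytes_in_memory_order(std::uint32_t value) {
    unsigned char raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);   // well-defined way to inspect bytes

    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (i != 0) out += ' ';
        out += digits[raw[i] >> 4];     // high nibble
        out += digits[raw[i] & 0x0f];   // low nibble
    }
    return out;
}
```

On a little-endian machine, bytes_in_memory_order(40) should give "28 00 00 00", which matches the "a 40 followed by three zero bytes" explanation. For 673720360 every byte is 0x28 regardless of endianness, so there is no zero byte left to stop the printing.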

Now if you try uncommenting the bogus declaration, you will find that all the outputs are of the form ABC-randomchars-char_corresponding_to_an_int. This again is the result of undefined behavior.

Amarghosh
Amarghosh, see my updated post. When I leave out the "char breaking_things[an_int];" I get the output results I expected. Why does that version of my program with the commented line stop without the null, but the previous version does not?
Alan Storm
Your program produces *undefined behavior*. There's no real answer to your "why" question. For all intents and purposes, the behavior of your program can be seen as *random*.
AndreyT
I tested it and reproduced the behavior - don't know why that's happening though. Will modify the post to reflect that variable arrays aren't applicable to C++
Amarghosh
@AndreyT, your use of dick quotes aside, I certainly believe that invalid syntax like this would cause (essentially) random behavior. What I don't understand is why the **c++** compiler is letting this invalid syntax through.
Alan Storm
My guess is that the stack memory after your array happens to be filled with 0's. When you allocate your second array, it's placed next to your first array in memory, making cout continue to the next 0 it finds. As has been said, this is undefined behaviour, though that doesn't mean it's totally random.
Runeborg
That's fascinating, and exactly the kind of description I was after. Thank you!
Alan Storm
A: 

The output looks like this because cout prints the contents of the char array until it finds a null character.

Make sure the char array is a null-terminated string, and specify the size of the array as total chars + 1 (for the null char).

Ashish
See my updated question. When I remove the (apparently not valid in c++ but slipped through the compiler somehow) problematic array declaration, I get the output I expect.
Alan Storm
Just because you got the output you expected does not mean your code is correct. Sometimes, you correctly guess what "undefined behaviour" will do, either because you know a bit about your compiler's internal workings, or pure luck. That doesn't make it right.
Steve Jessop
+2  A: 

char breaking_things[an_int] allocates a char array of size an_int (in your case 1). It's called a variable-length array, and it's a relatively new feature.

In a case like this, it's more common to dynamically allocate memory using new:

 char* breaking_things = new char[an_int]; // C++ way; a C programmer would use malloc
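Memory obtained with new[] has to be released with delete[] later. As an alternative that is not part of the answer above, a std::vector<char> can hold a run-time-sized buffer and clean up after itself. A minimal sketch, with make_buffer being a made-up helper name:

```cpp
#include <cstddef>
#include <vector>

// Allocate `count` chars, where `count` is only known at run time.
// The elements are value-initialized to '\0', and the memory is released
// automatically when the vector goes out of scope.
std::vector<char> make_buffer(std::size_t count) {
    return std::vector<char>(count);
}
```

With this, make_buffer(an_int) would play the role that breaking_things was meant to play, using only standard C++.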
Nikola Smiljanić
It is a "relatively new feature" of C language. There's no such thing as VLA in C++. And the question is about C++.
AndreyT
You are right, but some C++ compilers support it, and he made a reference to a C programmer at one point.
Nikola Smiljanić
+7  A: 

That will not "reacquaint" you "with c++ array semantics", since in C++ it is simply illegal. In C++, arrays can only be declared with sizes defined by Integral Constant Expressions (ICE). In your example the size is not an ICE. It only compiles because of a GCC-specific extension.

From the C point of view, this is actually perfectly legal in the C99 version of the language. And it does produce a so-called Variable Length Array of length 1. So your "clearly" comment is incorrect.
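For contrast, here is a small sketch of an array bound that does qualify as an integral constant expression. The names kFixedLength and fixed_array_size are hypothetical and only serve the illustration.

```cpp
#include <cstddef>

// A const int initialized with a literal is an integral constant
// expression, so it may be used as an array bound in standard C++.
const int kFixedLength = 1;

std::size_t fixed_array_size() {
    char fixed[kFixedLength] = {};   // legal: the bound is known at compile time
    // By contrast, `int n = 1; char bad[n];` uses a non-constant bound and
    // is accepted by g++ only as an extension.
    return sizeof fixed;
}
```

The only difference from the code in the question is the const on the size variable, but that is enough to turn the bound into something the compiler can evaluate at compile time.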

AndreyT
I'm not quite sure I'm following that. You're saying it's illegal C++ syntax (which I believe). If that's the case, why does g++ compile the program instead of yelling at me for doing something wrong? You mentioned a "GCC-specific extension"? What extension is that? Does GCC think the syntax is OK because in the C programming language (C99 anyway) it is?
Alan Storm
Variable length arrays are a C99 feature. They don't work in C++, except with gcc, which violates the standard by allowing them. (Yes, it's a bit confusing, I suppose.)
Charles Salvia
@Alan: Array objects with run-time size were supported by GCC even before C99. They were supported as C++ and C language extensions. One might even say that to a large degree the idea of VLA (as well as some other additions to the language) came to C99 from GCC. If you try compiling your code with g++ in `-ansi -pedantic -Wall` mode, the compiler should tell you that you are relying on a non-standard compiler-specific extension.
AndreyT
A: 

It's probably not breaking_things that broke things. The first array is not a NUL (\0) terminated string, which explains the output - cout will print whatever comes after ABC up until the first NUL it encounters.

As for the size of breaking_things, I would suspect it differs between compilers. I believe at least earlier versions of gcc used whatever value the variable happened to have at compile time, which can be tricky to determine.

See my updated question. When I remove the (apparently not valid in c++ but slipped through the compiler somehow) problematic array declaration, I get the output I expect.
Alan Storm
+3  A: 

It isn't invalid syntax. It's syntactically just fine.

It's semantically invalid C++, and rejected by my compiler (VC++). g++ seems to have an extension that allows the use of C99 VLAs in C++.

The reason for the question marks is that your array of three characters is not null terminated; it's printing until it finds a null on the stack. The layout of the stack is influenced by the variables declared on the stack. With the array, the layout is such that there's garbage prior to the first null; without the array there isn't. That is all.

DrPizza
+3  A: 

You get the output that you expect or don't expect by dumb luck. Because you didn't null terminate the characters in your array, when you go to print it out to cout it'll print the A, the B, and the C, and whatever else it finds until it hits a null character. With the array declaration, there's probably something that the compiler is pushing onto the stack to make the array sized at runtime, and that leaves you with garbage characters after the A, B, and C; without the declaration, there just happens to be a 0 after the C on the stack.

Again, it's just dumb luck. To always get what you expect you should do: char easy_as_one_two_three[] = { 'A','B','C','\0'}; or, probably more usefully char easy_as_one_two_three[] = "ABC";, which will properly null terminate the string.
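The difference between the two initializer forms shows up in sizeof. A short sketch (the function names are made up for this example):

```cpp
#include <cstddef>

// Brace list of chars: exactly the listed elements, no terminator added.
std::size_t size_of_brace_list() {
    const char letters[] = {'A', 'B', 'C'};
    return sizeof letters;   // 3
}

// String literal: the compiler appends a '\0', so the array is one longer.
std::size_t size_of_literal() {
    const char letters[] = "ABC";
    return sizeof letters;   // 4
}
```

The extra byte in the second array is the terminator that cout looks for when it is handed a char array.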

D Garcia