Just read on an internal university thread:

#include <iostream>
using namespace std;

union zt
{
    bool b;
    int i;
};

int main()
{
    zt w;
    bool a, b;
    a = 1;
    b = 2;
    cerr << (bool)2 << static_cast<bool>(2) << endl;                    // 11
    cerr << a << b << (a == b) << endl;                                 // 111
    w.i = 2;
    int q = w.b;
    cerr << (bool)q << q << w.b << ((bool)((int)w.b)) << w.i << (w.b == a) << endl; // 122220
    cerr << ((w.b == a) ? 'T' : 'F') << endl;                           // F
}

So a,b and w.b are all declared as bool. a is assigned 1, b is assigned 2, and the internal representation of w.b is changed to 2 (using a union).

This way a, b and w.b will all be true, but a and w.b won't be equal, so this might mean that the universe is broken (true != true).

I know this problem is more theoretical than practical (a sane programmer doesn't want to change the internal representation of a bool), but here are the questions:

  1. Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?
  2. Do you know any case where this corner case might become a real issue? (For example while loading binary data from a stream)

EDIT:

Three things:

  1. bool and int have different sizes, that's okay. But what if I use char instead of int. Or when sizeof(bool)==sizeof(int)?

  2. Please answer the two questions I asked if possible. I'm particularly interested in answers to the second question, because in my honest opinion, in embedded systems (which might be 8-bit systems) this might be a real problem (or not).

  3. New question: Is this really undefined behavior? If yes, why? If not, why? Aren't there any assumptions on the boolean comparison operators in the specs?

A: 

Hmm strange, I am getting different output from codepad:

11
111
122222
T

The code also seems right to me, maybe it's a compiler bug?
See here

the_drow
The code has undefined behaviour, so it will very likely work differently (if it works at all) on different platforms. This is not a "compiler bug" - before you ever use those words, think twice and then think twice again.
anon
Thanks :) It seemed so strange to me. I didn't see the union. I thought it was a plain struct.
the_drow
+6  A: 

Normally, when assigning an arbitrary value to a bool the compiler will convert it for you:

int x = 5;
bool z = x; // automatic conversion here

The equivalent code generated by the compiler will look more like:

bool z = (x != 0) ? true : false;

However, the compiler will only do this conversion once. It would be unreasonable for it to assume that any nonzero bit pattern in a bool variable is equivalent to true, especially for doing logical operations like and. The resulting assembly code would be unwieldy.

Suffice to say that if you're using union data structures, you know what you're doing and you have the ability to confuse the compiler.
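
To make the one-time conversion concrete, here is a minimal sketch. The helper name to_bool is made up for illustration; it only spells out what the implicit int-to-bool conversion does. Once the conversion has happened, the original integer value is gone.

```cpp
#include <cassert>

// Hypothetical helper: spells out the conversion applied when an int
// is assigned to a bool (zero becomes false, anything else becomes true).
constexpr bool to_bool(int x) {
    return x != 0;
}

// Shows that the original integer value does not survive the conversion.
inline void conversion_demo() {
    int one = 1;
    int two = 2;
    bool a = one;                       // implicit conversion, a is true
    bool b = two;                       // implicit conversion, b is true (not "2")
    assert(a == b);                     // both are simply true
    assert(static_cast<int>(b) == 1);   // true converts back to 1, not 2
    assert(to_bool(two) == b);          // same result as the explicit form
}
```

This only covers values that reach the bool through a conversion. A bit pattern forced in through a union or a raw memory copy never goes through this step.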

Greg Hewgill
SztupY
It is undefined behavior to access a member of a union other than the last one in which a value was stored.
KeithB
+15  A: 

If you read a member of a union that is a different member than the last member which was written then you get undefined behaviour. Writing an int member and then reading the union's bool member could cause anything to happen at any subsequent point in the program.

The only exception is where the union is a union of structs and all the structs contain a common initial sequence, in which case the common sequence may be read.
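
A small sketch of that exception, with made-up type names (IntMsg, CharMsg, Msg and tag_of are not from the question). Both structs are standard-layout and start with an int, so the int may be read through either member:

```cpp
#include <cassert>

// Hypothetical message types that share a common initial sequence (the int tag).
struct IntMsg  { int tag; int  value; };
struct CharMsg { int tag; char value; };

union Msg {
    IntMsg  as_int;
    CharMsg as_char;
};

// Reads the tag through as_char regardless of which member was written last.
// Only the shared leading member is read; reading 'value' this way would not be covered.
inline int tag_of(const Msg& m) {
    return m.as_char.tag;
}

inline void tag_demo() {
    Msg m = {{7, 42}};          // initialises as_int, the first member
    assert(tag_of(m) == 7);     // the common initial sequence may be inspected
}
```

Note that this does not help with the bool/int union in the question, since bool and int are not structs sharing a common initial sequence.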

Charles Bailey
Yes, but what if I have a struct with a bool in it, and I'm reading data into that struct from a file. This way I'm not using unions, but still get the "wrong" data into the bool variable.
SztupY
Then that's another question, or edit your current one and change the code
nos
Once again, undefined behaviour. You cannot read arbitrary bit patterns into C++ variables and expect them to "work".
anon
D.Shawley
+1  A: 

I believe what you're doing is called type punning: http://en.wikipedia.org/wiki/Type_punning

Gabe
+2  A: 

The boolean is one byte, and the integer is four bytes. When you assign 2 to the integer, the fourth byte has a value of 2, but the first byte has a value of 0. If you read the boolean out of the union, it's going to grab the first byte.

Edit: D'oh. As Oleg Zhylin points out, this only applies to a big-endian CPU. Thanks for the correction.
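
If you want to check which byte comes first on a particular machine without reading an inactive union member, copying the bytes out with std::memcpy is one option. This is only a sketch; first_byte is a made-up helper name, and the result depends on the platform.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the byte stored at the lowest address of an int.
inline unsigned char first_byte(int value) {
    unsigned char byte;
    std::memcpy(&byte, &value, 1);   // copying the object representation is well defined
    return byte;
}

inline void first_byte_demo() {
    // Little-endian: the least significant byte comes first, so this is 2.
    // Big-endian (with int wider than one byte): the first byte is 0.
    unsigned char b = first_byte(2);
    assert(b == 2 || b == 0);
}
```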

BipedalShark
I suppose most folks are on little-endian machines. But +1 for pointing this out.
Oleg Zhylin
A: 

Just to write down my points of view:

  1. Is this okay?

    I don't know whether the specs specify anything about this. A compiler might always generate code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.

    In my opinion this is not a bug, but an undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.

  2. Any real-world case

    The only thing that comes to mind is someone reading binary data from a file into a struct that has bool members. The problem might arise if the file was made by another program that wrote 2 instead of 1 in the place of the bool (maybe because it was written in another programming language).

    But this might mean bad programming practice.

One more: in embedded systems this bug might be a bigger problem than on a "normal" system, because the programmers usually do more "bit-magic" to get the job done.

SztupY
"Opinion" doesn't come into it. It is either a bug or undefined behavior (or something else). And the entire point in undefined behavior is that implementers are not required to document anything, nor is a single consistent behavior required. Requiring these things would have turned it into implementation-defined behavior. Undefined behavior means that you can not rely on the behavior, even when it seems to work as you expect.
jalf
It's never safe to assume that what you read from a file is correct, especially if you go around the type system by setting bits directly.
Matthew Crumley
As for embedded systems, I would suggest using fixed size data types **ALWAYS** and be very explicit about the byte order. If your compiler/implementation doesn't provide them for you, roll your own using a header and a bunch of preprocessor stuff. Take a look at http://predef.sourceforge.net/ for a good collection of preprocessor macros defined by various compilers.
D.Shawley
A: 

Addressing the questions posed, I think the behavior is OK and shouldn't be a problem in the real world. As we don't have ^^ in C++, I would suggest !bool == !bool as a safe bool comparison technique.

This way every non-zero value in a bool variable will be converted to zero and every zero is converted to some non-zero value, most probably one and the same for any negation operation.
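
A sketch of the idea applied to plain integers that carry truth values (same_truth is a made-up name). Each ! yields a proper bool, so the comparison is by truthiness and not by value:

```cpp
#include <cassert>

// Hypothetical helper: compares two integers by truthiness rather than by value.
constexpr bool same_truth(int x, int y) {
    return !x == !y;   // !x and !y are both genuine bool values
}

inline void same_truth_demo() {
    assert(same_truth(1, 2));    // both non-zero, so both "true"
    assert(same_truth(0, 0));    // both zero, so both "false"
    assert(!same_truth(0, 2));   // one false, one true
}
```

This works for integers. It does not make reading a bool with an invalid bit pattern defined, so it is not a fix for the union in the question.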

Oleg Zhylin
How did this get upvoted? And how would !bool == !bool solve the problem here, which is that he's asking the compiler to compare a zero byte converted to bool, with the bool value `true`, and expecting them to be equal.
jalf
+7  A: 
  1. Is this okay? (this was tested with g++ 4.3.3) I mean, should the compiler be aware that during boolean comparison any non-zero value might mean true?

Any integer value that is non-zero (or pointer that is non-null) converts to true. But when comparing an integer and a bool, the bool is converted to int before the comparison.

  2. Do you know any case where this corner case might become a real issue? (For example while binary loading of data from a stream)

It is always a real issue.

  1. Is this okay?

    I don't know whether the specs specify anything about this. A compiler might always generate code like this: ((a!=0) && (b!=0)) || ((a==0) && (b==0)) when comparing two booleans, although this might decrease performance.

    In my opinion this is not a bug, but an undefined behaviour. Although I think that every implementor should tell the users how boolean comparisons are made in their implementation.

If we go by your last code sample, both a and b are bool and set to true by assigning 1 and 2 respectively (note that the 1 and 2 disappear; they are now just true).

So breaking down your expression:

a!=0      // true (a converted to 1 because of auto-type conversion)
b!=0      // true (b converted to 1 because of auto-type conversion)

((a!=0) && (b!=0)) => (true && true)  // true ( no conversion done)

a==0      // false (a converted to 1 because of auto-type conversion)
b==0      // false (b converted to 1 because of auto-type conversion)

((a==0) && (b==0)) => (false && false) // false ( no conversion done)

((a!=0) && (b!=0)) || ((a==0) && (b==0)) => (true || false) => true

So I would always expect the above expression to be well defined and always true.

But I am not sure how this applies to your original question. When assigning an integer to a bool the integer is converted to bool (as described several times). The actual representation of true is not defined by the standard and could be any bit pattern that fits in a bool (you may not assume any particular bit pattern).

When comparing the bool to int the bool is converted into an int first then compared.

  2. Any real-world case

    The only thing that comes to mind is someone reading binary data from a file into a struct that has bool members. The problem might arise if the file was made by another program that wrote 2 instead of 1 in the place of the bool (maybe because it was written in another programming language).

    But this might mean bad programming practice.

Writing data in a binary format is not portable unless you know exactly how that data is laid out.
There are problems with the size of each object.
There are problems with representation:

  • Integers (have endianness)
  • Float (Representation undefined (usually depends on the underlying hardware))
  • Bool (Binary representation is undefined by the standard)
  • Struct (Padding between members may differ)

With all these you need to know the underlying hardware and the compiler. Different compilers or different versions of the compiler or even a compiler with different optimization flags may have different behaviors for all the above.
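
One way to sidestep most of these problems is to treat the file as raw bytes and decode each field by hand. The sketch below assumes a made-up 5-byte record layout (one flag byte followed by a little-endian 32-bit count); Record, kRecordSize and decode_record are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk layout (an assumption for this sketch):
//   byte 0     : flag, any non-zero value means true
//   bytes 1..4 : unsigned 32-bit count, little-endian
struct Record {
    bool          flag;
    std::uint32_t count;
};

constexpr std::size_t kRecordSize = 5;

// Decodes one record from a buffer holding at least kRecordSize bytes.
inline Record decode_record(const unsigned char* buf) {
    Record r;
    r.flag  = (buf[0] != 0);   // normalise instead of copying raw bits into a bool
    r.count = static_cast<std::uint32_t>(buf[1])
            | static_cast<std::uint32_t>(buf[2]) << 8
            | static_cast<std::uint32_t>(buf[3]) << 16
            | static_cast<std::uint32_t>(buf[4]) << 24;
    return r;
}

inline void decode_demo() {
    // A writer in another language stored 2 for "true".
    const unsigned char raw[kRecordSize] = {2, 0x01, 0x02, 0x00, 0x00};
    Record r = decode_record(raw);
    assert(r.flag == true);      // compares equal to true after normalisation
    assert(r.count == 0x0201u);  // independent of the host byte order
}
```

Because the bool is produced by a comparison and the integer by shifts, the result does not depend on padding, on the host's endianness, or on how the compiler represents true.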

The problem with Union

union X
{
    int  a;
    bool b;
};

As people mention, writing to 'a' and then reading from 'b' is undefined.
Why: because we do not know how 'a' or 'b' is represented on this hardware. Writing to 'a' will fill out the bits in 'a', but how does that reflect on the bits in 'b'? If your system uses a 1-byte bool and a 4-byte int with the lowest byte in low memory and the highest byte in high memory, then writing 1 to 'a' will put 1 in 'b'. But then how does your implementation represent a bool? Is true represented by 1 or 255? What happens if you put a 1 in 'b' and for all other uses of true it is using 255?

So unless you understand both your hardware and your compiler the behavior will be unexpected.

Thus these uses are undefined but not disallowed by the standard. The reason they are allowed is that you may have done the research and found that on your system with this particular compiler you can do some freaky optimization by making these assumptions. But be warned: any change in the assumptions will break your code.

Also when comparing two types the compiler will do some auto-conversions before comparison, remember the two types are converted into the same type before comparison. For comparison between integers and bool the bool is converted into an integer and then compared against the other integer (the conversion converts false to 0 and true to 1). If the objects being converted are both bool then no conversion is required and the comparison is done using boolean logic.
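
A short sketch of that rule (equals_as_int is a made-up helper name). With well-formed values, a bool compared against an int is promoted to 0 or 1 first, so true == 2 is false even though 2 converts to true:

```cpp
#include <cassert>

// Hypothetical helper: compares a bool with an int the way the built-in == does.
// The bool is promoted to int (false -> 0, true -> 1) before the comparison.
constexpr bool equals_as_int(bool b, int i) {
    return b == i;
}

inline void promotion_demo() {
    bool t = true;
    int two = 2;
    assert(!equals_as_int(t, two));         // 1 == 2 is false
    assert(equals_as_int(t, 1));            // 1 == 1 is true
    assert(t == static_cast<bool>(two));    // bool against bool: true == true
}
```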

Martin York