A common question that comes up from time to time in the world of C++ programming is compile-time determination of endianness. Usually this is done with barely portable #ifdefs. But does the C++0x constexpr keyword along with template specialization offer us a better solution to this?
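For reference, the #ifdef approach mentioned above tends to look something like the following sketch. It is only an illustration: `EXAMPLE_LITTLE_ENDIAN` is a made-up macro name, the `__BYTE_ORDER__` family is predefined by GCC and Clang but not by every compiler, and the MSVC branch rests on the assumption that the targets of interest there are little endian.

```cpp
// Sketch of preprocessor-based detection. EXAMPLE_LITTLE_ENDIAN is a
// hypothetical macro name chosen for this illustration.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
   // GCC and Clang predefine these macros.
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#    define EXAMPLE_LITTLE_ENDIAN 1
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define EXAMPLE_LITTLE_ENDIAN 0
#  else
#    error "unsupported byte order"
#  endif
#elif defined(_MSC_VER)
   // Assumption: the MSVC targets this example cares about are little endian.
#  define EXAMPLE_LITTLE_ENDIAN 1
#else
#  error "unknown compiler: add a byte-order check here"
#endif
```

Every new compiler needs its own branch, which is why this style is usually called barely portable.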

Would it be legal C++0x to do something like:

constexpr bool little_endian()
{
   const static unsigned num = 0xAABBCCDD;
   return reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD;
}

And then specialize a template for both endian types:

template <bool LittleEndian>
struct Foo 
{
  // .... specialization for little endian
};

template <>
struct Foo<false>
{
  // .... specialization for big endian
};

And then do:

Foo<little_endian()>::do_something();
+10  A: 

Assuming N2116 is the wording that gets incorporated, then your example is ill-formed (notice that there is no concept of "legal/illegal" in C++). The proposed text for [decl.constexpr]/3 says

  • its function-body shall be a compound-statement of the form { return expression; } where expression is a potential constant expression (5.19);

Your function violates the requirement in that it also declares a local variable.

Edit: This restriction could be overcome by moving num outside of the function. The function still wouldn't be well-formed, then, because expression needs to be a potential constant expression, which is defined as

An expression is a potential constant expression if it is a constant expression when all occurrences of function parameters are replaced by arbitrary constant expressions of the appropriate type.

IOW, reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD would have to be a constant expression. However, it is not: &num would be an address constant expression (5.19/4). Accessing the value through such a pointer is, however, not allowed in a constant expression:

The subscripting operator [] and the class member access . and -> operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators.

Edit: The above text is from C++98. Apparently, C++0x is more permissive in what it allows for constant expressions. The expression involves an lvalue-to-rvalue conversion of the array reference, which is banned from constant expressions unless

it is applied to an lvalue of effective integral type that refers to a non-volatile const variable or static data member initialized with constant expressions

It's not clear to me whether (&num)[0] "refers to" a const variable, or whether only a literal num "refers to" such a variable. If (&num)[0] refers to that variable, it is then unclear whether reinterpret_cast<const unsigned char*> (&num)[0] still "refers to" num.
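For contrast, here is a minimal sketch of a function that does fit the quoted form: the body is a single return statement over the parameters, with no local declarations. It assumes a compiler that implements constexpr roughly as proposed, and every name in it (`fits_in_byte`, `Storage`, `bytes`) is hypothetical. It says nothing about whether the reinterpret_cast expression qualifies; it only shows the shape that is accepted and how the result can select a specialization.

```cpp
// Hypothetical names throughout; none of them come from the draft.
constexpr bool fits_in_byte(unsigned value)
{
    return value <= 0xFF;   // the whole body is one return statement
}

template <bool Small>
struct Storage              // primary template, used when Small is true
{
    static const int bytes = 1;
};

template <>
struct Storage<false>       // explicit specialization for false
{
    static const int bytes = 4;
};

// Replacing the parameter with a constant yields a constant expression,
// so the call can appear in static_assert and as a template argument.
static_assert(fits_in_byte(0x7F), "0x7F fits in one byte");
static_assert(Storage<fits_in_byte(0x1FF)>::bytes == 4, "0x1FF needs the wide storage");
```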

Martin v. Löwis
I don't feel it applies, here. The static variable is constant itself.
GMan
The wording in 4.1 of N2116 states that the body of the function must only have one statement (that being the return statement). Mind you, from my quick glance over the text, I don't see anything prohibiting the above code if num is defined globally.
GRB
@GMan: as GRB says, the draft is fairly clear that you must have only one statement, and a declaration *is* a statement (C++98, 6.7, Declaration statement). @GRB: I'll edit my response to discuss moving the constant outside of the function.
Martin v. Löwis
+1, thanks for clearing that up Martin. While I did suggest moving the variable as a possibility, the idea that `()
GRB
Johannes Schaub - litb
yeah sorry about the confusion litb, I meant for `(D)
GRB
@litb: the text banning dereferencing pointers is indeed from C++98. I see in C++0x this whole text has been rephrased. As for `reinterpret_cast`: I cannot see where it is banned from a constant expression in C++0x, so I then think it should be well-formed. However, since it dereferences the wrong pointer type, it has undefined behavior (which, in turn, is the whole point of the endianness test).
Martin v. Löwis
The expression with the `reinterpret_cast` is not a constant expression, because `5.19/2` excludes "a type conversion from a pointer or pointer-to-member type to a literal type". The dereference is not undefined behavior, because you are allowed to read the underlying bytes of any trivially copyable type by using `char*` or `unsigned char*` (see `3.9/2`).
Johannes Schaub - litb
I agree with you it could be clearer about the term "variable" though. The term is defined in `3/6` as "A variable is introduced by the declaration of an object. The variable's name denotes the object.". So, I think it wants to say that a "variable" is a translation-time entity that has a name which denotes an object. So the following yields an integral constant expression: `int const c = 0; constexpr int f() { return c; }`, but the following does not, because the lvalue refers to an object, but not to a variable (the wording is surely confusing here and slightly backwards imho): `return (`
Johannes Schaub - litb
Notice that i think `(` and `p` is going to be constant-initialized. But the lvalue-to-rvalue conversion on `(` (according to the rule in `3.6.2/2`), and so its initialization time compared to another such object in another translation unit is unspecified.
Johannes Schaub - litb
So it seems the consensus here is that the little_endian() function is definitely malformed, because it consists of more than one statement. A solution would be to move the declaration outside of the function, but even then, it is questionable at this time whether a reinterpret_cast is allowed in a constant expression.
Charles Salvia
No, that is not questionable. It's certain that it's not allowed. The wording is clear.
Johannes Schaub - litb
@litb: I disagree that the `reinterpret_cast` is disallowed. 3.9/2 defines a literal type as either a scalar type, a class type with only literal members, or a literal array; 5.19/2 only bans conversions into such a type. The proposed function converts one pointer type to another pointer type; such a conversion is not banned.
Martin v. Löwis
Johannes Schaub - litb
+2  A: 

That is a very interesting question.

I am not a language lawyer, but you might be able to replace the reinterpret_cast with a union.

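const union {
    unsigned int int_value;
    unsigned char char_value[4];   // unsigned, so comparing with 0xDD does not depend on the sign of char
} Endian = { 0xAABBCCDD };         // initializes int_value, the first member

constexpr bool little_endian()
{
   return Endian.char_value[0] == 0xDD;   // the union has no operator[]; name the member
}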
iain
Placing a value in a union then accessing the union via another member is not valid.
GMan
@GMan: It is well-formed, but invokes undefined behavior. "valid" is not a property defined in the C++ standard.
Martin v. Löwis
Yea, threw my own terminology in there. Thanks for pointing out the correct terms.
GMan
@Martin: Exactly what § of the standard says it invokes undefined behaviour? A char lvalue may certainly alias (part of) an int object. Also, all possible bit patterns represent valid char and unsigned char values as far as I can tell. This leads me to believe this just invokes implementation-defined behaviour and not UB.
sellibitze
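Whatever the answer for the union, inspecting the bytes of an object at run time can be done without either the union or the cast. The following is a minimal sketch, not a compile-time solution: it copies the object representation into an unsigned char array with std::memcpy. The function name is hypothetical, and the code assumes a 4-byte unsigned with 8-bit bytes and a plain little- or big-endian layout.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(unsigned) == 4, "this sketch assumes a 4-byte unsigned");

// Hypothetical helper: a run-time check, so it cannot be used as a
// template argument or anywhere else a constant expression is required.
inline bool little_endian_at_runtime()
{
    const unsigned num = 0xAABBCCDDu;
    unsigned char bytes[sizeof num];
    std::memcpy(bytes, &num, sizeof num);   // copy the object representation

    // Assumption: only the two common layouts occur, so the first byte in
    // memory is either the lowest-order (0xDD) or the highest-order (0xAA).
    assert(bytes[0] == 0xDD || bytes[0] == 0xAA);
    return bytes[0] == 0xDD;
}
```

Because the result is only known at run time, it can drive an ordinary `if`, but not the choice between the two Foo specializations.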
A: 

If your goal is to ensure that the compiler optimizes little_endian() into a constant true or false at compile-time, without any of its contents winding up in the executable or being executed at runtime, and only generating code from the "correct" one of your two Foo templates, I fear you're in for a disappointment.

I also am not a language lawyer, but it looks to me like constexpr is like inline or register: a keyword that alerts the compiler writer to the presence of a potential optimization. Then it's up to the compiler writer whether or not to take advantage of that. Language specs typically mandate behaviors, not optimizations.

Also, have you actually tried this on a variety of C++0x compliant compilers to see what happens? I would guess most of them would choke on your dual templates, since they won't be able to figure out which one to use if invoked with false.

Bob Murphy
It's not quite the same. The result of a 'constexpr' function generally can be used where a constant expression is required, e.g. an array bound. Although I believe there is some leeway in the case of function templates.
Richard Corden
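The array-bound case from the comment above can be sketched as follows. The helper name `padded_size` and the array `buffer` are hypothetical, and the example assumes a compiler with constexpr support. The point is that the declaration is only well-formed if the call is evaluated during translation, which an optimization hint such as inline could not guarantee.

```cpp
// Hypothetical helper, named only for this illustration.
constexpr int padded_size(int n)
{
    return (n + 3) / 4 * 4;   // round n up to a multiple of 4
}

// An array bound must be a constant expression, so this declaration only
// compiles if padded_size(10) is evaluated at translation time.
char buffer[padded_size(10)];

static_assert(sizeof(buffer) == 12, "padded_size(10) should be 12");
```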