In both Microsoft VC2005 and g++ compilers, the following results in an error:

On win32 VC2005: *sizeof(wchar_t) is 2*

wchar_t *foo = 0;
static_cast<unsigned short *>(foo);

Results in

error C2440: 'static_cast' : cannot convert from 'wchar_t *' to 'unsigned short *' ...

On Mac OS X or Linux g++: *sizeof(wchar_t) is 4*

wchar_t *foo = 0;
static_cast<unsigned int *>(foo);

Results in

error: invalid static_cast from type 'wchar_t*' to type 'unsigned int*'

Of course, I can always use *reinterpret_cast*. However, I would like to understand why it is deemed illegal by the compiler to static_cast to the appropriate integer type. I'm sure there is a good reason...

+2  A: 

By the spec, the use of static_cast is restricted to narrowable (related) types, e.g. std::ostream& to std::ofstream&. In fact wchar_t is just extension but widely used. Your case (if you really need the cast) should be handled with reinterpret_cast.

By the way, MSVC++ has an option (/Zc:wchar_t) to treat wchar_t either as a typedef for unsigned short or as a stand-alone built-in type.

Dewfy
"In fact wchar_t is just extension but widely used." what do you mean by this?
Johannes Schaub - litb
wchar_t is not an extension, it's in the C++ standard.
Steve Jessop
+5  A: 

You cannot static_cast between unrelated pointer types. The size of the type pointed to is irrelevant. Consider the case where the types have different alignment requirements: allowing a cast like this could generate illegal code on some processors. It is also possible for pointers to different types to have different sizes, which could result in the pointer you obtain being invalid and/or pointing at an entirely different location. reinterpret_cast is one of the escape hatches you have if you know that, for your program, compiler, architecture and OS, you can get away with it.
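
As a rough sketch of the escape hatches mentioned above (the helper names as_ushort_ptr, as_ushort_ptr_via_void and escape_hatches_agree are made up for illustration), the following shows which spellings a compiler accepts. It only converts the pointer value; whether reading through the result is meaningful still depends on the platform.

```cpp
// Hypothetical helper: converts the pointer value only. Reading through the
// result is only meaningful where wchar_t and unsigned short happen to share
// a representation on the target implementation.
inline unsigned short* as_ushort_ptr(wchar_t* p) {
    // return static_cast<unsigned short*>(p);    // rejected: unrelated pointee types
    return reinterpret_cast<unsigned short*>(p);  // accepted: explicit escape hatch
}

// static_cast does accept the conversion when it is routed through void*,
// which is just a longer way of spelling the same reinterpretation.
inline unsigned short* as_ushort_ptr_via_void(wchar_t* p) {
    return static_cast<unsigned short*>(static_cast<void*>(p));
}

// Hypothetical check: both spellings yield the same pointer, and that pointer
// still refers to the start of the original buffer.
inline bool escape_hatches_agree() {
    wchar_t buf[2] = {0, 0};
    return as_ushort_ptr(buf) == as_ushort_ptr_via_void(buf)
        && static_cast<void*>(as_ushort_ptr(buf)) == static_cast<void*>(buf);
}
```

Neither spelling makes the access portable; they only move the responsibility from the compiler to the programmer.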

Logan Capaldo
I think I see what you mean... Besides the theoretical possibility, is there an actual example where pointers to different types have different size?
VoidPointer
Member function pointers are often a good example. With MSVC for instance, the pointers will be different sizes depending on whether or not the class has virtual functions.
Logan Capaldo
@VoidPointer - I don't think anyone came up with any good examples last time someone asked. `void *` has to be able to represent all other pointer-to-object types (not to-member), which constrains things, and I think POSIX says all object pointers are the same size. So it'd be a bit weird to have different sizes, but you could just about imagine it on an architecture without unified memory, where different banks are used for different types or something. No idea how you'd implement malloc. Or if pointers contained RTTI inline, then maybe int* could be smaller than MyClass*.
Steve Jessop
I'm ignoring pointer-to-member being different sizes, since you're not proposing to reinterpret_cast between data pointers and member pointers. That's never gonna work...
Steve Jessop
Well there are systems where you can have both e.g. 32-bit and 64-bit pointers, but that tends to be associated with the pointer more so than the object. I know casting between data pointers and member function pointers won't actually work (not with even a reinterpret_cast), but it was an example that exists on a system that lots of people are familiar with.
Logan Capaldo
Sorry, yes, I'm not complaining about your example. You were asked for pointers, you provided pointers. I'm just saying that there aren't many obvious reasons to have differently-sized pointers to objects according to the type of the referent. Near vs. far pointers are an extension to standard C++, and as you say they're a law unto themselves :-)
Steve Jessop
+4  A: 

As with char, the signedness of wchar_t is not defined by the standard. Put this together with the possibility of non-2's complement integers, and for a wchar_t value c,

*reinterpret_cast<unsigned short *>(&c)

may not equal:

static_cast<unsigned short>(c)

In the second case, on implementations where wchar_t is a sign+magnitude or 1's complement type, any negative value of c is converted to unsigned using modulo 2^N, which changes the bits. In the former case the bit pattern is picked up and used as-is (if it works at all).
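
To make the two operations concrete, here is a minimal sketch. The names by_value and by_bits are hypothetical, and std::memcpy stands in for the pointer reinterpretation so that the example does not itself read through a mismatched pointer. It assumes unsigned short is no larger than wchar_t, which holds on the platforms discussed here but is not guaranteed.

```cpp
#include <cstring>

// Value conversion: the result is c reduced modulo 2^N, where N is the
// number of bits in unsigned short. A negative c therefore changes bits on
// sign+magnitude or 1's complement implementations.
inline unsigned short by_value(wchar_t c) {
    return static_cast<unsigned short>(c);
}

// Bit-pattern view: copies the first sizeof(unsigned short) bytes of c's
// storage unchanged. Which bytes those are depends on byte order when
// wchar_t is wider than unsigned short.
inline unsigned short by_bits(wchar_t c) {
    static_assert(sizeof(unsigned short) <= sizeof(wchar_t),
                  "assumption: unsigned short fits inside wchar_t");
    unsigned short out;
    std::memcpy(&out, &c, sizeof out);
    return out;
}
```

On a 2's complement machine with a 2-byte wchar_t the two functions agree for every input, which is why the difference is easy to overlook.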

Now, if the results are different, then there's no realistic way for the implementation to provide a static_cast between the pointer types. What could it do, set a flag on the unsigned short* pointer, saying "by the way, when you load from this, you have to also do a sign conversion", and then check this flag on all unsigned short loads?

That's why it's not, in general, safe to cast between pointers to distinct integer types, and I believe this unsafety is why there is no conversion via static_cast between them.

If the type you're casting to happens to be the so-called "underlying type" of wchar_t, then the resulting code would almost certainly be OK for the implementation, but would not be portable. So the standard doesn't offer a special case allowing you a static_cast just for that type, presumably because it would conceal errors in portable code. If you know reinterpret_cast is safe, then you can just use it. Admittedly, it would be nice to have a straightforward way of asserting at compile time that it is safe, but as far as the standard is concerned you should design around it, since the implementation is not required even to dereference a reinterpret_casted pointer without crashing.
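
One possible shape for such a compile-time assertion, assuming a C++11 compiler, is sketched below. The names same_layout_as_wchar and wchar_buffer_as are invented for this example. Matching size, alignment and signedness is only a plausibility check: it does not make the access defined as far as the standard is concerned, it just refuses to compile on platforms where the cast is obviously wrong.

```cpp
#include <type_traits>

// Hypothetical trait: true when T has the same size, alignment and
// signedness as wchar_t on the current implementation.
template <typename T>
struct same_layout_as_wchar {
    static const bool value =
        sizeof(T) == sizeof(wchar_t) &&
        alignof(T) == alignof(wchar_t) &&
        std::is_signed<T>::value == std::is_signed<wchar_t>::value;
};

// Hypothetical wrapper: the reinterpret_cast is only reachable when the
// check above passes; otherwise compilation stops with the message below.
template <typename T>
T* wchar_buffer_as(wchar_t* p) {
    static_assert(same_layout_as_wchar<T>::value,
                  "T does not match wchar_t's size, alignment and signedness here");
    return reinterpret_cast<T*>(p);
}
```

A call such as wchar_buffer_as<unsigned short>(buf) would then build on win32 VC2005 and fail to build with g++ on Linux, instead of silently misbehaving.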

Steve Jessop
So is it correct to conclude that the only *pure* way to get a wchar_t string into a 3rd party library function that uses (unsigned short *) as a UCS-2 string representation is to copy the character values into a new array of the correct type?
VoidPointer
Yes, that's right. To be completely portable you have to convert the wchar_t data to unsigned short, just in case the two aren't compatible. But you can take some shortcuts: define a conversion function, provide an implementation for the most common case, and leave awkward platforms as future work (YAGNI). It's a bit fiddly because some platforms need a copy, others don't. Btw, with 4-byte wchar_t you'd be in even worse trouble because not every wchar_t value can be represented in UCS-2. You'd need UTF-16 to represent Unicode planes 1-16, so the conversion is not trivial.
Steve Jessop
Thanks. You are right, I was specifically referring to the windows case where wchar_t is 2 bytes. If wchar_t corresponds to UCS-4 one needs to break up values outside the BMP into surrogates.
VoidPointer
Yep. It is fundamentally not possible to write platform-portable code which converts a string of wchar_t to UCS-2. wchar_t might not even be Unicode. So at some stage, you either need platform-specific conversion code, or else you just restrict yourself to only support platforms where reinterpret_cast works.
Steve Jessop
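
A minimal sketch of the copying conversion discussed in these comments might look like the following. The function name to_utf16 is hypothetical, and the code assumes that wchar_t holds Unicode: UTF-16 code units when it is 2 bytes wide, code points when it is wider. It does no validation, so values that are not valid code points are simply truncated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies a wide string into a zero-terminated buffer of
// UTF-16 code units. Assumes wchar_t values are Unicode (see lead-in).
inline std::vector<unsigned short> to_utf16(const std::wstring& in) {
    std::vector<unsigned short> out;
    out.reserve(in.size() + 1);
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp >= 0x10000UL && cp <= 0x10FFFFUL) {
            // Outside the BMP: split into a high and a low surrogate.
            cp -= 0x10000UL;
            out.push_back(static_cast<unsigned short>(0xD800UL + (cp >> 10)));
            out.push_back(static_cast<unsigned short>(0xDC00UL + (cp & 0x3FFUL)));
        } else {
            // BMP value (or an existing UTF-16 unit): copy it across.
            out.push_back(static_cast<unsigned short>(cp));
        }
    }
    out.push_back(0);  // terminator expected by C-style string APIs
    return out;
}
```

The result's data can be handed to a function expecting unsigned short*. A library that is strictly UCS-2 would still not understand the surrogate pairs, so this only helps where UTF-16 is accepted.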
+1  A: 

Pointers are not magic "no limitations, anything goes" tools.

They are, by the language specification, actually very constrained. They do not allow you to bypass the type system or the rest of the C++ language, which is what you're trying to do.

You are trying to tell the compiler to "pretend that the wchar_t you stored at this address earlier is actually an int. Now read it."

That does not make sense. The object stored at that address is a wchar_t, and nothing else. You are working in a statically typed language, which means that every object has one, and just one, type.

If you're willing to wander into implementation-defined behavior-land, you can use a reinterpret_cast to tell the compiler to just pretend it's ok, and interpret the result as it sees fit. But then the result is not specified by the standard, but by the implementation.

Without that cast, the operation is meaningless. A wchar_t is not an int or a short.
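
The distinctness of the types can be checked directly. The sketch below assumes a C++11 compiler in its default mode (with MSVC's /Zc:wchar_t- option, wchar_t becomes a typedef for unsigned short and the first assertion would fail); the overload set is_wchar is a made-up name for illustration.

```cpp
#include <type_traits>

// wchar_t is its own type, even when it has the same size as another one.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is distinct from unsigned short");
static_assert(!std::is_same<wchar_t, unsigned int>::value,
              "wchar_t is distinct from unsigned int");
static_assert(!std::is_same<wchar_t, int>::value,
              "wchar_t is distinct from int");

// Because the types are distinct, they can be overloaded on separately.
inline bool is_wchar(wchar_t)        { return true; }
inline bool is_wchar(unsigned short) { return false; }
inline bool is_wchar(unsigned int)   { return false; }
```

Calling is_wchar(L'x') selects the first overload, which would be ambiguous or impossible if wchar_t were merely another name for one of the integer types.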

jalf