tags:

views:

261

answers:

4

I came across this in an IRC channel yesterday and didn't understand why it was bad behavior.

#include <stdio.h>

int main(void)
{
     char x[sizeof(int)] = { '\0' }; int *y = (int *) x;
     printf("%d\n", *y);
}

Is there any loss of data or anything? can anyone give me any docs to explain further about what it does wrong? thanks!

+7  A: 

For one thing, the array x is not guaranteed to be aligned properly for an int.

There's been a conversation topic about how this might affect techniques like placement new. It should be noted that placement new needs to occur on properly aligned memory as well, but placement new is often used with memory that allocated dynamically, and allocation functions (in C and C++) are required to return memory that's suitably aligned for any type specifically so the address can be assigned to a pointer of any type.

The same isn't true for the memory allocated by the compiler for automatic variables.

Michael Burr
Is there a difference in rules between C and C++? This is valid in C++. (Just like `placement new`, `int *y = new (x) int;`)
GMan
I don't think this is valid C++. The types involved do not satisfy 3.10/15. In particular, the behavior of the code depends on the alignment of `int` and on the value representation of `int`. In C++, this is even more of an issue, because C++ states that unspecified behavior introduces non-determinism into its abstract machine, and if *any* of such possible execution paths yield undefined behavior, then it has no requirements on the executing implementation. That means, the code above is effectively executing undefined behavior, because the value and object representation is unspecified.
Johannes Schaub - litb
The C++-FAQ itself even uses almost the same kind of code: http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10 I know the C++-FAQ isn't the final word on things, but I don't think the author would miss something like that. EDIT: But now that litb has chimed in... :P
GMan
Hm, looking around a bit it does seem the case you guys are right. I retract my bad statement, and learn my new thing for the day.
GMan
@GMan this is different than placement new. Assuming the array is properly aligned, then if you use placement new, you reuse the storage of the array to create a new object. If you then read from that object, the object type and the accessing expression have compatible types. Luckily, the alignment of an object is implementation-defined, and so there is no problem with regard to non-determinism. Assuming proper alignment, you first have to "reuse" the storage of `x` for a new object of type `int`. Granted, i've never been able to find out what "reuse" means exactly (see 3.8/1).
Johannes Schaub - litb
GMan
@Johannes: Ok, I think I see, not the complexity I was thinking, just simple bad behavior. To make sure, the alignment issue still stands in both a cast and placement new though, right?
GMan
@GMan: Pointers returned by allocation functions must be aligned so it may be assigned to a pointer of any type. That is not required to be true for automatic variables.
Michael Burr
@GMan: the problem is indeed the alignment and placement new does not have anything to do with alignment. However, `int* y = (int*)operator new(sizeof(int));` *is* guaranteed to be suitably aligned---it's the stack char array that isn't.
Roger Pate
@Gman: regarding the example in the C++ FAQ, it's probably not a great example (because of the alignment issues), but they do cover themselves a bit with the following: "If your Fred class needs to be aligned on a 4 byte boundary but you supplied a location that isn't properly aligned, you can have a serious disaster on your hands".
Michael Burr
@Roger @Johannes @Michael: Right, I think I got it then, thanks guys. Now, this is one of those things where it could very well be a-ok and often is? Just not good practice, in the same way people assume `sizeof(int)==4`, getting lucky (Now I'll need to fix this in a few places. Curses whoever told me it was always ok)
GMan
@GMan: Right, platform-specific and you can still get lucky even when it matters (e.g. 25% of the time if you need 4-byte alignment). http://stackoverflow.com/questions/2088550/#2088571
Roger Pate
Actually, I'd guess that even on machines where alignment is an issues that most of the time compilers won't aggressively pack local variables, so you're likely to get something that 'accidentally 'works' more often than not. But that doesn't mean it's correct or that it's code that should be written.
Michael Burr
+10  A: 

The array x may not be properly aligned in memory for an int. On x86 you won't notice, but on other architectures, such as SPARC, dereferencing y will trigger a bus error (SIGBUS) and crash your program.

This problem may occur for any address:

int main(void)
{
    short a = 1;
    char b = 2;

    /* y not aligned */
    int* y = (int *)(&b);
    printf("%d\n", *y); /* SIGBUS */
}
Matt McClellan
You are casting char to pointer, so its obvious that it fails. While his array have same size as int, and its casting to int.
qba
@qba - the issue is alignment, not size. There's a good discussion about this following Michael Burr's answer.
R Samuel Klatchko
It would probably be clearer if you made `b` an `int`.
GMan
But if `b` were an `int`, it wouldn't illustrate the issue. Better would be `char b[8] = { 0, 1, 2, 3, 4, 5, 6, 7};` and `int* y = (int*)(` which has a higher probability of triggering a bus error.
Mike DeSimone
Well, it would still demonstrate the possibility of it happening. You're out to definitely get a bus error, I only thought we were showing a situation where it might fail, even though it looks innocent.
GMan
@GMan: If `b` were an `int` there won't be a bus error because the compiler will align `b` on the correct boundary.
Matt McClellan
Oh yeah, derp, missed the entire point of the exercise. Sigh.
GMan
A: 

Why not use a union instead?

union xy {
    int y;
    char x[sizeof(int)];
};
union xy xyvar = { .x = { 0 } };
...
printf("%d\n", xyvar.y);

I haven't verified it, but I would think the alignment problems mentioned by others would not be a problem here. If anyone has an argument for why this isn't portable, I'd like to hear it.

Tim Schaeffer
It isn't portable, because the standard says that writing one member of a union, and then reading the other isn't portable. OTOH, if you're going to do type punning at all, this is *about* as portable as it gets.
Jerry Coffin
So, it is not *officially* portable, but is *practicly* portable. I was looking for some setup where it wouldn't work (ignoring the obvious endianness problems).
Tim Schaeffer
@Jerry, if `x` was of type `unsigned char`, then it would be defined and portable, wouldn't it?
Alok
@Alok: that depends -- according to C89/90, it's not, but according to C++ and C99, it is portable to a degree. The resulting values are unspecified, but at least it doesn't involve any undefined behavior.
Jerry Coffin
@Tim: there are times/places that it won't work, but they're pretty unusual. Just for example, at least one compiler has been written that uses type-tagging so any attempt at type punning like this will fail. As far as I know, that was never publicly released though, so I don't know whether you count it as practical or not.
Jerry Coffin
@Jerry, thanks for the answer. I was reading C99 draft and thought it should be portable (of course, the values are unspecified). I didn't know it wasn't portable in C89.
Alok
A: 

I think that while the alignment issue is true, it is not the whole story. Even if alignment is not a problem, you are still taking 4 bytes on the stack, only one of them initialized to zero, and treating them like an integer. This means that the printed value has 24 un-initialized bits. And using un-initialized values is a basic 'wrong'.

(Assuming sizeof(int)==4 for simplicity).

Oren Shemesh
No, all the elements of the array `x` are initialized to 0. In C, if an aggregate type has fewer initializers than needed, they're rest are all taken to be 0.
Alok