enum Enums { k1, k2, k3, k4 };

union MYUnion { 
    struct U{ 
         char P;
    }u;

    struct U0 { 
        char state; 
    } u0; 

    struct U1 { 
        Enums e; 
        char c; 
        int v1; 
    } u1; 

    struct U2 { 
        Enums e; 
        char c; 
        int v1; 
        int v2; 
    } u2; 

    struct U3 { 
        Enums e; 
        unsigned int i; 
        char c; 
    } u3; 

    struct U4 { 
        Enums e;
        unsigned int i; 
        char c; 
        int v1; 
    } u4; 

    struct U5 { 
        Enums e; 
        unsigned int i; 
        char c; 
        int v1; 
        int v2; 
    } u5; 
} myUnion;

I'm confused by the whole idea of unions in C++. What does "myUnion" look like in memory? I know that the members share the same memory block, but how? What is the size of "myUnion"? If it is the size of "u5", how is the data laid out in that block of memory?

A: 
  1. the size of the union is the size of the largest thing in the union.
  2. the layout of the union is whatever you last stored.

So, if e, i, and c were direct members of a union (rather than fields of the same struct, where they sit side by side), storing into .i and then into .e would overwrite the first byte of the int with the enum value (assuming that sizeof(Enums) is 1 in your environment).

A union is like:

void *p = malloc(sizeof(biggest_item));
Enums *ep = (Enums *)p;
unsigned int *ip = (unsigned int *)p;
char *cp = (char *)p;

Assignments to *ep, *ip, and *cp work just like the union.
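The two points above can be checked directly. This is a minimal sketch using a small hypothetical union `Small` (not the one from the question); the helper name `membersShareAddress` is also made up for illustration. It only takes addresses, so it never reads a member that was not the last one written.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical union for illustration; not the one from the question.
union Small {
    char c;
    unsigned int i;
    double d;
};

// True when every member starts at the same address as the union object itself.
bool membersShareAddress(const Small& s) {
    const void* base = &s;
    return base == static_cast<const void*>(&s.c)
        && base == static_cast<const void*>(&s.i)
        && base == static_cast<const void*>(&s.d);
}

// The union must be able to hold its largest member.
static_assert(sizeof(Small) >= sizeof(double), "union holds its largest member");
```

Calling `membersShareAddress` on any `Small` object should return true, which is the "same memory block" behaviour the malloc analogy describes.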

bmargulies
A: 

As Murali said, the size of the union will be that of the largest struct that participates in the union.

The app will allocate enough bytes for the largest block. Memory mapping works like this:

Consider the following:

union Foo
{
  struct A
  {
    int x;
    unsigned char y;
    unsigned char z;
  } sa;  // member names sa and sb are added so the declaration compiles
  struct B
  {
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned char d;
    unsigned char e;
  } sb;
};

In this case, assuming that int is 32 bits (which depends on your target platform), a, b, c and d provide access to the bytes that make up the integer x. Writing to a will overwrite the first byte of x, writing to b will overwrite the second byte of x, and so forth.

Conversely, writing a value to x will affect a, b, c and d.

unsigned chars y and e occupy the same space (again, depending on the fact that int is 32 bits) so .y and .e are effectively aliases for each other.

The unsigned char A.z does not overlap any element of struct B, so it is effectively immune to changes to B.

The point here is that the elements of the unioned structs occupy the same memory. The different structs provide different ways to read and write the same memory by letting you use different datatypes.
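Whether the fields really line up as described depends on the compiler's layout, so it is worth checking rather than assuming. The sketch below declares the same two structs at namespace scope and compares member offsets with `offsetof`; the function name `layoutMatchesDescription` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Same fields as the two structs inside union Foo, declared at namespace scope.
struct A {
    int x;
    unsigned char y;
    unsigned char z;
};

struct B {
    unsigned char a, b, c, d, e;
};

// True when B::a..B::d cover exactly the bytes of A::x and B::e sits on A::y.
// This holds on common platforms with a 4-byte int, but is not guaranteed everywhere.
bool layoutMatchesDescription() {
    return sizeof(int) == 4
        && offsetof(A, x) == 0
        && offsetof(A, y) == 4
        && offsetof(B, a) == 0
        && offsetof(B, d) == 3
        && offsetof(B, e) == 4;
}
```

If `layoutMatchesDescription()` returns false on your target, the overlap described above does not apply there.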

David Lively
Most of this isn't really guaranteed. The compiler is free to insert padding between structure members so you can't rely on `A::x` overlapping with `B::a`, `B::b`, `B::c`, and `B::d`. You are guaranteed that `` then all of the assumptions should be safe by the Standard.
D.Shawley
"The app will allocate enough bytes for the largest block". That is true in this case, but sometimes it will have to allocate more than that in order to satisfy an alignment requirement. For example if the largest member of the union contains 3 ints (and int is 4-aligned on the platform), and another member contains one long (and long is 8-aligned on the platform), then the union will have to be 16 bytes. It can't be 12 bytes, because then when you made an array of them, the long would be misaligned in the elements at odd indices.
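To make that arithmetic concrete, here is a small sketch. The types `ThreeInts`, `OneWide`, and `Padded` are hypothetical, and `alignas(8)` stands in for the 8-aligned long so the result does not depend on how wide long is on a given platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical members mirroring the example: three ints versus one 8-byte-aligned value.
struct ThreeInts { int a, b, c; };
struct alignas(8) OneWide { unsigned char bytes[8]; };

union Padded {
    ThreeInts t;
    OneWide w;
};

// The union takes the strictest alignment of its members, and its size is
// rounded up to a multiple of that alignment.
static_assert(alignof(Padded) == 8, "alignment comes from OneWide");
static_assert(sizeof(Padded) % alignof(Padded) == 0, "size is a multiple of alignment");
static_assert(sizeof(Padded) >= sizeof(ThreeInts), "still holds the largest member");
```

With a 4-byte int, `sizeof(ThreeInts)` is 12 and `sizeof(Padded)` comes out as 16.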
Steve Jessop
"a,b,c and d provide access to the bytes that make up the integer X". This is compiler-dependent, not guaranteed by the standard. The standard says that if you write to X, then read from a, you get undefined behaviour. AFAIK compilers do always guarantee, though, that accessing multiple members of a union is valid, and gives you access to corresponding bytes according to however that compiler lays out structs. Too many programs rely on union-casts for compiler-writers not to support them.
Steve Jessop
A: 

What the block of memory looks like also depends on the memory model of your target CPU (big- or little-endian, etc.), but in general each struct will start at the same address and be laid out over the same area. That is useful if you know what it is going to do, and a shot in the foot otherwise.

Don
Endianness only comes into play when interpreting the bytes within a given variable. It will not change the order in which struct fields are placed in memory (ie, the third integer in a struct will still start at 2*sizeof(int) bytes, regardless of your platform or architecture).
David Lively
True, just that if you union an int and a char[4], your code has to know what endianness your CPU has to understand what it is going to get.
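A rough way to see what the char[4] view of an int would contain is to copy the int's bytes out with `std::memcpy`, which avoids reading an inactive union member. The helper names `bytesOf` and `isLittleEndian` are made up for this sketch, and it assumes `std::uint32_t` is available.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the object representation of value into out[0..3], lowest address first.
void bytesOf(std::uint32_t value, unsigned char out[4]) {
    std::memcpy(out, &value, sizeof value);
}

// True on a little-endian machine, where the least significant byte is stored first.
bool isLittleEndian() {
    unsigned char b[4];
    bytesOf(0x01020304u, b);
    return b[0] == 0x04;
}
```

On a little-endian CPU the bytes come out as 04 03 02 01; on a big-endian one they come out as 01 02 03 04.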
Don
@David - the third integer member in a struct will start at at least 2*sizeof(int) bytes. It is free to start at 2*(sizeof(int) * 20) bytes if the compiler wants it to.
D.Shawley
A: 

You are right to be confused! I'm confused... Let's look at something simple first before moving on to the more complex example you have.

First, union basics. A union just means that when you create a variable of the union type, the underlying components (in the example below i and f) really overlap in memory. It lets you sometimes treat that memory as an int and sometimes treat that memory as a float. This naturally can be nasty and you really have to know what you're doing.

union AUnion
{
   int i;
   float f;
}; // assumes an int is 32 bits

AUnion aUnion;
aUnion.i = 0;
printf("%f", aUnion.f);

In the above code, what will be printed out? To answer that question you have to understand how ints and floats are represented in memory. Both take up 32 bits of memory. How that memory is interpreted, however, differs between the two types. When I set aUnion.i = 0, I am saying "write a 0'd integer to aUnion". A 0'd integer, it so happens, corresponds to setting all 32 bits to 0. Now when we go to print aUnion.f, we are saying "treat aUnion as if the bits are really a 32-bit float, and print it out!" The computer then treats all those underlying bits as if they are really parts of a float instead of the int. The computer knows how to treat any random bunch of 32 bits as a float because it knows how a floating point number is formatted in binary.
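The same reinterpretation can be sketched without a union by copying the bits with `std::memcpy`. The function name `floatFromBits` is hypothetical, and the sketch assumes a 32-bit IEEE 754 float, which is the usual case but not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Reinterprets a 32-bit pattern as a float by copying its bytes.
float floatFromBits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under that assumption, `floatFromBits(0)` is 0.0f, because an all-zero bit pattern is how IEEE 754 encodes positive zero, while `floatFromBits(0x3F800000u)` is 1.0f even though the same bits read as an integer are 1065353216.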

Now to take on some of your more complex union code:

enum Enums { k1, k2, k3, k4 };

union MYUnion { 
    struct U { 
        char P;
    } u;

    struct U0 { 
        char state; 
    } u0; 

    struct U1 { 
        Enums e; 
        char c; 
        int v1; 
    } u1;
    // ... remaining structs as in the question
};

All these structs overlap in the same way the int and float did above. If we assume that Enums is represented as an int, we can map the enumerators to int values in the underlying memory, based on the rules of enums:

 enum Enums { k1/*0*/, k2/*1*/, k3/*2*/, k4/*3*/ };

So then what we have is

union MYUnion { 
    struct U { 
        char P;
    } u;

    struct U0 { 
        char state; 
    } u0; 

    struct U1 { 
        int e; 
        char c; 
        int v1; 
    } u1;
    // ... remaining structs as in the question
};

And you have a very strange union because if you do

MYUnion m;
m.u.P = 'h';

When you later access the enum (which is most likely an int under the hood), it will be read as an invalid value. This is because P is just 1 byte and the int is 4 bytes, so writing P sets only one of the bytes that the enum is read from. When read as the enum, you will get weird results.
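Here is a rough simulation of that effect that stays within defined behaviour. It writes a char into the first byte of zeroed, int-sized storage and then copies the storage into an int. The helper names are hypothetical, and in a real union the remaining bytes would hold whatever was there before rather than zeros.

```cpp
#include <cassert>
#include <cstring>

enum Enums { k1, k2, k3, k4 };

// Stores ch in the first byte of zeroed storage, then reads the whole storage
// as an int, which is roughly what reading u1.e after writing u.P amounts to.
int intAfterWritingChar(char ch) {
    unsigned char storage[sizeof(int)] = {};
    std::memcpy(storage, &ch, 1);
    int asInt;
    std::memcpy(&asInt, storage, sizeof asInt);
    return asInt;
}

// True when the value corresponds to one of k1..k4.
bool isValidEnumerator(int v) {
    return v >= k1 && v <= k4;
}
```

`intAfterWritingChar('h')` does not correspond to any of k1 through k4 on either a little-endian or a big-endian machine, which is the "invalid value" described above.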

I highly suggest you go sack whoever is responsible for this code.

Doug T.