tags:

views:

5103

answers:

7

Is there any good example to give the difference between a 'struct' and a 'union'? Basically I know that struct uses all the memory of its member and union uses the largest members memory space. Is there any other OS level difference?

+1  A: 

A structure allocates the total size of all elements in it.

A union only allocates as much memory as its largest member requires.

CMS
You might want to also add that union members "overlay" each other in that they all start at the beginning address of the allocated union "structure".
Jim Buck
+46  A: 

With a union, you're only supposed to use one of the elements, because they're all stored at the same spot. This makes it useful when you want to store something that could be one of several types. A struct, on the other hand, has a separate memory location for each of its elements and they all can be used at once.

To give a concrete example of their use, I was working on a Scheme interpreter a little while ago and I was essentially overlaying the Scheme data types onto the C data types. This involved storing in a struct an enum indicating the type of value and a union to store that value.

union foo {
  int a;   // can't use both a and b at once
  char b;
} foo;

struct bar {
  int a;   // can use both a and b simultaneously
  char b;
} bar;

union foo x;
x.a = 3; // OK
x.b = 'c'; // NO! this affects the value of x.a!

struct bar y;
y.a = 3; // OK
y.b = 'c'; // OK

edit: If you're wondering what setting x.b to 'c' changes the value of x.a to, technically speaking it's undefined. On most modern machines a char is 1 byte and an int is 4 bytes, so giving x.b the value 'c' also gives the first byte of x.a that same value:

union foo x;
x.a = 3;
x.b = 'c';
printf("%i, %i\n", x.a, x.b);

prints

99, 99

Why are the two values the same? Because the last 3 bytes of the int 3 are all zero, so it's also read as 99. If we put in a larger number for x.a, you'll see that this is not always the case:

union foo x;
x.a = 387439;
x.b = 'c';
printf("%i, %i\n", x.a, x.b);

prints

387427, 99

To get a closer look at the actual memory values, let's set and print out the values in hex:

union foo x;
x.a = 0xDEADBEEF;
x.b = 0x22;
printf("%x, %x\n", x.a, x.b);

prints

deadbe22, 22

You can clearly see where the 0x22 overwrote the 0xEF.

BUT

In C, the order of bytes in an int are not defined. This program overwrote the 0xEF with 0x22 on my Mac, but there are other platforms where it would overwrite the 0xDE instead because the order of the bytes that make up the int were reversed. Therefore, when writing a program, you should never rely on the behavior of overwriting specific data in a union because it's not portable.

For more reading on the ordering of bytes, check out endianness.

Kyle Cronin
thanks, this gives a very clear and good answer. good example to go by...
gagneet
using this example, in union, if x.b='c' what gets stored in x.a? is it the reference # of the char?
kylex
hopefully that explains in more detail what's stored in x.a when you set x.b.
Kyle Cronin
This was an excellent answer!
Matt Baker
+1  A: 

"union" and "struct" are constructs of the C language. Talking of an "OS level" difference between them is inappropriate, since it's the compiler that produces different code if you use one or another keyword.

friol
+5  A: 

As you already state in your question, the main difference between union and struct is that union members overlay the memory of each other so that the sizeof of a union is the one , while struct members are laid out one after each other (with optional padding in between). Also an union is large enough to contain all its members, and have an alignment that fits all its members. So let's say int can only be stored at 2 byte addresses and is 2 bytes wide, and long can only be stored at 4 byte addresses and is 4 bytes long. The following union

union test {
    int a;
    long b;
};

could have a sizeof of 4, and an alignment requirement of 4. Both an union and a struct can have padding at the end, but not at their beginning. Writing to a struct changes only the value of the member written to. Writing to a member of an union will render the value of all other members invalid. You cannot access them if you haven't written to them before, otherwise the behavior is undefined. GCC provides as an extension that you can actually read from members of an union, even though you haven't written to them most recently. For an Operation System, it doesn't have to matter whether a user program writes to an union or to a structure. This actually is only an issue of the compiler.

Another important property of union and struct is, they allow that a pointer to them can point to types of any of its members. So the following is valid:

struct test {
    int a;
    double b;
} * some_test_pointer;

some_test_pointer can point to int* or bool*. If you cast an address of type test to int*, it will point to its first member, a, actually. The same is true for an union too. Thus, because an union will always have the right alignment, you can use an union to make pointing to some type valid:

union a {
    int a;
    double b;
};

That union will actually be able to point to an int, and a double:

union a * v = (union a*)some_int_pointer;
*some_int_pointer = 5;
v->a = 10;
return *some_int_pointer;

is actually valid, as stated by the C99 standard:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object
  • ...
  • an aggregate or union type that includes one of the aforementioned types among its members

The compiler won't optimize out the v->a = 10; as it could affect the value of *some_int_pointer (and the function will return 10 instead of 5).

Johannes Schaub - litb
thanks, explains a lot of things i was unclear about!
gagneet
Hard to believe this answer wasn't upvoted.
BobbyShaftoe
+1  A: 

You have it, that's all. But so, basically, what's the point of unions?

You can put in the same location content of different types. You have to know the type of what you have stored in the union (so often you put it in a struct with a type tag...).

Why is this important? Not really for space gains. Yes, you can gain some bits or do some padding, but that's not the main point anymore.

It's for type safety, it enables you to do some kind of 'dynamic typing': the compiler knows that your content may have different meanings and the precise meaning of how your interpret it is up to you at run-time. If you have a pointer that can point to different types, you MUST use a union, otherwise you code may be incorrect due to aliasing problems (the compiler says to itself "oh, only this pointer can point to this type, so I can optimize out those accesses...", and bad things can happen).

Piotr Lesnicki
+5  A: 

Here's the short answer: a struct is a record structure: each element in the struct allocates new space. So, a struct like

struct foobarbazquux_t {
    int foo;
    long bar;
    double baz; 
    long double quux;
}

allocates at least (sizeof(int)+sizeof(long)+sizeof(double)+sizeof(long double)) bytes in memory for each instance. ("At least" because architecture alignment constraints may force the compiler to pad the struct.)

On the other hand,

union foobarbazquux_u {
    int foo;
    long bar;
    double baz; 
    long double quux;
}

allocates one chunk of memory and gives it four aliases. So sizeof(union foobarbazquux_u) ≥ max((sizeof(int),sizeof(long),sizeof(double),sizeof(long double)), again with the possibility of some addition for alignments.

Charlie Martin
thanks, that gives quite a good insight into the workings of these two.
gagneet
+2  A: 

Is there any good example to give the difference between a 'struct' and a 'union'?

An imaginary communications protocol

struct packetheader {
   int sourceaddress;
   int destaddress;
   int messagetype;
   union request {
       char fourcc[4];
       int requestnumber;
   };
};

In this imaginary protocol, it has been sepecified that, based on the "message type", the following location in the header will either be a request number, or a four character code, but not both. In short, unions allow for the same storage location to represent more than one data type, where it is guaranteed that you will only want to store one of the types of data at any one time.

Unions are largely a low-level detail based in C's heritage as a system programming language, where "overlapping" storage locations are sometimes used in this way. You can sometimes use unions to save memory where you have a data structure where only one of several types will be saved at one time.

In general, the OS doesn't care or know about structs and unions -- they are both simply blocks of memory to it. A struct is a block of memory that stores several data objects, where those objects don't overlap. A union is a block of memory that stores several data objects, but has only storage for the largest of these, and thus can only store one of the data objects at any one time.