+2  Q: 

Unions in C

When should unions be used? Why do we need them?

+3  A: 

Unions allow data members which are mutually exclusive to share the same memory. This is quite important when memory is scarce, such as in embedded systems.

In the following example:

union {
   int a;
   int b;
   int c;
} myUnion;

This union will take up the space of a single int, rather than 3 separate int values. If you set the value of a and then set the value of b, the second write overwrites the value of a, since all three members share the same memory location.
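A minimal sketch of that behaviour, assuming a hypothetical named union type `Shared` and two illustrative helpers (none of these names come from the answer above):

```c
#include <stddef.h>

/* Hypothetical named version of the union above. */
union Shared {
    int a;
    int b;
    int c;
};

/* Writes a, then b, and returns what a now reads as.
   All members have the same type, so reading a after writing b
   simply yields the value stored through b. */
int write_a_then_b(int first, int second)
{
    union Shared u;
    u.a = first;
    u.b = second;   /* overwrites the storage shared with a */
    return u.a;
}

/* Size of the whole union; on typical platforms this equals sizeof(int). */
size_t shared_size(void)
{
    return sizeof(union Shared);
}
```

Calling `write_a_then_b(1, 2)` returns 2, because the first value no longer exists once b has been written.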

LeopardSkinPillBoxHat
A: 

Unions are used when you want to model structs defined by hardware, devices or network protocols, or when you're creating a large number of objects and want to save space. You really don't need them 95% of the time, though, so stick with easy-to-debug code.

Paul Betts
+5  A: 

Unions are often used to convert between the binary representations of integers and floats:


union
{
  int x;
  float y;
} u;

// Reinterpret the float's bits as an integer:
u.y = 3.14159f;
printf("As integer: %08x\n", (unsigned)u.x);

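Strictly speaking, you're only supposed to read the member which was most recently written. Reading a different one is undefined behavior in C++, and earlier C standards treated it as implementation-defined or unspecified; C99 (since TC3) and C11 describe it as reinterpreting the stored bytes as the other type, which may yield a trap representation. In practice it will act in a predictable manner in virtually any compiler.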
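If the union-based conversion feels too fragile, a commonly recommended alternative is to copy the bytes with memcpy. This is only a sketch: the helper name `float_bits` is hypothetical, and it assumes float is 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw bits of a float as a 32-bit unsigned integer.
   Assumes float is 32 bits wide (true for IEEE 754 single precision). */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* copy the object representation */
    return bits;
}
```

It can be used the same way as the union version, for example `printf("As integer: %08x\n", (unsigned)float_bits(3.14159f));`.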

Unions are also sometimes used to implement pseudo-polymorphism in C, by giving a structure some tag indicating what type of object it contains, and then unioning the possible types together:

enum Type { INTS, FLOATS, DOUBLE };
struct S
{
  enum Type s_type;
  union
  {
    int s_ints[2];
    float s_floats[2];
    double s_double;
  };
};

void do_something(struct S *s)
{
  switch(s->s_type)
  {
    case INTS:  // do something with s->s_ints
      break;

    case FLOATS:  // do something with s->s_floats
      break;

    case DOUBLE:  // do something with s->s_double
      break;
  }
}

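This allows struct S to occupy only about 12 to 16 bytes (depending on alignment padding), instead of the 28 or more it would need if all three members were stored separately. These figures assume 4-byte int, float and enum and an 8-byte double.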
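To make the dispatch concrete, here is a small sketch that reuses the struct from above and adds a hypothetical helper, `s_total`, which sums whatever payload the tag says is stored. The anonymous union requires C11.

```c
enum Type { INTS, FLOATS, DOUBLE };

struct S
{
  enum Type s_type;          /* tag: says which union member is valid */
  union
  {
    int s_ints[2];
    float s_floats[2];
    double s_double;
  };                         /* anonymous union, requires C11 */
};

/* Hypothetical helper: sums whatever payload the tag says is stored. */
double s_total(const struct S *s)
{
  switch (s->s_type)
  {
    case INTS:   return (double)s->s_ints[0] + s->s_ints[1];
    case FLOATS: return (double)s->s_floats[0] + s->s_floats[1];
    case DOUBLE: return s->s_double;
  }
  return 0.0;  /* not reached for valid tags */
}
```

The key design point is that only the member matching s_type is ever read, so the code never depends on how one type's bytes look when viewed as another.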

Adam Rosenfield
+1 for being the only answer which rightly states that reading a union member other than the one last written is undefined behaviour according to the standard.
legends2k
+2  A: 

It's difficult to think of a specific occasion when you'd need this type of flexible structure. One candidate is a message protocol where you send messages of different sizes, but even then there are probably better and more programmer-friendly alternatives.

Unions are a bit like variant types in other languages - they can only hold one thing at a time, but that thing could be an int, a float etc. depending on how you declare it.

For example:

typedef union MyUnion MYUNION;
union MyUnion
{
   int MyInt;
   float MyFloat;
};

MyUnion will only contain an int OR a float, depending on which you most recently set. So doing this:

MYUNION u;
u.MyInt = 10;

u now holds an int equal to 10;

u.MyFloat = 1.0;

u now holds a float equal to 1.0. It no longer holds an int. If you now do printf("MyInt=%d", u.MyInt); you won't get an error, but you won't get 10 either: the bytes of the float are reinterpreted as an int, which on a typical platform with 32-bit int and IEEE 754 floats prints 1065353216.

The size of the union is dictated by the size of its largest member (plus any padding). Here int and float are typically both 4 bytes, so the union is 4 bytes.

Xiaofu
A: 

Unions are great. One clever use of unions I've seen is to use them when defining an event. For example, you might decide that an event is 32 bits.

Now, within those 32 bits, you might designate the first 8 bits as an identifier of the sender of the event... Sometimes you deal with the event as a whole, and sometimes you dissect it and compare its components. Unions give you the flexibility to do both.

#include <stdint.h>

union Event
{
  uint32_t eventCode;      /* exactly 32 bits; unsigned long may be 64 bits */
  uint8_t eventParts[4];
};
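One caveat with the union view is that which array element holds the sender depends on the byte order of the machine. A shift-based sketch avoids that. The layout below (top 8 bits for the sender, low 24 bits for the payload) and the helper names are assumptions for illustration only:

```c
#include <stdint.h>

/* Hypothetical layout: top 8 bits = sender id, low 24 bits = payload. */

/* Extracts the sender id from a 32-bit event code. */
uint8_t event_sender(uint32_t event_code)
{
    return (uint8_t)(event_code >> 24);
}

/* Extracts the 24-bit payload from a 32-bit event code. */
uint32_t event_payload(uint32_t event_code)
{
    return event_code & 0x00FFFFFFu;
}

/* Builds an event code from a sender id and a payload. */
uint32_t event_make(uint8_t sender, uint32_t payload)
{
    return ((uint32_t)sender << 24) | (payload & 0x00FFFFFFu);
}
```

These helpers give the same result on little-endian and big-endian machines, at the cost of being slightly more verbose than indexing eventParts.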
dicroce
A: 

Here's an example of a union from my own codebase (from memory and paraphrased so it may not be exact). It was used to store language elements in an interpreter I built. For example, the following code:

set a to b times 7.

consists of the following language elements:

  • symbol[set]
  • variable[a]
  • symbol[to]
  • variable[b]
  • symbol[times]
  • constant[7]
  • symbol[.]

Language elements were defined as '#define' values thus:

#define ELEM_SYM_SET        0
#define ELEM_SYM_TO         1
#define ELEM_SYM_TIMES      2
#define ELEM_SYM_FULLSTOP   3
#define ELEM_VARIABLE     100
#define ELEM_CONSTANT     101

and the following structure was used to store each element:

typedef struct {
    int elem_type;
    union {
        char *elem_str;
        int elem_val;
    };
} tElem;

The size of each element was then the size of the type field plus the size of the largest union member (4 bytes for the type and 4 bytes for the union).

In order to create a "set" element, you would use:

tElem e;
e.elem_type = ELEM_SYM_SET;

In order to create a "variable[b]" element, you would use:

tElem e;
e.elem_type = ELEM_VARIABLE;
e.elem_str = strdup ("b");   // make sure you free this later

In order to create a "constant[7]" element, you would use:

tElem e;
e.elem_type = ELEM_CONSTANT;
e.elem_val = 7;   // a plain int, nothing to free

The basic premise is that elem_str and elem_val are not contiguous in memory; they actually overlap, so the union is a way of getting a different view on the same block of memory. This is illustrated here, where the structure is based at memory location 0x1010 and integers and pointers are both 4 bytes:

       +------------------------+
0x1010 |                        |
0x1011 |       elem_type        |
0x1012 |                        |
0x1013 |                        |
       +-------------+----------+
0x1014 |             |          |
0x1015 |  elem_str   | elem_val |
0x1016 |             |          |
0x1017 |             |          |
       +-------------+----------+

If the two fields were plain structure members rather than a union, it would look like this:

       +-------------+
0x1010 |             |
0x1011 |  elem_type  |
0x1012 |             |
0x1013 |             |
       +-------------+
0x1014 |             |
0x1015 |  elem_str   |
0x1016 |             |
0x1017 |             |
       +-------------+
0x1018 |             |
0x1019 |  elem_val   |
0x101A |             |
0x101B |             |
       +-------------+
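Because only elem_type says which union member is live, cleanup code has to check the tag before freeing anything. A minimal sketch, using the structure from above with two hypothetical helpers (`elem_make_variable` and `elem_free` are not from the original codebase):

```c
#include <stdlib.h>
#include <string.h>

#define ELEM_VARIABLE 100
#define ELEM_CONSTANT 101

typedef struct {
    int elem_type;
    union {
        char *elem_str;
        int elem_val;
    };               /* anonymous union, requires C11 */
} tElem;

/* Hypothetical helper: builds a variable element owning a copy of name.
   elem_str is NULL if the allocation fails. */
tElem elem_make_variable(const char *name)
{
    tElem e;
    size_t len = strlen(name) + 1;
    e.elem_type = ELEM_VARIABLE;
    e.elem_str = malloc(len);
    if (e.elem_str != NULL)
        memcpy(e.elem_str, name, len);
    return e;
}

/* Hypothetical helper: frees the string only when the tag says one is stored. */
void elem_free(tElem *e)
{
    if (e->elem_type == ELEM_VARIABLE) {
        free(e->elem_str);
        e->elem_str = NULL;
    }
}
```

Calling elem_free on a constant element leaves elem_val untouched, which is the point: freeing elem_str there would treat the integer as a pointer.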
paxdiablo
+3  A: 

Unions are particularly useful in embedded programming or in situations where direct access to the hardware/memory is needed. Here is a trivial example:

typedef union
{
    struct {
        unsigned char byte1;
        unsigned char byte2;
        unsigned char byte3;
        unsigned char byte4;
    } bytes;
    unsigned int dword;
} HW_Register;
HW_Register reg;

Then you can access the reg as follows:

reg.dword = 0x12345678;
reg.bytes.byte3 = 4;

Endianness and processor architecture are of course important: which byte of dword the member byte3 refers to depends on the byte order of the target.
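The byte-order dependence can be observed with the same trick. The sketch below uses a hypothetical fixed-width variant of the register union (`HW_Register32`) and a hypothetical helper `is_little_endian`; neither name is from the example above.

```c
#include <stdint.h>

typedef union
{
    uint8_t  bytes[4];
    uint32_t dword;
} HW_Register32;   /* hypothetical fixed-width variant of HW_Register */

/* Returns 1 if the lowest-addressed byte holds the least significant
   byte of dword (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    HW_Register32 reg;
    reg.dword = 0x12345678u;
    return reg.bytes[0] == 0x78;
}
```

On a little-endian machine bytes[0] reads 0x78; on a big-endian one it reads 0x12, so code that pokes individual bytes of a register needs to know which case it is running on.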

Another useful feature is the bit-field:

typedef union
{
    struct {
        unsigned char b1:1;
        unsigned char b2:1;
        unsigned char b3:1;
        unsigned char b4:1;
        unsigned char reserved:4;
    } bits;
    unsigned char byte;
} HW_RegisterB;
HW_RegisterB reg;

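With this code you can directly access a single bit in the register/memory address (bear in mind that the order in which bit-fields are laid out within the byte is implementation-defined):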

x = reg.bits.b2;
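Where the exact bit position matters, explicit masks are a common alternative to bit-fields. The sketch below assumes, purely for illustration, that b1 is bit 0 and b2 is bit 1; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical assumption: b1 is bit 0, b2 is bit 1, and so on. */
#define REG_B1 (1u << 0)
#define REG_B2 (1u << 1)
#define REG_B3 (1u << 2)
#define REG_B4 (1u << 3)

/* Returns 1 if any bit in mask is set in reg, 0 otherwise. */
int reg_test(uint8_t reg, uint8_t mask)
{
    return (reg & mask) != 0;
}

/* Returns reg with the bits in mask set. */
uint8_t reg_set(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg | mask);
}

/* Returns reg with the bits in mask cleared. */
uint8_t reg_clear(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg & ~mask);
}
```

With these, `x = reg_test(reg.byte, REG_B2);` plays the role of `x = reg.bits.b2;` without relying on how the compiler orders bit-fields.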
kgiannakakis