I have difficulty understanding the use of unions in C. I have read a lot of posts here on SO about the subject, but none of them explains why a union is preferred when the same thing can be achieved using a struct.

Quoting from K&R

As an example such as might be found in a compiler symbol table manager, suppose that a constant may be an int, a float, or a character pointer. The value of a particular constant must be stored in a variable of the proper type, yet it is most convenient for table management if the value occupies the same amount of storage and is stored in the same place regardless of its type. This is the purpose of a union: a single variable that can legitimately hold any of one of several types. The syntax is based on structures:

union u_tag {
      int ival;
      float fval;
      char *sval;
} u;

The usage will be

if (utype == INT)
    printf("%d\n", u.ival);
else if (utype == FLOAT)
    printf("%f\n", u.fval);
else if (utype == STRING)
    printf("%s\n", u.sval);
else
    printf("bad type %d in utype\n", utype);

The same thing can be implemented using a struct. Something like,

struct u_tag {
    utype_t utype;
    int ival;
    float fval;
    char *sval;
} u;

if (u.utype == INT)
    printf("%d\n", u.ival);
else if (u.utype == FLOAT)
    printf("%f\n", u.fval);
else if (u.utype == STRING)
    printf("%s\n", u.sval);
else
    printf("bad type %d in utype\n", u.utype);

Isn't this the same? What advantage does a union give?

Any thoughts?

+3  A: 

A union uses less memory and lets you do more dangerous things. It represents one contiguous block of memory, which can be interpreted as either an integer, a floating point value or a character pointer.

Eric
+6  A: 

In the example you posted, the size of the union would be the size of float (assuming it is the largest member; as pointed out in the comments, this can differ with a 64-bit compiler), while the size of the struct would be the sum of the sizes of float, int, char* and utype_t (plus padding, if any).

The results on my compiler:

#include <stdio.h>

union u_tag {
    int ival;
    float fval;
    char *sval;
};
struct s_tag {
    int ival;
    float fval;
    char *sval;
};

int main(void)
{
    /* sizeof yields a size_t, so the matching format is %zu, not %d */
    printf("%zu\n", sizeof(union u_tag));  //prints 4
    printf("%zu\n", sizeof(struct s_tag)); //prints 12
    return 0;
}
Amarghosh
Thanks. So the only difference is in size?
Appu
If you're on a 64-bit platform, then the size of the union would be the size of char*.
Eric
@Appu And the ease of use: struct is convenient in most cases. Use union only when you must.
Amarghosh
@Eric ya, that's why I said "the results on my compiler"
Amarghosh
@Appu - no, they are fundamentally different: a struct stores all the values separately. A union lets you refer to a single memory location as an int/float/long etc.
Martin Beckett
You use unions where you lack memory.
fahad
+3  A: 

Unions are used to store only one type of data at a time. If a different member is assigned, the old value is overwritten and can no longer be accessed. In your example, the int, float and char * members can all hold different values at the same time when declared in a struct. That is not the case in a union. So it depends on your program's requirements and design. Check this article on when to use union. Google may give even more results.
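The overlap can be made visible with offsetof. This is only a sketch: the two types mirror the ones in the question, and the helper function names are made up for illustration.

```c
#include <stddef.h>

union u_tag  { int ival; float fval; char *sval; };
struct s_tag { int ival; float fval; char *sval; };

/* Every member of a union starts at offset 0, so they all share storage. */
int union_members_overlap(void)
{
    return offsetof(union u_tag, ival) == 0
        && offsetof(union u_tag, fval) == 0
        && offsetof(union u_tag, sval) == 0;
}

/* Struct members are laid out one after another (possibly with padding),
   so each one has storage of its own. */
int struct_members_distinct(void)
{
    return offsetof(struct s_tag, fval) >= sizeof(int)
        && offsetof(struct s_tag, sval) >=
               offsetof(struct s_tag, fval) + sizeof(float);
}
```

Both functions return 1 on a conforming compiler, which is why assigning u.fval destroys whatever was in u.ival while the struct keeps both.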

Praveen S
+2  A: 

The language offers the programmer numerous facilities to apply high level abstractions to the lowest level machine data and operations.

However, the mere presence of something does not automatically suggest its use is a best practice. Their presence makes the language powerful and flexible. But industry needs led to the development of programming techniques that favored clarity and maintainability over the absolute best code efficiency or storage efficiency possible.

So if a problem's solution set contains both unions and structures it is the programmer's responsibility to decide whether the need for compact storage outweighs the costs.

In recent times the cost of memory has been exceedingly low. The introduction of the bool type (and even prior to that, int variables) allowed a programmer of 32-bit systems to use 32 bits to represent a binary state. You see that frequently in programming even though a programmer could use masks and get 32 true/false values into a variable.

So to answer your question, the union offers more compact storage for a single value entity out of several possible types than a traditional structure but at the cost of clarity and possible subtle program defects.

Amardeep
+5  A: 

Unions can be used when no more than one member need be accessed at a time. That way, you can save some memory instead of using a struct.

There's a neat "cheat" which may be possible with unions: writing one field and reading from another, to inspect bit patterns or interpret them differently. This obviously has non-standard, machine-dependent behaviour.
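A minimal sketch of that trick, assuming float is 32 bits wide and uses the IEEE 754 format (common, but not guaranteed). The function names are hypothetical. As far as I know, C99 and later define reading a different member as reinterpreting the stored bytes, while C++ treats it as undefined behaviour, so a memcpy variant is shown for comparison.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view the bits of a float as a 32-bit integer.
   Assumes sizeof(float) == 4. */
uint32_t float_bits_union(float f)
{
    union {
        float    f;
        uint32_t u;
    } pun;
    pun.f = f;     /* write one member ... */
    return pun.u;  /* ... read the other: the same bytes, reinterpreted */
}

/* Alternative that copies the bytes instead of punning through a union. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE 754 platform, float_bits_union(1.0f) gives 0x3F800000, and both functions should agree for any input.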

Michael Foukarakis
you can also do the same with a cast in C's weak type system
Gary
+1 for mentioning a use of unions beyond merely saving memory.
Brian
@Gary: Yes, but then you need to do the cast every time you want to make the switch. Using a union may be more readable in some situations (and less readable in others, of course). A common case I've seen for a union is to provide a type that has a high part and a low part (i.e. because the type is larger than the size supported within the native types) in order to make communication between libraries not require new types.
Brian
+1  A: 

Using unions to save memory is mostly not done in modern systems, since the code to access a union member will quickly take up more space (and be slower) than just adding another word sized variable to memory. However, when your code has to support multiple architectures with different endiannesses (whew, what a word), unions can be handy. I tend to prefer using an endian utility library (to functions), but some people like unions.

Memory mapped hardware registers are also commonly accessed with unions. Bit fields in C (don't use them, they're mean) can be passed around as words using unions.
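As a rough sketch of the bit-field case: the register below is invented (field names and widths are assumptions, not from any real device), and the order of bit-fields inside the word is implementation-defined, which is a large part of why they are "mean".

```c
#include <stdint.h>

/* Hypothetical control register; layout of the bit-fields within the
   32-bit word depends on the compiler and target. */
typedef union {
    uint32_t word;               /* the register as one machine word */
    struct {
        unsigned int enable : 1;
        unsigned int mode   : 3;
        unsigned int count  : 28;
    } bits;                      /* the same storage, field by field */
} ctrl_reg_t;

/* Fill in the fields, then hand the result around as a plain word. */
uint32_t ctrl_pack(unsigned enable, unsigned mode, unsigned count)
{
    ctrl_reg_t r;
    r.word = 0;
    r.bits.enable = enable;
    r.bits.mode   = mode;
    r.bits.count  = count;
    return r.word;
}

/* Recover one field from a word produced by the same compiler. */
unsigned ctrl_mode(uint32_t word)
{
    ctrl_reg_t r;
    r.word = word;
    return r.bits.mode;
}
```

The round trip is reliable within one compiler and target; the numeric value of the packed word is not portable.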

Nathon
A: 

As often mentioned before: unions save memory. But this is not the only difference. Structs are made to store ALL of the given sub-types, while unions are made to store EXACTLY ONE of the given sub-types. So if you want to store either an integer or a float, then a union is probably the thing you need (but you need to remember somewhere else which kind of number you have saved). If you want to store both, then you need a struct.

Baju
A: 

Borrowing from the quote you posted: a union holds "...any of one of several types..." of its members at a time. That is exactly what a union is, whereas struct members can all be assigned and accessed at the same time.

A union makes more sense in system-level (OS) programs, such as inter-process communication or concurrency handling.

deepseefan
+1  A: 

unions have two dominant uses:

First is to provide a variant type, as you have outlined. In contrast to the struct approach, there is one unit of memory shared between all members in the union. If memory isn't an issue, a struct will also serve this function.

I typically embed the union in the struct - the struct ensures that type and data are stored together, and the union means there is exactly one value being stored.

struct any_tag {
    utype_t utype;
    union {
        int ival;
        float fval;
        char *sval;
    } u;
} data;
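A sketch of how such a tagged struct might be consumed. The enum definition of utype_t and the any_format function are assumptions added for illustration; the original only shows the data layout.

```c
#include <stdio.h>

typedef enum { INT, FLOAT, STRING } utype_t;  /* assumed definition */

struct any_tag {
    utype_t utype;          /* says which union member is currently valid */
    union {
        int ival;
        float fval;
        char *sval;
    } u;
};

/* Format the active member into buf. Returns what snprintf returns,
   or -1 if the tag is not one of the known types. */
int any_format(const struct any_tag *a, char *buf, size_t n)
{
    switch (a->utype) {
    case INT:    return snprintf(buf, n, "%d", a->u.ival);
    case FLOAT:  return snprintf(buf, n, "%f", a->u.fval);
    case STRING: return snprintf(buf, n, "%s", a->u.sval);
    }
    return -1;
}
```

Because the tag travels with the value, the reader never has to guess which member was written last.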

Second, a union has great use for low level access to raw data - reinterpreting one type as another. The purpose I've used this for is reading and writing binary encoded data.

float ConvertByteOrderedBufferTo32bitFloat( char* input ) {
    union {
        float f;
        unsigned char buf[4];
    } data;

    /* Copy the bytes as-is on a big-endian host, reversed otherwise. */
#if WORDS_BIGENDIAN == 1
    data.buf[0] = input[0];
    data.buf[1] = input[1];
    data.buf[2] = input[2];
    data.buf[3] = input[3];
#else
    data.buf[0] = input[3];
    data.buf[1] = input[2];
    data.buf[2] = input[1];
    data.buf[3] = input[0];
#endif

    return data.f;
}

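Here, you can write to the individual bytes, depending on platform endianness, then interpret those 4 raw char bytes as an IEEE float. Casting that char array to float would not have the same result: the cast converts a value rather than reinterpreting the bytes, and casting the pointer to float * instead risks misaligned access and breaks the aliasing rules.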

jmanning2k
A: 

Unions are tricky. For years, I couldn't figure them out, then I started doing things with network protocols, and someone showed me the light. Say you have a header, and then after the header, there are various different types of packets, something like:

| type (4 bytes) | uid (8 bytes) | payload length (2 bytes) | Payload (variable length) |

And then there would be various types of packet payloads... For the sake of argument, there could be hello, goodbye, and message packets...

Well, you can build a nested set of structs/unions that can exactly represent a packet in that protocol like so...

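#include <stdint.h>

/* Fixed-width types pin down the field sizes. The compiler may still add
   padding (for example after payload_length), so the layout matches the
   wire format only if the struct is packed with a compiler-specific
   extension. */
struct packet {
  uint32_t type;
  char unique_id [8];
  uint16_t payload_length;
  union {

    struct {
      uint16_t version;
      uint32_t status;
    } hello;

    struct {
      char reason[20];
      uint32_t status;
    } goodbye;

    struct {
      char message[100];
    } message;

  } payload;   /* named member, so pkt->payload.hello.version works */
};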

Inevitably, you get this protocol from the Operating System through a read() call, and it's just a jumble of bytes. But if you are careful with your structure definition, and all the types are the right size, you can simply make a pointer to the struct, point it at your buffer filled with random data, and...

char buf[1000];
struct packet *pkt;
read(outsideworld, buf, sizeof buf);  /* never ask for more than buf can hold */
pkt = (struct packet *)buf;

and reading your packets is as simple as...

switch(pkt->type){

  case PACKET_MESSAGE:
    printf("message = %s\n",
           pkt->payload.message.message);
    break;

  case PACKET_HELLO:
    /* the fields are unsigned, so print them with %u */
    printf("hello! version = %u status = %u\n",
           (unsigned)pkt->payload.hello.version,
           (unsigned)pkt->payload.hello.status);
    break;
  case PACKET_GOODBYE:
    printf("goodbye! reason = %s status = %u\n",
           pkt->payload.goodbye.reason,
           (unsigned)pkt->payload.goodbye.status);
    break;
}

No grovelling around, counting bytes, etc... You can nest this as deeply as you want (make a union for ip addresses, that gives you the whole thing as an unsigned int, or the individual bytes so it's easier to print 192.168.0.1 out of it).
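The IP address idea could look something like this. The type and function names are invented for the sketch, and it assumes the address bytes are stored in network order (first octet first), which is how they arrive off the wire.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical IPv4 address type: one 32-bit value or four octets,
   both views sharing the same 4 bytes of storage. */
typedef union {
    uint32_t whole;
    uint8_t  octet[4];
} ipv4_addr_t;

/* Write the dotted-quad form into buf; returns what snprintf returns. */
int ipv4_format(ipv4_addr_t a, char *buf, size_t n)
{
    return snprintf(buf, n, "%u.%u.%u.%u",
                    (unsigned)a.octet[0], (unsigned)a.octet[1],
                    (unsigned)a.octet[2], (unsigned)a.octet[3]);
}
```

Filling octet[0..3] with 192, 168, 0, 1 and formatting gives "192.168.0.1"; the numeric value of whole for those same bytes depends on the host's endianness.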

The unions don't slow down your code, because it all just gets translated into offsets in machine code.

Dav3xor