From an answer to another question yesterday, I learned that it is nonportable and unsafe to write to one union member and read the value back from another member of a different type, since doing so relies on assumptions about the underlying alignment of the members. After some research I found a written source that repeats this claim and gives a popular example: using a union of int and float to find the binary representation of a float.

So, understanding that this assumption is not safe, I wonder: apart from saving memory (duh...), what real use is there for unions?

Note: that is, under Standard C. Clearly, for a specific implementation, the rules are known in advance and can be taken advantage of.

EDIT: the word "unsafe", due to the associations it has picked up in recent years, is probably a bad choice of wording, but I think the intention is clear.

EDIT 2: Since this point keeps coming up in the answers: saving memory is a valid argument. I wanted to know if there was something beyond that.

+9  A: 

Yes.

They provide a way of creating generic containers. Though, to get polymorphic behavior you must implement a vtable or type switching yourself...

They are, however, one of those features that you only use when you need them, and you need them rather rarely.
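
To make the "type switching" part concrete, here is a minimal sketch of a union used as a generic value, with a tag that records which member is live. All names (`value_kind`, `struct value`, `value_describe`) are made up for illustration and not taken from any library.

```c
#include <stdio.h>

/* Hypothetical names throughout; this is a sketch, not a library API. */
enum value_kind { VALUE_INT, VALUE_DOUBLE, VALUE_STRING };

struct value {
    enum value_kind kind;   /* records which union member was last written */
    union {
        int i;
        double d;
        const char *s;
    } as;
};

/* Hand-written type switch: only the member named by `kind` is read.
   Returns what snprintf returns, or -1 for an unknown kind. */
static int value_describe(const struct value *v, char *buf, size_t len)
{
    switch (v->kind) {
    case VALUE_INT:    return snprintf(buf, len, "int %d", v->as.i);
    case VALUE_DOUBLE: return snprintf(buf, len, "double %.2f", v->as.d);
    case VALUE_STRING: return snprintf(buf, len, "string %s", v->as.s);
    }
    return -1;
}
```

An array of `struct value` can then hold ints, doubles and strings side by side, and no member is ever read other than the one that was last stored, so nothing here depends on how the members overlap in memory.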

dmckee
OK, I think I understand what you mean. However, since you are required to implement the polymorphism mechanism yourself, what advantage does it have over declaring variables of the different types and accessing them through a pointer which is switched by the mechanism (other than to save memory)?
ysap
Also, was this the intention of the writers of the standard, or is it a smart use case that was developed later?
ysap
Saving memory (and it is worth thinking about the characteristics of 'big' machines around the time C was written: main memory measured in kilowords on shared machines), simplified management of complex structures, one character less typing on each access, having the *ability* to use the memory any way you want to, and don't neglect the ability to do those "undefined" things, 'cause not every program needs to be portable.
dmckee
+4  A: 

Yes, unions can be nonportable and unsafe, but they have their uses. For example, a union can speed things up by eliminating the need to cast a uint32 to char[4]. This could come in handy if you are trying to route by IP address in software, but then your processor's byte order has to be network order. Think of unions as an alternative to casting, with fewer machine instructions. Casting has similar drawbacks.
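
A rough sketch of what that looks like, assuming `uint32_t` from `<stdint.h>` and 8-bit bytes. The names `union ipv4`, `ipv4_first_byte_in_memory` and `ipv4_top_octet` are hypothetical. The byte view tells you what is in memory, which depends on the host byte order; the shift-based version gives the same answer everywhere.

```c
#include <stdint.h>

/* Hypothetical type: an IPv4 address seen as one word or as four bytes. */
union ipv4 {
    uint32_t word;
    unsigned char bytes[4];   /* assumes CHAR_BIT == 8 */
};

/* Returns the byte stored at the lowest address. Which octet of the
   address that is depends on the host byte order. */
static unsigned char ipv4_first_byte_in_memory(uint32_t addr)
{
    union ipv4 u;
    u.word = addr;
    return u.bytes[0];
}

/* Byte-order-independent alternative: extract the top octet by shifting. */
static unsigned char ipv4_top_octet(uint32_t addr)
{
    return (unsigned char)(addr >> 24);
}
```

For 192.168.0.1 stored as `0xC0A80001`, `ipv4_top_octet` always yields `0xC0`, while `ipv4_first_byte_in_memory` yields `0xC0` on a big-endian host and `0x01` on a little-endian one.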

JackN
As I mentioned in the question, clearly unions can be very useful for a **specific implementation**. You are making specific assumptions about the underlying storage. My question is whether they are useful from the **standard C** perspective. *dmckee* gives a reasonable argument for that.
ysap
+1 for explaining unions as a type of casting
Robert
+2  A: 

One way to use unions that I came across is to do data hiding.

Say you have a struct that is the buffer.

Then, by declaring a union over that struct differently in different modules, you can access the contents of the buffer in different ways, or not at all, depending on the union declared in that particular module.

EDIT: here's an example

struct X
{
  int a;
};

struct Y
{
  int b;
};

union Public
{
   struct X x;
   struct Y y;
};

Here, whoever uses union Public can cast it to either struct X or struct Y.

so given a function:

void foo(union Public* arg)
{   
...

you can access either struct X or struct Y.

But then you want to limit the access so that the user doesn't know about X.

The union name stays the same, but the struct X part is no longer available (it is left out of the header):

void foo(union Public* arg)
{
   // Public is still available but struct X is gone, 
   // user can only cast to struct Y

   struct Y* p = (struct Y*)arg;
...
BuggerMe
::blink:: That's *clever*, but...it's also security through obscurity, so probably not a great idea. Also sensitive to packing issues, so implementation dependent.
dmckee
@BuggerMe - I'm not sure I see your point. Can you give an example of the definition of such a buffer?
ysap
@dmckee - you confirmed my suspicion that this use case IS implementation specific!
ysap
@dmckee, proper encapsulation is not "security through obscurity". Obviously a malicious caller could even do something like `*(char *)rand() = rand();` in C, so there's no security benefit to hiding your implementation. On the other hand, hiding implementation details **does** strongly discourage other people using your code from poking at the internals in ways that will break when you later tweak the implementation.
R..
+4  A: 

Even if unions don't offer much in immediate usefulness (reduced memory usage aside), one advantage of using a union over dumping all of its members into a struct is that it makes the intended semantics clear: only one value (or set of values if it's a union of structs) is valid at any given time. It documents itself better.

The mutual exclusivity of members would be less obvious if you instead made all the union members separate members of a struct. Additionally, you'd still have the same problem of ill-defined behavior if you read a member that wasn't previously written to, but now you need to account for the application's semantics too (did it initialize all unused members to 0? did it leave them as garbage?), so in that sense, why wouldn't you use a union?

jamesdlin
I think that's only partially correct - my understanding is that members of the same type are guaranteed to overlay each other, so there is no mutual exclusion here.
ysap
@ysap: That's true, but it's an implementation detail of unions that is unrelated to my point.
jamesdlin
@jamesdlin - well, your point is that "due to the unknown arrangement of the members, do assume exclusiveness on access" (or isn't it?). My comment was that this is only partially true.
ysap
@ysap: My point is that a `struct` by itself conveys no semantic information about how its members are supposed to be used relative to each other. I didn't say anything about "unknown arrangements".
jamesdlin
+2  A: 

The question contains a constraint that might disallow a valid answer...

You ask about real usage under the standard, but "real usage" may be allowing a knowledgeable programmer to exploit implementation defined behaviour in ways that the standards committee didn't want to anticipate or enumerate. And I don't mean that the standards committee had a particular behaviour in mind, but that they explicitly wanted to leave the ability there to be exploited in a useful way.

In other words: unions don't have to be useful under standard-defined behaviour to be useful in general; they could simply be there to allow someone to exploit the quirks of their target machine without resorting to assembly.

There could be a million useful ways to use them on the various machines available in implementation-defined ways, and zero useful ways to use them in a strictly portable way, but those million implementation-defined usages are reason enough to standardise their existence.

I hope that makes sense.

detly
Yes, it does make sense, thanks. I am not completely sure at this time that I agree that it answers my question.
ysap
@ysap - it doesn't really, but I think it was more an answer than a comment :)
detly
+3  A: 

Even discounting a specific implementation where the alignment and packing are known, unions can still be useful.

They allow you to store one of many values into a single block of memory, along the lines of:

typedef struct {
    int type;
    union {
        type1 one;
        type2 two;
    };  /* anonymous union (standard since C11) */
} unioned_type;

And yes, it is non-portable to expect to be able to store your data into one and read it from two. But if you simply use the type to specify what the underlying variable is, you can easily get at it without having to cast.

In other words:

unioned_type ut;
ut.type = 1;
ut.one = myOne;
// Don't use ut.two here unless you know the underlying details.

is fine assuming you use type to decide that a type1 variable is stored there.

paxdiablo
If I understand your example, it is actually similar to the example given in the reference book I mentioned above. Then, the argument boils down to *save memory*.
ysap
+1 — This is probably the most useful way to use a union within the bounds of the standard.
detly
OK, after your update, I am sure it is actually similar to that example.
ysap
Yes, and saving memory was a very important consideration in the early days. C was sometimes described as a language with all the power of assembler with all the readability of ... assembler :-) The other thing you must understand was that the initial mandate of ANSI C was to mostly codify existing practices, not standardise a new language. In each subsequent standard, ANSI and ISO, they've been very careful not to break existing code unnecessarily.
paxdiablo
It's not just an "early days" consideration — it's important on embedded devices, which is still a massive use-case for C. And even embedded devs might want to upgrade or change their compiler at some point.
detly
@ysap, I don't doubt it's similar to quite a lot of examples. There's a fairly limited number of ways to do it.
paxdiablo
+2  A: 

Using a union for type punning is non-portable (though not particularly less portable than any other method of type punning).

OTOH, a parser, for one example, typically has a union to represent values in expressions. [Edit: I'm replacing the parser example with one I hope is a bit more understandable]:

Let's consider a Windows resource file. You can use it to define resources like menus, dialogs, icons, etc. Something like this:

#define mn1 2

mn1 MENU
{
    MENUITEM "File", -1, MENUBREAK
}

ico1 "junk.ico"

dlg1 DIALOG 100, 0, 0, 100, 100 
BEGIN
    FONT 14, "Times New Roman"
    CAPTION "Test Dialog Box"
    ICON ico1, 700, 20, 20, 20, 20
    TEXT "This is a string", 100, 0, 0, 100, 10
    LTEXT "This is another string", 200, 0, 10, 100, 10
    RTEXT "Yet a third string", 300, 0, 20, 100, 10
    LISTBOX 400, 20, 20, 100, 100
    CHECKBOX "A combobox", 500, 100, 100, 200, 10
    COMBOBOX 600, 100, 210, 200, 100
    DEFPUSHBUTTON "OK", 75, 200, 200, 50, 15
END

Parsing the MENU gives a menu definition; parsing the DIALOG gives a dialog definition, and so on. In the parser we represent that as a union:

%union { 
        struct control_def {
                char window_text[256];
                int id;
                char *class;
                int x, y, width, height;
                int ctrl_style;
        } ctrl;

        struct menu_item_def { 
                char text[256];
                int identifier;
        } item;

        struct menu_def { 
                int identifier;
                struct menu_item_def items[256];
        } mnu;

        struct font_def { 
                int size;
                char filename[256];
        } font;

        struct dialog_def { 
                char caption[256];
                int id;
                int x, y, width, height;
                int style;
                struct menu_def *mnu;
                struct control_def ctrls[256];
                struct font_def font;
        } dlg;

        int value;
        char text[256];
};

Then we specify the type that will be produced by parsing a particular type of expression. For example, a font definition in the file becomes a font member of the union:

%type <font> font

Just to clarify, the <font> part refers to the union member that's produced and the second "font" refers to a parser rule that will yield a result of that type. Here's the rule for this particular case:

font: T_FONT T_NUMBER "," T_STRING { 
    $$.size = $2; 
    strcpy($$.filename,$4); 
};

Yes, in theory we could use a struct instead of a union here -- but beyond wasting memory, it just doesn't make sense. A font definition in the file only defines a font. It would make no sense to have it produce a struct that included a menu definition, icon definition, number, string, etc. in addition to the font it actually defines. [end of edit]

Of course, using unions to save memory is rarely very important anymore. While it may generally seem rather trivial now, back when 64 KB of RAM was a lot, the memory savings meant a lot more.

Jerry Coffin
Sorry, Jerry, I did not understand your example. Could you please give a concrete example of how the union you defined is related to the expressions it supposedly represents?
ysap
@ysap: I was afraid of that -- I'll try to write up a *small* grammar to show more about how it works.
Jerry Coffin
@Jerry - thanks for clarifying the use case. I think I understood the message. You basically use the union as a generic container. Then, the *type* of the object that actually occupies the union is stored in a *tag* (I assume it is the `value` or `text` member/s). So, like other examples in other answers, this is basically an explicit implementation of polymorphism.
ysap
Actually no, it doesn't use a tag -- but the parser generator keeps track of the `<type>` tags you've given, and it'll give an error if you try to mix up members (e.g., since the `font` rule produces a `font` result, it can only assign to the `font` member of the union). Yes, I guess you could view it as vaguely similar to polymorphism, but not really exactly.
Jerry Coffin
+3  A: 

Here is one legitimate portable use of unions:

struct arg {
    enum type t;
    union {
        intmax_t i;
        uintmax_t u;
        long double f;
        void *p;
        void (*fp)(void);
    } v;
};

Coupled with type information in t, struct arg can portably contain any numeric or pointer value. The whole struct is likely to be 16-32 bytes in size, compared to 40-80 bytes if a union had not been used. The difference would be even more extreme if I wanted to keep each possible original numeric type separately (signed char, short, int, long, long long, unsigned char, unsigned short, ...) rather than converting them up to the largest signed/unsigned/floating point type before storing them.
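
To put rough numbers on that for a given compiler, here is a small sketch that compares the union with a struct holding the same members. The names `union arg_value`, `struct arg_value_separate` and `arg_bytes_saved` are hypothetical, and the actual sizes are implementation-defined, so I'm not quoting any output.

```c
#include <stddef.h>
#include <stdint.h>

/* The payload from struct arg, as a union: all members share storage. */
union arg_value {
    intmax_t i;
    uintmax_t u;
    long double f;
    void *p;
    void (*fp)(void);
};

/* Hypothetical comparison type: the same members, each with its own storage. */
struct arg_value_separate {
    intmax_t i;
    uintmax_t u;
    long double f;
    void *p;
    void (*fp)(void);
};

/* Bytes saved per value by sharing storage; the figure depends on the
   implementation's type sizes and padding. */
static size_t arg_bytes_saved(void)
{
    return sizeof(struct arg_value_separate) - sizeof(union arg_value);
}
```

The union only needs to be as large as its largest member (plus any padding), whereas the struct has to hold all five members at once.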

Also, while it is not "portable" to assume anything about the representation of types other than unsigned char, the standard does permit using a union with unsigned char, or casting a pointer to unsigned char *, and accessing an arbitrary data object that way. If you write that information to disk, it won't be portable to other systems that use different representations, but it still might be useful at runtime - for example, when implementing a hash table to store double values. (Anyone want to correct me if padding bit issues make this technique invalid?) If nothing else, it can be used to implement memcpy (not very useful, since the standard library provides a much better implementation) or, more interestingly, a memswap function which could swap two arbitrary-size objects with bounded temporary space. This has gotten a little outside the usage domain of unions now and into unsigned char * cast territory, but it's closely related.
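
A minimal sketch of the memswap idea, assuming the two objects do not overlap. I've called it `swap_bytes` (a made-up name) because identifiers beginning with `mem` are reserved for the library, and the 64-byte chunk size is an arbitrary choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: swap two non-overlapping objects of n bytes each,
   using a fixed-size temporary buffer no matter how large n is. */
static void swap_bytes(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    unsigned char tmp[64];   /* bounded temporary space */

    while (n > 0) {
        size_t chunk = n < sizeof tmp ? n : sizeof tmp;
        memcpy(tmp, pa, chunk);
        memcpy(pa, pb, chunk);
        memcpy(pb, tmp, chunk);
        pa += chunk;
        pb += chunk;
        n -= chunk;
    }
}
```

Usage would be something like `swap_bytes(&x, &y, sizeof x)` for two objects of the same type. It never interprets the bytes, it only moves them, so it makes no assumption about the representation.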

R..
R, the first half of your answer is clear - save memory. What I'm not sure I understand is the second half. How can an *unsigned char* member be used to access other members in a predictable way?
ysap
The values aren't "predictable" without knowing the implementation, but they're implementation-defined. As long as your code doesn't make assumptions about what those values are, but merely uses them internally, you're fine. Another possible application would be writing a byte-by-byte comparison function for use with `qsort` when you don't care whether the ordering has any relation to the natural numeric ordering of the original type, just that it's well-defined and that the results are reproducible.
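A sketch of that `qsort` idea, using `memcmp` for the byte-by-byte comparison. The names `compare_double_bytes` and `sort_doubles_by_bytes` are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Orders two doubles by their object representation, not by numeric value. */
static int compare_double_bytes(const void *a, const void *b)
{
    return memcmp(a, b, sizeof(double));
}

/* Hypothetical wrapper: sort an array of doubles into byte order. */
static void sort_doubles_by_bytes(double *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_double_bytes);
}
```

The resulting order is reproducible on a given implementation but generally not numeric. Values with identical bytes end up adjacent, which is enough for things like grouping duplicates. Note that on IEEE 754 systems 0.0 and -0.0 compare equal numerically but have different bytes, so they would be treated as distinct here.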
R..
A: 

Consider a hardware control register with different bit fields. By setting values in these bit fields, we can control different functions of the hardware.

By using a union data type, we can modify either the entire content of the register or a particular bit field of the register.

For example, consider a union data type as follows:

/* Data1 Bit Definition */
typedef union
{
    struct STRUCT_REG_DATA
    {
        unsigned int    u32_BitField1       :   3;
        unsigned int    u32_BitField2       :   2;
        unsigned int    u32_BitField3       :   1;
        unsigned int    u32_BitField4       :   2;
    } st_RegData;

    unsigned int    u32_RegData;

} UNION_REG_DATA;

To modify the entire content of the register:

UNION_REG_DATA un_RegData;
un_RegData.u32_RegData = 0x77;

To modify a single bit field (for example, BitField3):

un_RegData.st_RegData.u32_BitField3 = 1;

Both are reflected in the same memory. This value can then be written to the hardware control register.
@barati21 - That is exactly the point of the question - you **should not** do that if you want to guarantee portability. The underlying layout of union members in memory is **not** defined in the standard. It is **implementation defined**.
ysap