I'm looking for some union examples. Not to understand how unions work (hopefully I already do), but to see what kinds of hacks people do with them.

So feel free to share your union hack (with some explanation of course :) )

+5  A: 

Here's a little one I use every day:

struct tagVARIANT {
    union {
        struct __tagVARIANT {
            VARTYPE vt;
            WORD    wReserved1;
            WORD    wReserved2;
            WORD    wReserved3;
            union {
                LONG          lVal;         /* VT_I4                */
                BYTE          bVal;         /* VT_UI1               */
                SHORT         iVal;         /* VT_I2                */
                FLOAT         fltVal;       /* VT_R4                */
                DOUBLE        dblVal;       /* VT_R8                */
                VARIANT_BOOL  boolVal;      /* VT_BOOL              */
                _VARIANT_BOOL bool;         /* (obsolete)           */
                SCODE         scode;        /* VT_ERROR             */
                CY            cyVal;        /* VT_CY                */
                DATE          date;         /* VT_DATE              */
                BSTR          bstrVal;      /* VT_BSTR              */
                IUnknown *    punkVal;      /* VT_UNKNOWN           */
                IDispatch *   pdispVal;     /* VT_DISPATCH          */
                SAFEARRAY *   parray;       /* VT_ARRAY             */
                BYTE *        pbVal;        /* VT_BYREF|VT_UI1      */
                SHORT *       piVal;        /* VT_BYREF|VT_I2       */
                LONG *        plVal;        /* VT_BYREF|VT_I4       */
                FLOAT *       pfltVal;      /* VT_BYREF|VT_R4       */
                DOUBLE *      pdblVal;      /* VT_BYREF|VT_R8       */
                VARIANT_BOOL *pboolVal;     /* VT_BYREF|VT_BOOL     */
                SCODE *       pscode;       /* VT_BYREF|VT_ERROR    */
                CY *          pcyVal;       /* VT_BYREF|VT_CY       */
                DATE *        pdate;        /* VT_BYREF|VT_DATE     */
                BSTR *        pbstrVal;     /* VT_BYREF|VT_BSTR     */
                IUnknown **   ppunkVal;     /* VT_BYREF|VT_UNKNOWN  */
                IDispatch **  ppdispVal;    /* VT_BYREF|VT_DISPATCH */
                SAFEARRAY **  pparray;      /* VT_BYREF|VT_ARRAY    */
                VARIANT *     pvarVal;      /* VT_BYREF|VT_VARIANT  */
                PVOID         byref;        /* Generic ByRef        */
                CHAR          cVal;         /* VT_I1                */
                USHORT        uiVal;        /* VT_UI2               */
                ULONG         ulVal;        /* VT_UI4               */
                INT           intVal;       /* VT_INT               */
                UINT          uintVal;      /* VT_UINT              */
                DECIMAL *     pdecVal;      /* VT_BYREF|VT_DECIMAL  */
                CHAR *        pcVal;        /* VT_BYREF|VT_I1       */
                USHORT *      puiVal;       /* VT_BYREF|VT_UI2      */
                ULONG *       pulVal;       /* VT_BYREF|VT_UI4      */
                INT *         pintVal;      /* VT_BYREF|VT_INT      */
                UINT *        puintVal;     /* VT_BYREF|VT_UINT     */
            } __VARIANT_NAME_3;
        } __VARIANT_NAME_2;
        DECIMAL decVal;
    } __VARIANT_NAME_1;
};

This is the definition of the OLE automation variant data type. As you can see it has lots of possible types. There are lots of rules around the types you can use in different situations, depending on the capabilities of your intended client code. Not all types are supported by all languages.

The types with VT_BYREF after them are used by languages such as VBScript that pass parameters by reference by default. This means if you have some code that cares about the variant structure details (such as C++) being called by code that doesn't (such as VB), then you have to carefully dereference the variant parameter if required.

The byref types are also used to return values from functions. There is also support for array types through the weirdly misnamed SAFEARRAY type, which is quite difficult to use from C++.

If you have an array of strings, you can pass it to VBScript, but it cannot be used (except to print the size). To actually read the values, the array data needs to be of type VT_BYREF | VT_BSTR.
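The by-reference handling described above can be sketched with a much smaller stand-in. This is not the real OLE definition: `MiniVariant`, the `MINI_VT_*` constants and `mini_variant_get_long` are hypothetical names made up for illustration, and the flag values are arbitrary. The point is only that the reader has to check the tag and dereference when the by-reference flag is set.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real VARTYPE flags.
   The numeric values are arbitrary and chosen only for this sketch. */
enum { MINI_VT_I4 = 1, MINI_VT_R8 = 2, MINI_VT_BYREF = 0x100 };

/* A tiny tagged union in the spirit of VARIANT (assumption: two base types). */
typedef struct MiniVariant {
    unsigned short vt;      /* base type, optionally OR-ed with MINI_VT_BYREF */
    union {
        long    lVal;       /* MINI_VT_I4                 */
        double  dblVal;     /* MINI_VT_R8                 */
        long   *plVal;      /* MINI_VT_BYREF | MINI_VT_I4 */
        double *pdblVal;    /* MINI_VT_BYREF | MINI_VT_R8 */
    } u;
} MiniVariant;

/* Read an integer whether it was passed by value or by reference.
   Returns 1 on success, 0 if the variant does not hold an integer. */
int mini_variant_get_long(const MiniVariant *v, long *out)
{
    switch (v->vt) {
    case MINI_VT_I4:
        *out = v->u.lVal;
        return 1;
    case MINI_VT_BYREF | MINI_VT_I4:
        if (v->u.plVal == NULL)
            return 0;
        *out = *v->u.plVal;   /* the careful dereference */
        return 1;
    default:
        return 0;
    }
}
```

A caller that ignores the by-reference flag would read a pointer as if it were a number, which is why the tag has to be checked first.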

1800 INFORMATION
OMG, my eyes are bleeding!
paxdiablo
Where's the explanation? :-)
Paul
I thought it was self explanatory :) This is the definition of the OLE automation variant data type. As you can see it has lots of possible types. Not all types are supported by all languages
1800 INFORMATION
The types with VT_BYREF after them are used by languages such as VBScript that pass parameters by reference by default. There are lots of rules around the types you can use in different situations, depending on the capabilities of your intended client code
1800 INFORMATION
The byref types are also used to return values from functions. There is also support for array types using the weirdly misnamed "SAFEARRAY" type - so difficult to use from C++
1800 INFORMATION
If you have an array of strings, you can pass it to vbscript, but it cannot be used (except to print the size). To actually read the values, the array data needs to be of type VT_BYREF | VT_BSTR
1800 INFORMATION
+2  A: 

Coincidentally, I just used one in a Stack Overflow answer here so I could treat a word that was made up of 6 bit-fields as two 16-bit unsigned integers.

Years ago, I also used one for (the first) ARM C compiler. The instructions in those days were all 32-bit, but had different layouts depending on the exact instruction. So I had a union to represent an ARM instruction, containing a set of structs which each had the appropriate bit-fields for a specific instruction type.
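A rough sketch of what such a union might look like, covering only two instruction formats. This is not the original compiler's code: the type name `ArmInstruction`, the field names and the helper `arm_make_branch` are assumptions for illustration. The fields are listed from the least significant bit upward, which matches how GCC and Clang allocate bit-fields on little-endian targets; the C standard leaves that order implementation-defined, so the `raw` view is not portable.

```c
#include <stdint.h>

/* One 32-bit instruction word viewed through different layouts.
   Bit-field allocation order is implementation-defined; this layout
   assumes fields are allocated starting at the least significant bit. */
typedef union {
    uint32_t raw;               /* the whole instruction word */
    struct {                    /* data-processing format */
        unsigned operand2 : 12;
        unsigned rd       : 4;  /* destination register */
        unsigned rn       : 4;  /* first operand register */
        unsigned s        : 1;  /* set condition codes */
        unsigned opcode   : 4;
        unsigned i        : 1;  /* immediate operand flag */
        unsigned zero     : 2;  /* always 00 for this format */
        unsigned cond     : 4;  /* condition field */
    } dp;
    struct {                    /* branch format */
        unsigned offset   : 24;
        unsigned link     : 1;  /* 1 = branch with link */
        unsigned tag      : 3;  /* always 101 for this format */
        unsigned cond     : 4;  /* condition field */
    } branch;
} ArmInstruction;

/* Hypothetical helper: build a branch instruction field by field. */
ArmInstruction arm_make_branch(unsigned cond, unsigned link, unsigned offset)
{
    ArmInstruction ins;
    ins.raw = 0;
    ins.branch.cond = cond;
    ins.branch.tag = 5;         /* binary 101 */
    ins.branch.link = link;
    ins.branch.offset = offset;
    return ins;
}
```

Reading a field back through the same struct it was written through is always fine; whether `raw` then holds the intended encoding depends on the compiler's bit-field order.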

Paul
The ARM thing looks nice and ugly :)
claferri
It was :) But (in a cautionary tale about portability of these things) it turned out I got all the bits in the wrong order the first time around...
Paul
+3  A: 

One classic is to represent a value of "unknown" type, as in the core of a simplistic virtual machine:

typedef enum { INTEGER, STRING, REAL, POINTER } Type;

typedef struct
{
  Type type;
  union {
    int integer;
    char *string;
    float real;
    void *pointer;
  } x;
} Value;

Using this you can write code that handles "values" without knowing their exact type, for instance implement a stack and so on.

Since this is in C (before C11), the inner union must be given a field name in the outer struct. In C++, and in C11 and later, you can let the union be anonymous. Picking this name can be hard. I tend to go with something single-lettered, since it is almost never referenced in isolation and thus it is always clear from context what is going on.

Code to set a value to an integer might look like this:

Value value_new_integer(int i)
{
  Value v;               /* the parameter is named i so it does not clash with v */
  v.type = INTEGER;
  v.x.integer = i;
  return v;
}

Here I use the fact that structs can be returned directly, and treated almost like values of a primitive type (you can assign structs).
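As a minimal sketch of the stack idea mentioned above, the `Value` type can be pushed and popped by plain struct assignment. The names `ValueStack`, `STACK_CAPACITY`, `stack_push`, `stack_pop` and `stack_add_integers` are made up for this example, and the fixed capacity is an arbitrary assumption.

```c
#include <stddef.h>

typedef enum { INTEGER, STRING, REAL, POINTER } Type;

typedef struct {
    Type type;
    union {
        int integer;
        char *string;
        float real;
        void *pointer;
    } x;
} Value;

#define STACK_CAPACITY 16   /* arbitrary size chosen for the sketch */

typedef struct {
    Value items[STACK_CAPACITY];
    size_t count;
} ValueStack;

/* Push a value; returns 1 on success, 0 if the stack is full. */
int stack_push(ValueStack *s, Value v)
{
    if (s->count == STACK_CAPACITY)
        return 0;
    s->items[s->count++] = v;   /* plain struct assignment */
    return 1;
}

/* Pop the top value into *out; returns 1 on success, 0 if empty. */
int stack_pop(ValueStack *s, Value *out)
{
    if (s->count == 0)
        return 0;
    *out = s->items[--s->count];
    return 1;
}

/* Replace the two topmost INTEGER values with their sum.
   Returns 0 (and leaves the stack untouched) on a type or size mismatch. */
int stack_add_integers(ValueStack *s)
{
    Value a, b, sum;
    if (s->count < 2)
        return 0;
    if (s->items[s->count - 1].type != INTEGER ||
        s->items[s->count - 2].type != INTEGER)
        return 0;
    stack_pop(s, &b);
    stack_pop(s, &a);
    sum.type = INTEGER;
    sum.x.integer = a.x.integer + b.x.integer;
    return stack_push(s, sum);
}
```

The stack code never looks inside the union; only the operation that cares about the type checks the tag.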

unwind
I guess this one could be useful!
claferri
+2  A: 
struct InputEvent
{
    enum EventType
    {
        EventKeyPressed,
        EventKeyPressRepeated,
        EventKeyReleased,
        EventMousePressed,
        EventMouseMoved,
        EventMouseReleased
    } Type;
    union
    {
        unsigned int KeyCode;
        struct
        {
            int x;
            int y;
            unsigned int ButtonCode;
        };
    };
};
...
std::vector<InputEvent>   InputQueue;

With the union hack I can simply make a vector of event objects, whatever kind of event each one holds. I'm sure this could be made cleaner, but it works for me. KISS.

qwerty
SDL uses a similar event structure/union.
aib
+1  A: 

Please avoid "hacks" with unions; they cause portability headaches (endianness, alignment issues).

  • A legitimate use of union is to store different data types at the same place, preferably with a tag so that you know which type it is. See the example by 1800 INFORMATION.

  • Don't use union to convert between data types, e.g. from an integer to several bytes. Use shift and masking instead for portability.
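The shift-and-mask approach from the second point might look like the sketch below. The function names `u32_to_bytes` and `bytes_to_u32` are invented for the example, and putting the most significant byte first is just one possible convention.

```c
#include <stdint.h>

/* Split a 32-bit value into four bytes, most significant byte first.
   The result is the same on any platform, whatever its endianness. */
void u32_to_bytes(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)((value >> 24) & 0xFFu);
    out[1] = (uint8_t)((value >> 16) & 0xFFu);
    out[2] = (uint8_t)((value >> 8) & 0xFFu);
    out[3] = (uint8_t)(value & 0xFFu);
}

/* Reassemble the value from four bytes, most significant byte first. */
uint32_t bytes_to_u32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the shifts operate on values rather than on memory layout, `out[0]` is always the high byte, with no dependence on how the platform stores integers.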

starblue
Some unions (see pthread.h on Linux) help portability, don't you think?
claferri
There is no union in the pthread.h files on my system (all 48 of them).
starblue
Look at the NPTL source in glibc.
claferri
@claferri Please be more specific (which union in which file?).
starblue
A: 
#define DWORD unsigned int
#define WORD  unsigned short
#define BYTE  unsigned char

typedef union _DWORD_PART_ {

   DWORD dwWord;

   struct {
      WORD dwMSB;
      WORD dwLSB;
   }hw;

   struct {

      BYTE byMSB;
      BYTE byMSBL;
      BYTE byLSBH;
      BYTE byLSB;

   } b;

} DWORD_PART;

This is an easy way to access the parts of a word. (And once you are done, any change in the endianness of the platform can also be handled easily.)
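One way the endianness handling might be done is to flip the member order at compile time, as sketched below. This assumes a compiler that predefines `__BYTE_ORDER__` (GCC and Clang do) and falls back to little-endian otherwise. The names `Word32Parts`, `SPLIT_BIG_ENDIAN` and `word32_high_half` are invented for the example, and fixed-width types replace the macros above so the sizes do not depend on the platform.

```c
#include <stdint.h>

/* Assumption: GCC/Clang-style predefined byte-order macros.
   If they are missing, the sketch assumes a little-endian target. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SPLIT_BIG_ENDIAN 1
#else
#define SPLIT_BIG_ENDIAN 0
#endif

typedef union {
    uint32_t word;
    struct {
#if SPLIT_BIG_ENDIAN
        uint16_t msw, lsw;              /* high half is stored first */
#else
        uint16_t lsw, msw;              /* low half is stored first */
#endif
    } hw;
    struct {
#if SPLIT_BIG_ENDIAN
        uint8_t msb, msbl, lsbh, lsb;   /* highest byte is stored first */
#else
        uint8_t lsb, lsbh, msbl, msb;   /* lowest byte is stored first */
#endif
    } b;
} Word32Parts;

/* Hypothetical helper: return the most significant 16 bits of a word. */
uint16_t word32_high_half(uint32_t value)
{
    Word32Parts p;
    p.word = value;
    return p.hw.msw;
}
```

With the order chosen per platform, `msw` and `msb` name the high parts on both kinds of machine; with a single fixed order they would name the low parts on a little-endian one.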

Alphaneo
Note that this depends on implementation-defined behavior. It is only portable to platforms where the implementation defines that it will have the semantics you intend. It is legal for a platform to not overlap the storage of the members of a union, for example, as long as it documents it.
RBerteig
I don't understand when you say this will not work. It works on Windows, Linux, ARM, MacOS, and PPC. Not once have I seen it fail. Anyway, I thought this might be useful; if you don't want it then SU.
Alphaneo
+1  A: 

Unions are also commonly used in the lexical analysis and parsing stage of language processors, like compilers and interpreters. Here is one I'm editing right now.

union {
    char c;
    int i;
    string *s;
    double d;
    Expression *e;
    ExpressionList *el;
    fpos_t fp;
}

The union is used to associate semantic values with the tokens of the lexical analyzer and the productions of the parser. This practice is quite common in grammar generators, like yacc, which provides explicit support for it. The union can hold any of its values, but only one of them at a time. For instance, at any one point in the input file you've either read a character constant (stored in c), or an integer (stored in i), or a floating-point number (stored in d). The grammar generator provides considerable assistance for determining which of the values is stored at any one time, depending on the rule being processed.
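To make the "one value at a time" point concrete, here is a small hand-written sketch rather than anything yacc generates. `TokenKind`, `SemanticValue` and `scan_token` are hypothetical names, and the scanner handles only character constants, unsigned decimal integers and simple floating-point numbers.

```c
#include <ctype.h>
#include <stdlib.h>

typedef enum { TOKEN_CHAR, TOKEN_INT, TOKEN_REAL, TOKEN_ERROR } TokenKind;

/* Semantic value of a token: exactly one member is meaningful,
   and the returned TokenKind says which one. */
typedef union {
    char   c;   /* TOKEN_CHAR */
    int    i;   /* TOKEN_INT  */
    double d;   /* TOKEN_REAL */
} SemanticValue;

/* Scan a single token at the start of text and store its value. */
TokenKind scan_token(const char *text, SemanticValue *value)
{
    char *end;

    /* Character constant such as 'x' (no escape sequences in this sketch). */
    if (text[0] == '\'' && text[1] != '\0' && text[2] == '\'') {
        value->c = text[1];
        return TOKEN_CHAR;
    }

    if (isdigit((unsigned char)text[0])) {
        long n = strtol(text, &end, 10);
        /* A '.' or exponent after the digits means it is a real number. */
        if (*end == '.' || *end == 'e' || *end == 'E') {
            value->d = strtod(text, &end);
            return TOKEN_REAL;
        }
        value->i = (int)n;
        return TOKEN_INT;
    }

    return TOKEN_ERROR;
}
```

The caller reads `value->i` only after seeing `TOKEN_INT`, and so on. In a yacc grammar that bookkeeping is what the type declarations on tokens and rules take care of.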

Diomidis Spinellis
+1  A: 

We use unions for packed messages at work (C/C++), so we can pass around a structure with a union as a data member, then access the correct member based on the id field in the structure.

This worked fine until somebody wrote the structure to a file. Now we are limited to the size of the largest data member used in the file, because even though there's a file version, nobody ever changed it...

So while useful for in-memory work, avoid blindly writing them to disk or network.
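One alternative to dumping the raw structure is to write the id followed by only the active member, in a fixed byte order. The sketch below is an assumption about how that could look, not the format described above: `MessageId`, `Message` and `message_encode` are invented names, and the two message kinds are made up.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MSG_FLAG = 1, MSG_COUNTER = 2 } MessageId;

/* In-memory form: a tag plus a union, convenient to pass around. */
typedef struct {
    MessageId id;
    union {
        uint8_t  flag;      /* MSG_FLAG    */
        uint32_t counter;   /* MSG_COUNTER */
    } body;
} Message;

/* Encode as one id byte followed by only the active member, with
   multi-byte values written most significant byte first.
   Returns the number of bytes written, or 0 on error. */
size_t message_encode(const Message *m, uint8_t *out, size_t capacity)
{
    switch (m->id) {
    case MSG_FLAG:
        if (capacity < 2)
            return 0;
        out[0] = (uint8_t)m->id;
        out[1] = m->body.flag;
        return 2;
    case MSG_COUNTER:
        if (capacity < 5)
            return 0;
        out[0] = (uint8_t)m->id;
        out[1] = (uint8_t)(m->body.counter >> 24);
        out[2] = (uint8_t)(m->body.counter >> 16);
        out[3] = (uint8_t)(m->body.counter >> 8);
        out[4] = (uint8_t)(m->body.counter);
        return 5;
    }
    return 0;   /* unknown id */
}
```

The on-disk size of each record then depends on its id rather than on the largest union member, so adding a bigger member later does not change how existing records are stored.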

Simeon Pilgrim