In Beej's Guide to Network Programming there is a section on marshalling, or packing data for serialization, where he describes various functions for packing and unpacking data (int, float, double, etc.).

It seems easier to use a union (a similar one can be defined for float and double), as defined below, and transmit integer.pack as the packed version of integer.i, rather than writing pack and unpack functions.

```c
union _integer {
    char pack[4];
    int i;
} integer;
```

1. Can someone shed some light on why a union is a bad choice?

2. Is there a better method of packing data?

+1  A: 

Why not just reinterpret_cast the int's address to a char*, or memcpy the int into a char buffer? Either one gives you the same bytes the union would, and both are less confusing.

Your idea would work, so go for it if you want, but I find that clean code is happy code. The easier it is to understand my work, the less likely it is that someone (like my future self) will break it.

Also note that, before C++11, only POD (plain old data) types could be placed in a union, which limits the union approach in ways that a more straightforward cast or copy doesn't.

thebretness
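
A minimal sketch of the two alternatives mentioned above, assuming C++11 or later. The function names are hypothetical, chosen for illustration. Both approaches produce the host's native byte layout, so they carry the same endianness and size caveats as the union.

```cpp
#include <cstring>

// Copy the object representation of an int into a byte buffer with memcpy.
// The bytes come out in whatever order the host uses.
void int_to_bytes_memcpy(int value, unsigned char (&out)[sizeof(int)]) {
    std::memcpy(out, &value, sizeof value);
}

// Copy the bytes back into an int.
int bytes_to_int_memcpy(const unsigned char (&in)[sizeof(int)]) {
    int value;
    std::memcpy(&value, in, sizeof value);
    return value;
}

// View the int's bytes in place through an unsigned char pointer. The
// aliasing rules allow any object to be read through (unsigned) char.
// The returned pointer is only valid while `value` is alive.
const unsigned char* int_bytes_view(const int& value) {
    return reinterpret_cast<const unsigned char*>(&value);
}
```

The memcpy version makes a copy you can hand straight to send(); the reinterpret_cast version avoids the copy but ties the buffer's lifetime to the original int.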
+3  A: 

Different computers may lay the data out differently. The classic issue is endianness (in your example, whether pack[0] holds the most or the least significant byte). Using a union like this ties the data to the specific representation used by the computer that generated it.

If you want to see other ways to marshal data, check out Boost.Serialization and Google's Protocol Buffers (protobuf).

R Samuel Klatchko
Good call on the serialization libraries. The endianness issues could come up, but only if you're building your code for two drastically different processors (x86 and PowerPC maybe), which doesn't happen a lot unless you do embedded development of some sort. On the other hand, it's worth keeping in mind.
thebretness
There are also size issues; `int` may be anywhere from 16 to 64 bits wide depending on the architecture.
John Bode
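
A minimal sketch of shift-based packing that sidesteps both the endianness and the size issues raised above, in the spirit of the pack functions Beej's guide describes. The names pack_u32 and unpack_u32 are hypothetical, not taken from the guide. Using std::uint32_t fixes the width at 32 bits, and the shifts write the most significant byte first (network byte order) no matter how the host stores integers.

```cpp
#include <cstdint>

// Write a 32-bit value as 4 bytes, most significant byte first.
// The result is the same on little-endian and big-endian hosts.
void pack_u32(std::uint32_t value, unsigned char* out) {
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Rebuild the value from 4 bytes stored most significant byte first.
std::uint32_t unpack_u32(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

Packing 0x12345678 this way always yields the bytes 12 34 56 78, whether the code runs on x86 or PowerPC.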
+1  A: 

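The union trick is not guaranteed to work, although it usually does. In C++, reading a union member other than the one most recently written is undefined behavior, so the standard doesn't promise that you'll see the char data when you read the int, or vice versa. (C is more permissive: C99 and later allow this kind of type punning, but the value you get still depends on the implementation's representation.) union was designed to be a memory micro-optimization, not a replacement for casting.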

At this point, you usually either wrap the conversion in a handy object or use reinterpret_cast. One is slightly bulky and the other is ugly, but neither is necessarily a bad thing when you're packing data.

jkerian
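
One way to wrap the conversion in a handy object is sketched below, assuming C++11 or later. Packer and Unpacker are hypothetical classes, not from any library. They write 32-bit fields most significant byte first and, assuming both ends use 32-bit IEEE 754 floats, send a float as its raw bit pattern copied with memcpy, so no union or pointer cast is involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

// This sketch sends floats as raw bits, so it assumes IEEE 754 binary32.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes 32-bit IEEE 754 floats");

class Packer {
public:
    // Append a 32-bit value, most significant byte first.
    Packer& u32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes_.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
        return *this;
    }
    // Append a float by copying its bit pattern into a uint32_t.
    Packer& f32(float f) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        return u32(bits);
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
};

class Unpacker {
public:
    // Keeps its own copy of the buffer, so it can't dangle.
    explicit Unpacker(std::vector<unsigned char> bytes) : bytes_(std::move(bytes)) {}

    // Read the next 32-bit value, most significant byte first.
    std::uint32_t u32() {
        if (bytes_.size() - pos_ < 4) throw std::out_of_range("buffer too short");
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v = (v << 8) | bytes_[pos_++];
        return v;
    }
    // Read the next float by copying the unpacked bits back into a float.
    float f32() {
        std::uint32_t bits = u32();
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};
```

For example, `Packer p; p.u32(42).f32(1.5f);` produces 8 bytes, and `Unpacker u(p.bytes());` reads them back in the same order. The bulk lives in one place, and the calling code stays readable.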