If you have the following class as a network packet payload:

class Payload { char field0; int field1; char field2; int field3; };

Does using a class like Payload leave the recipient of the data susceptible to alignment issues when receiving the data over a socket? I would think that the class would either need to be reordered or have padding added to ensure alignment.

Either reorder:

class Payload
{
    int  field1;
    int  field3;
    char field0;
    char field2;
};

or add padding:

class Payload
{
    char  field0;
    char  pad0[3];
    int   field1;
    char  field2;
    char  pad1[3];
    int   field3;
};

If reordering doesn't make sense for some reason, I would think adding the padding would be preferred since it would avoid alignment issues even though it would increase the size of the class.
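
For illustration, the padded layout can be checked at compile time, assuming a 4-byte int with 4-byte alignment (typical, but not guaranteed by the standard). A minimal sketch using C++11 static_assert (BOOST_STATIC_ASSERT would work the same way):

#include <cstddef>   // offsetof

struct PayloadPadded
{
    char  field0;
    char  pad0[3];
    int   field1;    // expected at offset 4
    char  field2;
    char  pad1[3];
    int   field3;    // expected at offset 12
};

static_assert(sizeof(PayloadPadded) == 16, "compiler inserted unexpected padding");
static_assert(offsetof(PayloadPadded, field1) == 4,  "field1 is not 4-byte aligned");
static_assert(offsetof(PayloadPadded, field3) == 12, "field3 is not 4-byte aligned");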

What is your experience with such alignment issues in network data?

+8  A: 

Correct, blindly ignoring alignment can cause problems, even on the same operating system, if two components were compiled with different compilers or different compiler versions.

It is better to:
1) Pass your data through some sort of serialization process, or
2) Pass each of your primitives individually, while still paying attention to byte ordering (endianness).

A good place to start would be Boost Serialization.
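
For illustration, a minimal Boost.Serialization sketch for the Payload in the question might look something like this (it assumes linking against the boost_serialization library):

#include <sstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

struct Payload {
    char field0;
    int  field1;
    char field2;
    int  field3;

    // Boost.Serialization hook: describe the fields once, for both directions.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & field0;
        ar & field1;
        ar & field2;
        ar & field3;
    }
};

int main() {
    const Payload out = { 'a', 1, 'b', 2 };

    std::ostringstream sink;
    {
        boost::archive::text_oarchive oa(sink);
        oa << out;    // the archive worries about layout, not you
    }

    Payload in = { 0, 0, 0, 0 };
    std::istringstream source(sink.str());
    {
        boost::archive::text_iarchive ia(source);
        ia >> in;     // reconstructed on the receiving side
    }
    return 0;
}

A binary archive can be substituted later without touching Payload, though the stock binary archive is not portable across platforms.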

Brian R. Bondy
+1: Proper serialization beats messing around with this.
S.Lott
+1 Serialization is the right answer.
RBerteig
+2  A: 

My experience is that the following approaches are to be preferred (in order of preference):

  1. Use a high level framework like Tibco, CORBA, DCOM or whatever that will manage all these issues for you.

  2. Write your own libraries on both sides of the connection that are aware of packing, byte order and other issues.

  3. Communicate only using string data.

Trying to send raw binary data without any mediation will almost certainly cause lots of problems.
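
To illustrate option 3, a minimal hand-rolled text encoding could be as simple as the following sketch (the function names are made up):

#include <sstream>
#include <string>

// Encode the fields as one whitespace-delimited text line.
std::string encodePayload(char field0, int field1, char field2, int field3)
{
    std::ostringstream os;
    os << int(field0) << ' ' << field1 << ' ' << int(field2) << ' ' << field3;
    return os.str();
}

// Parse the line back on the other side. Text sidesteps padding and byte order
// entirely, at the cost of larger messages and parsing work.
bool decodePayload(const std::string& line,
                   char& field0, int& field1, char& field2, int& field3)
{
    std::istringstream is(line);
    int f0 = 0, f2 = 0;
    if (!(is >> f0 >> field1 >> f2 >> field3))
        return false;
    field0 = char(f0);
    field2 = char(f2);
    return true;
}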

anon
In very old versions of TIBCO (e.g. back when it was still called Teknekron; TIB = Teknekron Information Bus) running on SPARC hardware, one had to memcpy doubles out of received messages and into properly-aligned storage to avoid SIGBUS.
Bklyn
+1  A: 

You practically can't use a class or structure for this if you want any sort of portability. In your example, the ints may be 32-bit or 64-bit depending on your system. You're most likely using a little-endian machine, but the older PowerPC-based Apple Macs are big-endian. The compiler is free to pad as it likes, too.

In general you'll need a method that writes each field to the buffer a byte at a time, after ensuring you get the byte order right with htons/htonl (and ntohs/ntohl on the receiving side).
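
For example, a couple of helpers along those lines might look like this sketch (assuming a POSIX <arpa/inet.h>; the function names are made up):

#include <arpa/inet.h>   // htons, htonl (use winsock2.h on Windows)
#include <stdint.h>
#include <string.h>

// Append a 16-bit value to the buffer in network (big-endian) order,
// returning the new write position.
size_t write_u16(uint8_t* buf, size_t pos, uint16_t value)
{
    uint16_t be = htons(value);
    memcpy(buf + pos, &be, sizeof be);
    return pos + sizeof be;
}

// Same idea for a 32-bit value.
size_t write_u32(uint8_t* buf, size_t pos, uint32_t value)
{
    uint32_t be = htonl(value);
    memcpy(buf + pos, &be, sizeof be);
    return pos + sizeof be;
}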

Andrew Johnson
+4  A: 

You should look into Google Protocol Buffers, or Boost Serialization like another poster said.

If you want to roll your own, please do it right.

If you use types from stdint.h (i.e. uint32_t, int8_t, etc.) and make sure every variable has "native alignment" (meaning its address is evenly divisible by its size: int8_ts can go anywhere, uint16_ts go on even addresses, uint32_ts go on addresses divisible by 4), you won't have to worry about alignment or packing.
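
A small illustration of that rule, with made-up field names and assuming the usual alignof == sizeof for the fixed-width types: order the members largest first and every offset works out naturally.

#include <stdint.h>

struct Message
{
    uint32_t field1;   // offset 0  (multiple of 4)
    uint32_t field3;   // offset 4  (multiple of 4)
    uint16_t seqNo;    // offset 8  (multiple of 2)
    uint8_t  field0;   // offset 10
    uint8_t  field2;   // offset 11
};                     // sizeof(Message) == 12, with no compiler-inserted padding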

At a previous job we had all structures sent over our databus (Ethernet, CANbus, byteflight or serial ports) defined in XML. A parser would validate alignment of the variables within the structures (alerting you if someone wrote bad XML) and then generate header files for various platforms and languages to send and receive the structures. This worked really well for us: we never had to worry about hand-writing code to do message parsing or packing, and it guaranteed that no platform had stupid little coding errors. Some of our datalink layers were pretty bandwidth constrained, so we implemented things like bitfields, with the parser generating the proper code for each platform. We also had enumerations, which was very nice (you'd be surprised how easy it is for a human to screw up hand-coding bitfields or enumerations).

Unless you need to worry about it running on 8051s and HC11s with C, or over data link layers that are very bandwidth constrained, you are not going to come up with something better than protocol buffers; you'll just spend a lot of time trying to be on par with them.

KeyserSoze
+3  A: 

Today we use packed structures that are overlaid directly onto the binary packet in memory, and I am rueing the day that I decided to do that. The only way that we have gotten this to work is by:

  1. carefully defining bit-width specific types based on the compilation environment (typedef unsigned int uint32_t)
  2. inserting the appropriate compiler-specific pragmas to specify tight packing of structure members
  3. requiring that everything is in one byte order (use network or big-endian ordering)
  4. carefully writing both the server and client code
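
Putting those four points together, the packed-overlay style ends up looking roughly like the following sketch (the pack pragma shown is the GCC/MSVC flavor and varies by compiler):

#include <arpa/inet.h>   // ntohl for point 3 (network byte order)
#include <stdint.h>      // or the hand-rolled typedefs from point 1

#pragma pack(push, 1)    // point 2: no compiler-inserted padding
struct WirePayload
{
    uint8_t  field0;
    uint32_t field1;     // deliberately misaligned: exactly what is on the wire
    uint8_t  field2;
    uint32_t field3;
};
#pragma pack(pop)

// Points 3 and 4: every access has to remember the byte-order conversion.
uint32_t readField1(const WirePayload* packet)
{
    return ntohl(packet->field1);
}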

If you are just starting out, I would advise you to skip the whole mess of trying to represent what's on the wire with structures. Just serialize each primitive element separately. If you choose not to use an existing library like Boost Serialization or middleware like TIBCO, then save yourself a lot of headache by writing an abstraction around a binary buffer that hides the details of your serialization method. Aim for an interface like:

#include <cstddef>   // size_t
#include <cstdint>   // uint8_t, uint16_t, ...
#include <string>
#include <vector>

class ByteBuffer {
public:
    ByteBuffer(uint8_t *bytes, size_t numBytes) {
        buffer_.assign(&bytes[0], &bytes[numBytes]);
    }
    void encode8Bits(uint8_t n);
    void encode16Bits(uint16_t n);
    //...
    void overwrite8BitsAt(unsigned offset, uint8_t n);
    void overwrite16BitsAt(unsigned offset, uint16_t n);
    //...
    void encodeString(std::string const& s);
    void encodeString(std::wstring const& s);

    uint8_t decode8BitsFrom(unsigned offset) const;
    uint16_t decode16BitsFrom(unsigned offset) const;
    //...
private:
    std::vector<uint8_t> buffer_;
};

Each of your packet classes would then have a method to serialize itself to a ByteBuffer or be deserialized from a ByteBuffer and offset. This is one of those things that I absolutely wish that I could go back in time and correct. I cannot count the number of times that I have spent time debugging an issue that was caused by forgetting to swap bytes or not packing a struct.
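
For instance, a hypothetical packet built on that interface might look like this sketch (the packet type and its fields are invented, and the 32-bit methods are among those elided by "//..." above):

struct LoginPacket
{
    uint16_t version;
    uint32_t sessionId;

    // Serialization lives with the packet; the byte-level details live in ByteBuffer.
    void serializeTo(ByteBuffer& buf) const
    {
        buf.encode16Bits(version);
        buf.encode32Bits(sessionId);
    }

    static LoginPacket deserializeFrom(ByteBuffer const& buf, unsigned offset)
    {
        LoginPacket packet;
        packet.version   = buf.decode16BitsFrom(offset);
        packet.sessionId = buf.decode32BitsFrom(offset + 2);
        return packet;
    }
};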

The other trap to avoid is using a union to represent bytes or memcpying to an unsigned char buffer to extract bytes. If you always use Big-Endian on the wire, then you can use simple code to write the bytes to the buffer and not worry about the htonl stuff:

void ByteBuffer::encode8Bits(uint8_t n) {
    buffer_.push_back(n);
}
void ByteBuffer::encode16Bits(uint16_t n) {
    encode8Bits(uint8_t((n & 0xff00) >> 8));
    encode8Bits(uint8_t((n & 0x00ff)     ));
}
void ByteBuffer::encode32Bits(uint32_t n) {
    encode16Bits(uint16_t((n & 0xffff0000) >> 16));
    encode16Bits(uint16_t((n & 0x0000ffff)      ));
}
void ByteBuffer::encode64Bits(uint64_t n) {
    encode32Bits(uint32_t((n & 0xffffffff00000000) >> 32));
    encode32Bits(uint32_t((n & 0x00000000ffffffff)      ));
}

This remains nicely platform agnostic since the numerical representation is always logically Big-Endian. This code also lends itself very nicely to using templates based on the size of the primitive type (think encode<sizeof(val)>((unsigned char const*)&val))... not so pretty, but very, very easy to write and maintain.
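
For completeness, a possible decode counterpart under the same convention simply reassembles the bytes most-significant first (a sketch, mirroring the encoders above):

uint8_t ByteBuffer::decode8BitsFrom(unsigned offset) const {
    return buffer_[offset];
}
uint16_t ByteBuffer::decode16BitsFrom(unsigned offset) const {
    return uint16_t((uint16_t(buffer_[offset]) << 8) | buffer_[offset + 1]);
}
uint32_t ByteBuffer::decode32BitsFrom(unsigned offset) const {
    return (uint32_t(decode16BitsFrom(offset)) << 16) | decode16BitsFrom(offset + 2);
}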

D.Shawley
+1  A: 

If you don't have natural alignment in the structures, compilers will usually insert padding so that alignment is proper. If, however, you use pragmas to "pack" the structures (remove the padding), there can be very harmful side effects. On PowerPCs, non-aligned floats generate an exception. If you're working on an embedded system that doesn't handle that exception, you'll get a reset. If there is a routine to handle that exception, it can DRASTICALLY slow down your code, because it'll use a software routine to work around the misalignment, which will silently cripple your performance.
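
As a tiny illustration of the kind of layout that triggers this (an invented example):

#pragma pack(push, 1)
struct PackedSample
{
    char  tag;     // offset 0
    float value;   // offset 1 once packed; a misaligned float access is what
                   // raises the alignment exception (or the slow fixup) described above
};
#pragma pack(pop)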

KeyserSoze