ansaurus

Question

Answer 1

+3 A:

It's hard to say what the best solution is without knowing the exact format(s) of the data. Have you considered using unions?
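A minimal sketch of the struct-plus-union idea, assuming a hypothetical format in which a one-byte type tag is followed by a two-byte body. `LoginBody`, `DataBody`, `Message`, and `parse_message` are illustrative names, not anything from the question. Every field is a single byte here, so padding and byte order do not come into play; a real format would need more care.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical message bodies; every field is one byte wide.
struct LoginBody { std::uint8_t user_id; std::uint8_t flags; };
struct DataBody  { std::uint8_t channel; std::uint8_t value; };

enum MessageType : std::uint8_t { kLogin = 1, kData = 2 };

struct Message {
    std::uint8_t type;   // tag: says which union member is valid
    union {
        LoginBody login;
        DataBody  data;
    } body;
};

// Returns true and fills `out` if `buf` holds a complete, known message.
bool parse_message(const unsigned char* buf, std::size_t len, Message& out) {
    if (len < 3) return false;
    out.type = buf[0];
    switch (out.type) {
    case kLogin: {
        LoginBody b;
        std::memcpy(&b, buf + 1, sizeof b);
        out.body.login = b;   // assignment makes `login` the active member
        return true;
    }
    case kData: {
        DataBody b;
        std::memcpy(&b, buf + 1, sizeof b);
        out.body.data = b;    // assignment makes `data` the active member
        return true;
    }
    default:
        return false;         // unknown tag: leave the body untouched
    }
}
```

The tag is what makes the union safe to read: only the member selected by `type` should be accessed afterwards.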

codebolt 2009-05-12 12:03:44

Thanks! A very good idea, though I'd need a struct or even a couple of them anyway.

frgtn 2009-05-12 12:09:44

I think he implied structs and unions used together. But there could still be problems relating to structure holes.

Vulcan Eager 2009-05-12 12:23:56

Answer 2

+8 A:

You need to use structs and/or unions. You'll need to make sure your data is properly packed on both sides of the connection, and you may want to translate to and from network byte order on each end if there is any chance that the two sides of the connection could be running with different endianness.

As an example:

#include <stdint.h>  /* fixed-width integer types */

#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
typedef struct {
    uint32_t     packetID;      // identifies packet in one direction
    uint32_t     data_length;
    char         receipt_flag;  // indicates to ack packet or keep sending packet till acked
    char         data[];        // this is typically ascii string data w/ \n terminated fields but could also be binary
} tPacketBuffer;
#pragma pack(pop)   /* restore original alignment from stack */

and then when assigning:

packetBuffer.packetID = htonl(123456);

and then when receiving:

packetBuffer.packetID = ntohl(packetBuffer.packetID);

Here are some discussions of Endianness and Alignment and Structure Packing

If you don't pack the structure, the compiler is free to align its members to word boundaries, and both the internal layout of the structure and its size may no longer match the wire format.
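One way to catch a layout mismatch early is to check it at compile time. The sketch below assumes a header of two 32-bit fields followed by a one-byte flag and leaves out the trailing `data[]` member, since flexible array members are not standard C++. `UnpackedHeader` and `PackedHeader` are illustrative names, and `#pragma pack` is a compiler extension (GCC, Clang, and MSVC accept it) rather than part of the language.

```cpp
#include <cstddef>
#include <cstdint>

// Natural alignment: the compiler may add padding after receipt_flag.
struct UnpackedHeader {
    std::uint32_t packetID;
    std::uint32_t data_length;
    char          receipt_flag;
};

#pragma pack(push, 1)   // save current alignment, then pack to 1 byte
struct PackedHeader {
    std::uint32_t packetID;
    std::uint32_t data_length;
    char          receipt_flag;
};
#pragma pack(pop)       // restore the previous alignment

// 4 + 4 + 1 bytes with no padding: matches the assumed wire layout.
static_assert(sizeof(PackedHeader) == 9, "packed header must be 9 bytes");
static_assert(offsetof(PackedHeader, receipt_flag) == 8,
              "flag must follow the two 32-bit fields directly");
// Padding can only add bytes; on common ABIs the unpacked size is 12.
static_assert(sizeof(UnpackedHeader) >= sizeof(PackedHeader),
              "unpacked header cannot be smaller than the packed one");
```

If the build ever moves to a compiler that ignores the pragma, the first assertion fails at compile time instead of the program silently misreading packets.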

Robert S. Barnes 2009-05-12 12:04:23

Be aware that on some processors, such as ARM, you may get a data alignment exception if you try to access data_length in the example.
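A common way to sidestep both the alignment trap and the byte-order conversion is to assemble multi-byte fields from individual bytes instead of dereferencing a possibly misaligned pointer. This is a sketch of that alternative, not what the answer above does; `read_be32` and `write_be32` are hypothetical helper names.

```cpp
#include <cstdint>

// Decode a 32-bit big-endian (network order) value from any byte address.
// Only single-byte loads are used, so alignment never matters.
std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Encode a 32-bit value in big-endian order at any byte address.
void write_be32(unsigned char* p, std::uint32_t v) {
    p[0] = static_cast<unsigned char>(v >> 24);
    p[1] = static_cast<unsigned char>(v >> 16);
    p[2] = static_cast<unsigned char>(v >> 8);
    p[3] = static_cast<unsigned char>(v);
}
```

Because the shifts operate on values rather than memory, the same code gives the same result on little-endian and big-endian hosts.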

Steven 2009-05-12 12:13:30

@Steven good point.

Robert S. Barnes 2009-05-12 12:27:20

To fix the problem Steven mentioned about data alignment, you'll need to add char pad[3]; after receipt_flag.

zooropa 2009-05-12 16:51:52

I just changed the order of the fields in the struct to solve the alignment issue rather than add padding.

Robert S. Barnes 2009-05-12 17:05:13

Answer 3

+3 A:

I've done this innumerable times before: it's a very common scenario. There are a number of things which I virtually always do.

Don't worry too much about making it the most efficient thing available.

If we do wind up spending a lot of time packing and unpacking packets, then we can always change it to be more efficient. Whilst I've not encountered a case where I've had to as yet, I've not been implementing network routers!

Whilst using structs/unions is the most efficient approach in terms of runtime, it comes with a number of complications: convincing your compiler to pack the structs/unions to match the octet structure of the packets you need, working to avoid alignment and endianness issues, and a lack of safety, since there is little or no opportunity to do sanity checks in debug builds.

I often wind up with an architecture including the following kinds of things:

A packet base class. Any common data fields are accessible (but not modifiable). If the data isn't stored in a packed format, then there's a virtual function which will produce a packed packet.
A number of presentation classes for specific packet types, derived from the common packet type. If we're using a packing function, then each presentation class must implement it.
Anything which can be inferred from the specific type of the presentation class (e.g. a packet type id from a common data field) is dealt with as part of initialisation and is otherwise unmodifiable.
Each presentation class can be constructed from an unpacked packet, or will gracefully fail if the packet data is invalid for that type. This can then be wrapped up in a factory for convenience.
If we don't have RTTI available, we can get "poor-man's RTTI" using the packet id to determine which specific presentation class an object really is.
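The pieces listed above might fit together roughly as follows. This is a sketch under stated assumptions, not the answerer's actual code: the three-byte "ping" wire format, and the names `Packet`, `PingPacket`, `kPingTypeId`, and `parse_packet`, are all made up for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Base class: the common field is readable but not modifiable.
class Packet {
public:
    virtual ~Packet() = default;
    std::uint8_t type_id() const { return type_id_; }
    virtual Bytes pack() const = 0;   // produce the packed wire format
protected:
    explicit Packet(std::uint8_t type_id) : type_id_(type_id) {}
private:
    const std::uint8_t type_id_;      // fixed at initialisation
};

constexpr std::uint8_t kPingTypeId = 1;   // hypothetical packet type id

// Presentation class for a hypothetical packet: [type][seq hi][seq lo].
class PingPacket : public Packet {
public:
    explicit PingPacket(std::uint16_t seq) : Packet(kPingTypeId), seq_(seq) {}
    std::uint16_t seq() const { return seq_; }

    Bytes pack() const override {
        return {kPingTypeId,
                static_cast<std::uint8_t>(seq_ >> 8),
                static_cast<std::uint8_t>(seq_ & 0xFF)};
    }

    // Fails gracefully (returns null) if the bytes are not a valid ping.
    static std::unique_ptr<PingPacket> parse(const Bytes& raw) {
        if (raw.size() != 3 || raw[0] != kPingTypeId) return nullptr;
        const auto seq = static_cast<std::uint16_t>((raw[1] << 8) | raw[2]);
        return std::unique_ptr<PingPacket>(new PingPacket(seq));
    }
private:
    std::uint16_t seq_;
};

// Factory: dispatches on the type id found in the first byte.
std::unique_ptr<Packet> parse_packet(const Bytes& raw) {
    if (raw.empty()) return nullptr;
    switch (raw[0]) {
    case kPingTypeId: return PingPacket::parse(raw);
    default:          return nullptr;   // unknown packet type
    }
}
```

A caller without RTTI can check `type_id() == kPingTypeId` before a `static_cast` to `PingPacket*`, which is the "poor-man's RTTI" mentioned above.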

In all of this, it's possible (even if just for debug builds) to verify that each modifiable field is being set to a sane value. Whilst it might seem like a lot of work, it makes it very difficult to produce an invalidly formatted packet, and a pre-packed packet's contents can easily be checked by eye in a debugger (since it's all in normal platform-native format variables).

If we do have to implement a more efficient storage scheme, that too can be wrapped in this abstraction with little additional performance cost.

Wuggy 2009-05-12 20:22:20

This type of design works really well for data streams that are encoded in ASN.1. I just implemented something like this myself to parse LDAP.

Matt H 2009-08-26 23:23:12

Answer 4

+1 A:

I agree with Wuggy. You can also use code generation to do this. Use a simple data-definition file to define all your packet types, then run a Python script over it to generate prototype structures and serialization/deserialization functions for each one.

sean riley 2009-05-12 23:30:52

Answer 5

+1 A:

This is an "out-of-the-box" solution, but I'd suggest taking a look at the Python construct library.

Construct is a python library for parsing and building of data structures (binary or textual). It is based on the concept of defining data structures in a declarative manner, rather than procedural code: more complex constructs are composed of a hierarchy of simpler ones. It's the first library that makes parsing fun, instead of the usual headache it is today.

construct is very robust and powerful, and just reading the tutorial will help you understand the problem better. The author also has plans for auto-generating C code from definitions, so it's definitely worth the effort to read about.

Eli Bendersky 2009-05-15 12:30:59

How to interpret binary data in C++?
