views:

214

answers:

2

I have an external device that spits out UDP packets of binary data and software running on an embedded system that needs to read this data stream, parse it and do somethign useful. The binary data gets logged to a file as well. I would like to write a parser that can easily take the input directly from either the UDP stream, or a file, parse the data into a specific format and then direct the output to either a file (e.g. matlab dat file) or to another process that will do some real time processing. Are there any resources that would help me with this and what is the best way to go about this? I think it might make sense to use C++ streams but I'm not familiar with creating custom output streams. Does this seem like a good approach to take or is there a better way to go about it?

Thanks.

+1  A: 

The beauty of binary data is that its is generally of very fixed format. A typical method of parsing it is to declare a structure that maps onto the received packets, and then to just use type-casts to read the fields as structure elements.

The beauty is that this requires no parsing.

you have to be careful about structure packing rules, and endian-ness to make the structure map exactly the same way. Use of the C "offsetof" and "sizeof" macros is useful to emit some debug info to check that your structure is indeed mapping to what you think it is mapping.

Packing rules can typically be altered either by directives (such as #pragma's) or command line options. Endian-ness you are stuck with. If its different from what your embedded system uses, declare all the fields as bytes, or use something like the "ntoh" macro to do the byte swapping.

Matthias Wandel
thanks for the reply. I have been looking into using tape-casts to do the parsing part and I like that. I'm still left with the actual input/output... I'd like to get the data either from the UDP packets in real time, or from a recorded data file and then write the parsed data to either another file, or make it accessible to some other class or process.
HazyBlueDot
A: 

The New Jersey Machine Code Toolkit is a scheme for decoding arbitrary binary patterns. It was originally designed for decoding instruction sets, but it ought to be just fine for decoding message formats. You provide a description of the binary format, it synthesizes code to access the fields of that format (when valid). THus you can refer to message fields using generated function calls rather than think about where the field is or how it is encoded.

Ira Baxter