So, I'm getting this data. From the network socket, or out of a file. I'm cobbling together code that will interpret the data. Read some bytes, check some flags, and some bytes indicate how much data follows. Read in that much data, rinse, repeat.

This task reminds me a lot of parsing source code. I'm comfy with lex/yacc and antlr, but they're not up to this task. You can't specify bits and raw bytes as tokens (well, maybe you could, but I wouldn't know how), and you can't coax them into "read two bytes, make them into an unsigned 16-bit integer, call it n, and then read n bytes".
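
To make the "read two bytes, call it n, then read n bytes" step concrete, here is a minimal hand-written sketch. The names `Cursor`, `read_u16_be` and `read_record` are made up for illustration, and big-endian byte order is an assumption; a real protocol spec would dictate both.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record layout: an unsigned 16-bit length n (big-endian assumed),
// followed by n payload bytes.
struct Cursor {
    const std::uint8_t* pos;  // next unread byte
    const std::uint8_t* end;  // one past the last available byte
};

std::size_t remaining(const Cursor& c) {
    return static_cast<std::size_t>(c.end - c.pos);
}

// Read two bytes and combine them into an unsigned 16-bit integer.
bool read_u16_be(Cursor& c, std::uint16_t& out) {
    if (remaining(c) < 2) return false;
    out = static_cast<std::uint16_t>((c.pos[0] << 8) | c.pos[1]);
    c.pos += 2;
    return true;
}

// Read the length n, then exactly n bytes. On failure the cursor is restored,
// so the caller can retry once more data has arrived.
bool read_record(Cursor& c, std::vector<std::uint8_t>& payload) {
    const Cursor saved = c;
    std::uint16_t n = 0;
    if (!read_u16_be(c, n) || remaining(c) < n) {
        c = saved;
        return false;
    }
    payload.assign(c.pos, c.pos + n);
    c.pos += n;
    return true;
}
```

Calling `read_record` in a loop until it returns false is the "rinse, repeat" part. Restoring the cursor on failure is a design choice that suits socket input, where a record may arrive in pieces.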

Then again, when the spec of the protocol/data format is defined in a systematic manner (not all of them are), there should be a systematic way to read in data that is formatted according to the protocol. Right?

There's gotta be a tool that does that.

+1  A: 

Read up on ASN.1. If you can describe the binary data in its terms, you can then use various available kits. Not for the faint of heart.

bmargulies
A: 

There is certainly nothing stopping you from writing a recursive descent parser for binary data, the same way you would hand-tool a text parser. If the format you need to read is not too complicated, this is a reasonable way to proceed.
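
As a rough sketch of what recursive descent over bytes can look like, the code below parses a nested tag/length format. The format itself (tags 0x01 and 0x02, one-byte lengths) and the names `Node`, `parse_node` and `parse_nodes` are invented for illustration and do not correspond to any real protocol.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical grammar, invented for illustration:
//   node      := leaf | container
//   leaf      := 0x01 len byte{len}
//   container := 0x02 len node*      (the nodes fill exactly len bytes)
struct Node {
    bool is_container;
    std::vector<std::uint8_t> bytes;  // payload of a leaf
    std::vector<Node> children;       // contents of a container
};

Node parse_node(const std::uint8_t*& p, const std::uint8_t* end);

// node* : keep parsing nodes until the enclosing region is used up.
std::vector<Node> parse_nodes(const std::uint8_t*& p, const std::uint8_t* end) {
    std::vector<Node> out;
    while (p < end) out.push_back(parse_node(p, end));
    return out;
}

// One function per grammar rule, as in a text parser; the tag byte plays
// the role of the lookahead token.
Node parse_node(const std::uint8_t*& p, const std::uint8_t* end) {
    if (end - p < 2) throw std::runtime_error("truncated header");
    const std::uint8_t tag = p[0];
    const std::size_t len = p[1];
    p += 2;
    if (static_cast<std::size_t>(end - p) < len)
        throw std::runtime_error("truncated body");
    const std::uint8_t* body_end = p + len;

    Node n;
    n.is_container = (tag == 0x02);
    if (tag == 0x01) {
        n.bytes.assign(p, body_end);
        p = body_end;
    } else if (tag == 0x02) {
        n.children = parse_nodes(p, body_end);  // recurse into the body
    } else {
        throw std::runtime_error("unknown tag");
    }
    return n;
}
```

The length field bounds each recursive call, which is the main difference from text parsing: the parser knows where a construct ends before it has looked inside it.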

Of course, if your format is very simple, you could take a look at Reading binary file defined by a struct and similar questions.
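
For the very simple case, a minimal sketch of the struct approach might look like the following. The `Header` layout and its field names are hypothetical, and the code assumes the data is already in the host's byte order and that the compiler adds no padding (the `static_assert` checks the latter).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-layout header; the field names are invented.
struct Header {
    std::uint8_t  version;
    std::uint8_t  flags;
    std::uint16_t length;  // host byte order assumed
};

static_assert(sizeof(Header) == 4, "unexpected padding in Header");

// Copy the first sizeof(Header) bytes into the struct.
// memcpy avoids the alignment problems of casting the buffer pointer.
bool read_header(const std::uint8_t* data, std::size_t size, Header& out) {
    if (size < sizeof(Header)) return false;
    std::memcpy(&out, data, sizeof(Header));
    return true;
}
```

This only works for fixed-size layouts; as soon as a field's size depends on an earlier field, you are back to reading piece by piece.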

I don't know of any parser generators for non-text input, though those are also possible.


In the event that you are not familiar with coding parsers by hand, the canonical SO question is Learning to write a compiler. The Crenshaw tutorial (and in PDF) is a fast read.

dmckee
Well, I *am* essentially writing a parser that way. By hand, that is. But am I the first person on earth to do this?
doppelfish
+3  A: 

You could try Boost.Spirit (v2), which recently gained binary parsing tools: endianness-aware native and mixed parsers.

// This is not a complete and useful example, just an illustration that parsing
// raw binary into real data components is possible.
// Needs <boost/cstdint.hpp> and <boost/spirit/include/qi.hpp>;
// my_custom_hex_to_bytes is a user-supplied helper, not part of Boost.
typedef boost::uint8_t byte_t;
byte_t raw[16] = { 0 };
char const* hex = "01010000005839B4C876BEF33F83C0CA";
my_custom_hex_to_bytes(hex, raw, 16);

// parse the first four bytes of the raw stream into one 32-bit word
// (qi::dword uses native byte order)
boost::uint32_t word(0);
byte_t* beg = raw;
bool ok = boost::spirit::qi::parse(beg, beg + 16, boost::spirit::qi::dword, word);

UPDATE: I found a similar question, where Joel de Guzman confirms in his answer that binary parsers are available: Can Boost Spirit be used to parse byte stream data?

mloskot
That looks promising. Thank you!
doppelfish
A: 

The Construct parser, written in Python, has done some interesting work in this field. It hasn't had active maintenance for a while though.

Craig McQueen