Hi

I'm working on a basic networking protocol in Python which should be able to transfer both ASCII strings (read: EOL-terminated) and binary data. To make the latter possible, I designed the grammar so that it carries the number of binary bytes that follow.

For SimpleParse, the grammar would look like this [1] so far:

EOL              := [\n]
IDENTIFIER       := [a-zA-Z0-9_-]+
SIZE_INTEGER     := [1-9], [0-9]*
ASCII_VALUE      := -[\n\x00]+, EOL
BINARY_VALUE     := .*    # pseudo-notation for "any run of bytes"; see [1]
value            := (ASCII_VALUE / BINARY_VALUE)

eol_attribute    := IDENTIFIER, ':', value
binary_attribute := IDENTIFIER, [\t], SIZE_INTEGER, ':', value
attributes       := (eol_attribute / binary_attribute)+

command          := IDENTIFIER, (EOL / ('{', attributes, '}'))

The problem is that I don't know how to instruct SimpleParse, at runtime, that the next SIZE_INTEGER bytes are a chunk of binary data.

The cause of this is the definition of the terminal BINARY_VALUE, which fulfills my needs as it stands, so it cannot be changed.

Thanks

Edit

I suppose the solution would be to tell the parser to stop once it matches the production binary_attribute and let me populate the AST node manually (via socket.recv()), but how do I do that?
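For the socket side I imagine something like the sketch below (recv_exact is just a helper name I made up, not SimpleParse API; it assumes a blocking socket); the open question is how to make SimpleParse hand control back at that point:

def recv_exact(sock, size):
    # socket.recv(n) may legally return fewer than n bytes, so keep
    # reading until the whole binary chunk has been collected.
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise EOFError("connection closed before the chunk was complete")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)

# Once the parser has matched IDENTIFIER, [\t], SIZE_INTEGER, ':',
# the AST node would be filled in by hand:
#     node.value = recv_exact(sock, size_integer)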

Edit 2

Base64-encoding or similar is not an option.

[1] I haven't tested it, so I don't know whether it works in practice; it's only meant to give you an idea.

+4  A: 

If the grammar is as simple as the one you quoted, then perhaps using a parser generator is overkill? You might find that rolling your own recursive-descent parser by hand is simpler and quicker.
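As a rough sketch of what that could look like for the attribute rules above (read_attribute and the file-like rfile argument are invented for the example; rfile could be sock.makefile('rb')):

def read_attribute(rfile):
    # The header (everything up to the ':') is pure ASCII, so it is
    # safe to scan for the delimiter one byte at a time.
    header = b""
    while not header.endswith(b":"):
        ch = rfile.read(1)
        if not ch:
            raise EOFError("stream closed inside an attribute header")
        header += ch
    name, sep, size = header[:-1].partition(b"\t")
    if sep:
        # binary_attribute: IDENTIFIER \t SIZE_INTEGER ':' <SIZE bytes>
        return name, rfile.read(int(size.decode("ascii")))
    # eol_attribute: IDENTIFIER ':' text EOL
    return name, rfile.readline().rstrip(b"\n")

Once the size is in hand, reading the binary value is a plain fixed-length read, and no grammar ambiguity remains.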

DanC
A: 

I strongly recommend you consider the construct library for parsing the binary data. It also has support for text (ASCII), so when you detect text you can pass it to your SimpleParse-based parser, while the binary data gets parsed with construct. It's very convenient and powerful.
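As a rough illustration of construct's declarative style (the Frame layout below is made up, and the "name" / Type syntax assumes a recent, 2.9+, release of the library):

from construct import Struct, Int32ub, Bytes, this

# Hypothetical frame: a 4-byte big-endian length followed by that
# many bytes of opaque payload.
Frame = Struct(
    "length" / Int32ub,
    "data"   / Bytes(this.length),
)

frame = Frame.parse(b"\x00\x00\x00\x03abc")
assert frame.length == 3 and frame.data == b"abc"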

Eli Bendersky
+1  A: 

If you want your application to be portable and reliable, I would suggest you pass only standard ASCII characters over the wire.

Different computer architectures have different binary representations, different word sizes, and different character sets. There are three approaches to dealing with this.

First, you can ignore the issues and hope you only ever have to implement the protocol on a single platform.

Second, you can go all computer-sciency and come up with a "canonical form" for each possible data type, à la CORBA.

Third, you can be practical and use the magic of "sprintf" and "scanf" to translate your data to and from plain ASCII characters when sending it over the network.
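In Python, the equivalent of the sprintf/scanf trick is plain string formatting and parsing; a minimal sketch (encode_reading and decode_reading are hypothetical names for the example):

def encode_reading(sensor_id, value):
    # Everything crosses the wire as plain ASCII text, so byte order,
    # word size and character set differences never come into play.
    return ("%s %.6f\n" % (sensor_id, value)).encode("ascii")

def decode_reading(line):
    sensor_id, value = line.decode("ascii").split()
    return sensor_id, float(value)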

I would also suggest that your protocol include a message length at or near the beginning of the message. The most common bug in home-made protocols is the receiver expecting more data than was sent and subsequently waiting forever for data that never arrives.
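A minimal sketch of such framing (send_message and recv_message are invented names; a fixed 4-byte big-endian length prefix is assumed):

import struct

def send_message(sock, payload):
    # Prefix every message with its own length, big-endian 32-bit.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_message(sock):
    def read_exact(n):
        # recv() may return short reads; loop until n bytes have arrived.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise EOFError("peer closed the connection mid-message")
            buf += chunk
        return buf
    (length,) = struct.unpack("!I", read_exact(4))
    return read_exact(length)

With the length known up front, the receiver can never block waiting for bytes that were never sent.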

James Anderson