I have been playing around with implementing some protocol decoders, but each time I run into a "simple" problem and feel that the way I am solving it is not optimal, and that there must be a better way to do things. I'm using C. Currently I'm using some canned data and reading it in from a file, but later on it would come in via TCP or UDP.

Here's the problem. I'm currently playing with a binary protocol at work. All fields are 8 bits long. The first field (8 bits) is the packet type. So I read in the first 8 bits, and using a switch/case I call a function to read in the rest of the packet, as I then know its size/structure. BUT... some of these packets have nested packets inside them, so when I encounter one of those I have to read another 8-16 bytes and go through another switch/case to see what the next packet type is, and so on. (Luckily the packets are only nested 2 or 3 deep.) Only once I have the whole packet decoded can I hand it over to my state machine for processing.

I guess this can be a more general question as well. How much data do you have to read at a time from the socket? As much as possible? As much as what is "similar" in the protocol headers?

So even though this protocol is fairly basic, my code is a whole bunch of switch/case statements and I do a lot of reading from the file/socket which I feel is not optimal. My main aim is to make this decoder as fast as possible. To the more experienced people out there, is this the way to go or is there a better way which I just haven't figured out yet? Any elegant solution to this problem?

+1  A: 

How much data do you have to read at a time from the socket?

Is it TCP or UDP: is it stream-oriented or packet-oriented? Do you have a way to know the length of the message before you know the type of the message? If not, could you arrange one (e.g. by changing the protocol so that the first field contains/defines the length of the message)?

To do what you're doing, you need to read several times from the socket. It might be faster/easier if you could read the whole message into your own memory/RAM, and then decode it there.

ChrisW
Unfortunately I don't know the size of the packet until I know the TYPE of packet it is. It just doesn't "feel right" that I have to read 8 bits at a time from the socket/file to decide what to do with it. But I guess I have to do it somewhere. This is a first attempt for me, so it could be that I'm doing nothing wrong and this is the right way to do it. Thanks for the reply.
NomadAlien
Yah, it would be better if there were a header describing the type and size of the packet; then the parser would spend less effort evaluating the packet contents
pcent
+2  A: 

Resist the temptation to optimize prematurely. First make it work, only then should you think about whether it needs optimization. If you do, do so scientifically: benchmark your code and go for the lowest-hanging fruit first, don't rely on gut feel.

Don't forget that your OS will probably be buffering the data itself, whether you are reading from a file or a socket. Still, repeated syscalls are likely to be a bottleneck, so they may well be a straightforward optimization win. At a former workplace we avoided this issue by having our packet header explicitly encode its length (never more than 8k): that way we knew exactly how much to bulk-read into an array, then our own buffering code took over.

crazyscot
I was thinking the same, but there is no packet size to speak of: since all the fields are 8-bit fields, each packet has a fixed size, EXCEPT for the one or two packets that allow embedded fields. Those embedded fields are also fixed size, but this is the "ugly" part for me, where I have to read 8 bits at a time and go through several switch/case statements to know what I have.
NomadAlien
+5  A: 

I recommend this approach:

  1. Read all that you can from the file/socket (separate the data communication from the actual protocol)
  2. Pass the data you have read to a procedure for handling data

Pseudo C code (imagine that destinationBuffer is a circular buffer - I believe such a data structure is vital in applications that need to parse a lot of incoming data):

for (;;)
{
  // this function adds newly received data to the buffer
  read_all_you_can(destinationBuffer);
  ...
  handle_data(destinationBuffer);
  // the buffer is automatically adjusted in order
  // to reflect how much of the data was processed
}

Generally it is better to read as much as possible per call in order to get better performance.

Iulian Şerbănoiu
Damn! I didn't think of a circular buffer! That will solve my problem of reading the data and being stuck with half a packet!
NomadAlien
@nomad.alien This is a must when dealing with streams. You can consider a stream almost anything - stdin, a file, a socket, a pipe - generally a file descriptor in unix.
Iulian Şerbănoiu
A: 

Keep in mind that read() is absolutely free to temporarily ignore the size that you gave it and attempt to read on arch-aligned boundaries (8/16/32/64). read() is free to do this as long as it returns to you exactly the number of bytes that you requested, or fewer. So, there may already be quite a bit of optimization going on behind the scenes.

If you realize that, you're immediately inclined to structure things so that you read the largest chunks possible, which quite often means calling read() (on its own or through some other function that uses it) as few times as possible to process a packet.

The sooner you get the data in memory in your own address space, the better - ideally with the fewest possible (direct or indirect) calls to read(). Hint - if you get input from a file descriptor or stream, you are using read().

Tim Post
A: 

PADS is designed to help you solve exactly this kind of problem. It will generate a very efficient C parser for you. You write a declarative description of the packet format and PADS takes it from there.

Norman Ramsey
Thanks Norman. I'll look into it.
NomadAlien