I have been playing around with implementing some protocol decoders, but each time I run into a "simple" problem and feel that the way I am solving it is not optimal, and that there must be a better way to do things. I'm using C. Currently I'm using some canned data and reading it in from a file, but later on it would come in via TCP or UDP.

Here's the problem. I'm currently playing with a binary protocol at work. All fields are 8 bits long. The first field (8 bits) is the packet type. So I read in the first 8 bits, and using a switch/case I call a function to read in the rest of the packet, as I then know its size/structure. BUT... some of these packets have nested packets inside them, so when I encounter one of those I have to read another 8-16 bytes and go through another switch/case to see what the next packet type is, and so on. (Luckily the packets are only nested 2 or 3 deep.) Only once I have the whole packet decoded can I hand it over to my state machine for processing.

I guess this can be a more general question as well. How much data do you have to read at a time from the socket? As much as possible? As much as what is "similar" in the protocol headers?

So even though this protocol is fairly basic, my code is a whole bunch of switch/case statements and I do a lot of reading from the file/socket which I feel is not optimal. My main aim is to make this decoder as fast as possible. To the more experienced people out there, is this the way to go or is there a better way which I just haven't figured out yet? Any elegant solution to this problem?

+1  A: 

How much data do you have to read at a time from the socket?

Is it TCP or UDP: is it stream-oriented or packet-oriented? Do you have a way to know the length of the message before you know the type of the message? If not, could you arrange one (e.g. by changing the protocol so that the first field contains/defines the length of the message)?

To do what you're doing, you need to read several times from the socket. It might be faster/easier if you could read the whole message into your own memory/RAM, and then decode it there.

ChrisW
Unfortunately I don't know the size of the packet until I know the TYPE of packet it is. It just doesn't "feel right" that I have to read 8 bits at a time from the socket/file to decide what to do with it. But I guess I have to do it somewhere. This is a first attempt for me, so it could be that I'm doing nothing wrong and this is the right way to do it. Thanks for the reply.
NomadAlien
Yah, it would be better if there were a header describing the type and size of the packet; then the parser would spend less effort evaluating the packet contents
pcent
+2  A: 

Resist the temptation to optimize prematurely. First make it work, only then should you think about whether it needs optimization. If you do, do so scientifically: benchmark your code and go for the lowest-hanging fruit first, don't rely on gut feel.

Don't forget that your OS will probably be buffering the data itself, whether you are reading from a file or a socket. Still, repeated syscalls are likely to be a bottleneck, so they may well be a straightforward optimization win. At a former workplace we avoided this issue by having our packet header explicitly encode its length (never more than 8k): that way we knew exactly how much to bulk-read into an array, then our own buffering code took over.

crazyscot
I was thinking the same, but there is no packet size to speak of: since all the fields are 8-bit fields, each packet has a fixed size, EXCEPT for the one or two packets that allow embedded fields. Those embedded fields are also fixed size, but this is the "ugly" part for me, where I have to read 8 bits at a time and go through several switch/case statements to know what I have.
NomadAlien
+5  A: 

I recommend this approach:

  1. Read all that you can from the file/socket (separate the data communication from the actual protocol)
  2. Pass the data you have read to a procedure for handling data

Pseudo C code (imagine that destinationBuffer is a circular buffer - I believe such a data structure is vital in applications that need to parse a lot of incoming data):

for (;;)
{
  // this function adds newly received data to the buffer
  read_all_you_can(destinationBuffer);
  ...
  handle_data(destinationBuffer);
  // the buffer is automatically adjusted in order
  // to reflect how much of the data was processed
}

Generally it is better to read as much as possible per call in order to get better performance.

Iulian Şerbănoiu
Damn! I didn't think of a circular buffer! That will solve my problem of reading the data and being stuck with half a packet!
NomadAlien
@nomad.alien This is a must when dealing with streams. You can consider a stream almost anything - stdin, a file, a socket, a pipe - generally a file descriptor in unix.
Iulian Şerbănoiu
A: 

Keep in mind that read() is absolutely free to temporarily ignore the size that you gave it and attempt to read on arch-aligned boundaries (8/16/32/64). read() is free to do this as long as it returns to you exactly the number of bytes that you requested, or fewer. So, there may already be quite a bit of optimization going on behind the scenes.

If you realize that, you're immediately inclined to structure things so that you read the largest chunks possible, which quite often means calling read() (on its own or through some other function that uses it) as few times as possible to process a packet.

The sooner you get the data in memory in your own address space, the better - ideally with the fewest possible (direct or indirect) calls to read(). Hint - if you get input from a file descriptor or stream, you are using read().

Tim Post
A: 

PADS is designed to help you solve exactly this kind of problem. It will generate a very efficient C parser for you. You write a declarative description of the packet format and PADS takes it from there.

Norman Ramsey
Thanks Norman. I'll look into it.
NomadAlien