Hi all,

I am about to write a message protocol going over a TCP stream. The receiver needs to know where the message boundaries are.

I can either send 1) fixed length messages, 2) size fields so the receiver knows how big the message is, or 3) a unique message terminator (I guess this can't be used anywhere else in the message).

I won't use #1 for efficiency reasons.

I like #2 but is it possible for the stream to get out of sync?

I don't like idea #3 because it means the receiver can't know the size of the message ahead of time, and it also requires that the terminator doesn't appear elsewhere in the message.

With #2, if it's possible to get out of sync, can I add a terminator or am I guaranteed to never get out of sync as long as the sender program is correct in what it sends? Is it necessary to do #2 AND #3?

Please let me know.

Thanks, jbu

+1  A: 

You are using TCP, which delivers bytes reliably and in order. The connection either drops, times out, or you read the whole message. So option #2 is OK.

Marco Mustapic
+1  A: 

Depending on the level at which you're working, #2 may not actually have any issues with going out of sync (TCP has sequence numbers in its segments and reassembles the stream in the correct order for you, if it arrives at all).

Thus, #2 is probably your best bet. In addition, knowing the message size early on in the transmission will make it easier to allocate memory on the receiving end.
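As a rough illustration of option #2, here is a minimal Python sketch of length-prefixed framing. The 4-byte big-endian header and the helper names (`recv_exactly`, `send_message`, `recv_message`) are assumptions made for this example, not part of any standard. Note that a single `recv()` call may return fewer bytes than requested, so the receiver loops until it has the whole header and then the whole body.

```python
import socket
import struct

# Assumed header layout for this sketch: 4-byte unsigned big-endian length.
HEADER = struct.Struct("!I")


def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send the length header followed by the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock):
    """Read one length header, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)
```

Because the length is known before the body arrives, the receiver could also reject oversized messages before allocating memory for them.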

Zxaos
A: 

If you are developing both the transmit and receive code from scratch, it wouldn't hurt to use both length headers and delimiters. This would provide robustness and error detection. Consider the case where you just use #2. If you write a length field of N to the TCP stream, but end up sending a message whose size differs from N, the receiving end wouldn't know any better and would end up confused.

If you use both #2 and #3, while not foolproof, the receiver can have a greater degree of confidence that it received the message correctly if it encounters the delimiter after consuming N bytes from the TCP stream. You can also safely use the delimiter inside your message.

Take a look at HTTP Chunked Transfer Coding for a real world example of using both #2 and #3.
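The combined scheme might look something like the following Python sketch. The header layout, the one-byte `TRAILER`, and the names `encode_frame`, `decode_frame`, and `FramingError` are all hypothetical choices for illustration. The decoder uses the length to locate the end of the payload and only then checks for the delimiter, which is why the delimiter byte may also appear inside the payload.

```python
import struct

# Assumed for this sketch: 4-byte big-endian length, newline as trailer.
HEADER = struct.Struct("!I")
TRAILER = b"\n"


class FramingError(Exception):
    """Raised when the trailer is not where the length field says it is."""


def encode_frame(payload):
    """Build header + payload + trailer."""
    return HEADER.pack(len(payload)) + payload + TRAILER


def decode_frame(buffer):
    """Return (payload, rest), or None if buffer holds no full frame yet."""
    if len(buffer) < HEADER.size:
        return None
    (length,) = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end + len(TRAILER):
        return None
    if buffer[end:end + len(TRAILER)] != TRAILER:
        # Sender's length field and actual payload disagree.
        raise FramingError("trailer missing after declared payload length")
    return buffer[HEADER.size:end], buffer[end + len(TRAILER):]
```

A mismatch between the declared length and the bytes actually sent will usually surface as a `FramingError` on that frame, though as noted above the check is not foolproof.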

sigjuice
A: 

I agree with sigjuice. If you have a size field, it's not necessary to add an end-of-message delimiter; however, it's a good idea. Having both makes things much more robust and easier to debug.

Consider using the standard netstring format, which includes both a size field and an end-of-string character. Because it has a size field, it's OK for the end-of-string character to be used inside the message.
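For reference, a netstring is the payload length in ASCII decimal, a colon, the payload bytes, and a trailing comma. A simplified Python sketch follows; the function names are made up for this example, and the decoder does not enforce every rule of the format (for instance, it accepts leading zeros in the length).

```python
def encode_netstring(data):
    """Encode bytes as <decimal length>:<data>,"""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buffer):
    """Return (data, rest); raise ValueError on malformed or short input."""
    head, sep, tail = buffer.partition(b":")
    if not sep or not head.isdigit():
        raise ValueError("missing or invalid length prefix")
    length = int(head)
    if len(tail) < length + 1:
        raise ValueError("incomplete netstring")
    if tail[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return tail[:length], tail[length + 1:]


print(encode_netstring(b"hello world!"))  # b'12:hello world!,'
```

Since the decoder jumps straight to the position given by the length, commas inside the payload are harmless.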

David Cary
A: 

There is a fourth alternative: a self-describing protocol such as XML.

EJP