Socket Protocol Fundamentals

+3 Q:

Recently, while reading a Socket Programming HOWTO, the following section jumped out at me:

But if you plan to reuse your socket for further transfers, you need to realize that there is no "EOT" (End of Transfer) on a socket. I repeat: if a socket send or recv returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv forever, because the socket will not tell you that there's nothing more to read (for now). Now if you think about that a bit, you'll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours, (but some ways are righter than others).

This section highlights four possibilities for how a socket "protocol" may be written to pass messages. My question is: what is the preferred method for real applications?

Is it generally best to include message size with each message (presumably in a header), as the article more or less asserts? Are there any situations where another method would be preferable?

+3 A:

The common protocols either specify length in the header, or are delimited (like HTTP, for instance).
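
To illustrate the delimited style, here is a minimal sketch of a receiver-side buffer that splits a byte stream into delimiter-terminated messages. The `LineFramer` name and the newline delimiter are assumptions made for this example; they are not taken from any particular protocol.

```python
from typing import List


class LineFramer:
    """Accumulates raw bytes and returns complete delimiter-terminated messages."""

    def __init__(self, delimiter: bytes = b"\n") -> None:
        self.delimiter = delimiter  # assumed delimiter; real protocols define their own
        self.buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add newly received bytes and return every message completed so far."""
        self.buffer.extend(data)
        messages = []
        while True:
            index = self.buffer.find(self.delimiter)
            if index < 0:
                break  # no complete message yet; keep the partial data buffered
            messages.append(bytes(self.buffer[:index]))
            del self.buffer[: index + len(self.delimiter)]
        return messages
```

Each chunk returned by `recv` would be passed to `feed`, since one `recv` call may return part of a message or several messages at once.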

Keep in mind that this also depends on whether you use TCP or UDP sockets. Since TCP sockets are reliable you can be sure that you get everything you shoved into them. With UDP the story is different and more complex.

Eli Bendersky 2010-03-03 03:49:15

+1, with UDP fixed length is the way to go. If you don't fit everything into one packet, you might not be able to put it back together.

Carl Norum 2010-03-03 03:53:20

Why does that matter? The IP layer won't forward the UDP packet up to your application if it gets munged along the way, so missing part of it is the same as missing all of it, right? It's been a long time since I wrote a networking application, I'm afraid.

Carl Norum 2010-03-03 03:58:52

You should have the ability to delete your own comment, if you want...

Justin Ethier 2010-03-03 18:05:16

"Since TCP sockets are reliable you can be sure that you get everything you shoved into them" is a terrible misconception. You can be sure that you receive everything in the right order and that the data stream starts with what you actually intended to be the start, but you can never be sure whether it ended where it was intended to end without using app-level protocol structures to determine that.

Mart Oruaas 2010-04-20 10:02:04

@Mart: sorry but I'm not sure what you mean. If you wrote "abcde" into a TCP socket, you will get it at the receiving end eventually.

Eli Bendersky 2010-04-20 10:13:01

No, there is no such guarantee. You may lose the end of the data stream from a TCP connection (where the "end" may easily start with the first byte sent to the socket). Try playing with TCP on sloppy networks with high RTT and packet loss and you will discover surprising things about the unreliability of TCP.

Mart Oruaas 2010-04-21 06:58:42

+2 A:

These are indeed our choices with TCP. HTTP, for example, uses a mix of the second, third, and fourth options: a double newline ends the request/response headers, which might contain the Content-Length header or indicate chunked encoding, or it might say Connection: close and not give you the content length but expect you to rely on reading EOF.

I prefer the third option, i.e. self-describing messages, though fixed-length is plain easy when suitable.
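
A minimal sketch of the third option over a stream socket might look like the following. The 4-byte big-endian length header and the helper names (`recv_exact`, `send_message`, `recv_message`) are assumptions chosen for illustration, not part of any standard.

```python
import socket
import struct

# Assumed header format: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, looping because recv may return fewer."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # recv returning 0 bytes means the peer closed the connection.
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_message(sock: socket.socket) -> bytes:
    """Receive one message: read the header, then exactly that many bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With a connected pair from `socket.socketpair()`, calling `send_message(a, b"abcde")` on one end and `recv_message(b)` on the other returns the payload as a single unit, regardless of how the bytes were split in transit.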

Nikolai N Fetissov 2010-03-03 03:51:27

+1 A:

I do not know if there is a preferred option. In our real-world situation (client-server application), we use the option of sending the total message length as one of the first pieces of data. It is simple and works for both our TCP and UDP implementations. It makes the logic reasonably "simple" when reading data in both situations. With TCP, the amount of code is fairly small (by comparison). The UDP version is a bit (understatement) more complex but still relies on the size that is passed in the initial packet to know when all data has been sent.

Mark Wilkins 2010-03-03 04:11:17

A good choice. The implementation can be vulnerable to buffer overflows when programmers don't test with invalid messages.
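
As a sketch of the kind of check this implies, a parser can reject a declared length above some limit before trusting it. In Python the analogous risk is unbounded memory use rather than a literal buffer overflow. The `MAX_MESSAGE_SIZE` value, the 4-byte header, and the `parse_frame` name are hypothetical choices for this example.

```python
import struct
from typing import Optional, Tuple

MAX_MESSAGE_SIZE = 64 * 1024  # hypothetical limit; choose one that fits the application
HEADER_SIZE = 4               # assumed 4-byte big-endian length header


def parse_frame(buffer: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (payload, leftover) for one frame, or None if more data is needed.

    Raises ValueError when the declared length exceeds MAX_MESSAGE_SIZE.
    """
    if len(buffer) < HEADER_SIZE:
        return None  # header itself is incomplete
    (length,) = struct.unpack("!I", buffer[:HEADER_SIZE])
    if length > MAX_MESSAGE_SIZE:
        # Never allocate or wait based on an unchecked length from the peer.
        raise ValueError(f"declared length {length} exceeds limit")
    end = HEADER_SIZE + length
    if len(buffer) < end:
        return None  # payload is incomplete
    return buffer[HEADER_SIZE:end], buffer[end:]
```

Testing with truncated headers, short payloads, and absurdly large lengths covers the invalid-message cases mentioned above.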

Zan Lynx 2010-03-03 04:24:09

+1 A:

If you're designing your own protocol then look at other people's work first; there might already be something similar out there that you could either use as is or repurpose and adjust. For example, ISO 8583 for financial transactions, HTTP, and POP3 all do things differently but in ways that are proven to work. In fact, it's worth looking at these anyway, as you'll learn a lot about how real-world protocols are put together.

If you need to write your own protocol then, IMHO, prefer length prefixed messages where possible. They're easy and efficient to parse for the receiver but possibly harder to generate if it is costly to determine the length of the data before you begin sending it.
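
One common workaround when the total length is not known up front is to length-prefix each chunk and end with a zero-length chunk, similar in spirit to HTTP's chunked encoding. The sketch below assumes a 4-byte big-endian chunk header; the function names are made up for this example.

```python
import struct
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each non-empty part with its own length prefix, then a zero-length terminator."""
    for part in parts:
        if part:  # a zero length is reserved for the terminator, so skip empty parts
            yield struct.pack("!I", len(part)) + part
    yield struct.pack("!I", 0)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble one complete chunked message that is already held in memory."""
    body = bytearray()
    offset = 0
    while True:
        if offset + 4 > len(data):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack_from("!I", data, offset)
        offset += 4
        if length == 0:
            return bytes(body)  # terminator reached
        if offset + length > len(data):
            raise ValueError("truncated chunk body")
        body += data[offset : offset + length]
        offset += length
```

The sender can start transmitting as soon as the first part is available, at the cost of a few header bytes per chunk.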

Len Holgate 2010-03-03 06:56:11

+1 A:

The decision should depend on the data you want to send (what it is, how it is gathered). If the data is fixed length, then fixed-length packets will probably be best. If the data can be easily split into delimited entities (no escaping needed), then delimiting may be good. If you know the data size when you start sending a piece of data, then length-prefixing may be even better. If the data sent is always single characters, or even single bits (e.g. "on"/"off"), then anything other than fixed-size one-character messages would be overkill.

Also think about how the protocol may evolve. EOL-delimited strings are good as long as they do not contain EOL characters themselves. Fixed length may be good until the data has to be extended with some optional parts, etc.
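
If EOL-delimited strings later need to carry EOL characters, one option is to escape them. The following is only a sketch of a backslash-based scheme; the scheme and the function names are assumptions for this example, not something from an existing protocol.

```python
def escape_line(text: str) -> str:
    """Escape backslashes first, then newlines, so the result contains no raw newline."""
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_line(line: str) -> str:
    """Reverse escape_line by reading escape pairs from left to right."""
    out = []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            following = next(chars, "")  # character after the backslash, if any
            out.append("\n" if following == "n" else following)
        else:
            out.append(ch)
    return "".join(out)
```

The escaped string can then be sent followed by a single newline, and the receiver applies `unescape_line` after splitting on newlines. The price is that both sides must now agree on the escaping rules as well as the delimiter.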

Jacek Konieczny 2010-03-03 18:29:15
