I currently work on a multithreaded application where I receive data using the following (simplified) code:

private void BeginReceiveCallback(IAsyncResult ar)
{
    int bytesReceived = this.Socket.EndReceive(ar);
    byte[] receivedData = new byte[bytesReceived];
    Array.Copy(buffer, receivedData, bytesReceived);

    long protocolLength = BitConverter.ToInt64(receivedData, 0);
    string protocol = Encoding.ASCII.GetString(receivedData, 8, (int)protocolLength);
    IList<object> sentObjects =
        ParseObjectsFromNetworkStream(receivedData, 8 + protocolLength);

    InvokeDataReceived(protocol, sentObjects);
}

I'm finding that receivedData contains not only the expected data, but also a lot more. I suspect this is data that was sent afterwards and has been mixed in with the previous data in the stream.

My question is: what data can I expect to be stored in this buffer? Can it contain data from two different send operations on the client side? If so, I suppose I will have to come up with a protocol that can differentiate between the 'messages' sent from the client side. A simple approach would be to start and end each message with a specific (unique) byte. Is there a common approach to separating messages? Furthermore, I guess this means that a single receive call might not be enough to get all the data from the client, so I'll have to loop until the end byte is found?

+1  A: 

With TCP/IP, the data is considered a stream. You receive as much as has arrived so far (and you may need to "receive" again in order to get all that was sent). A common situation is that the two endpoints have some kind of back-and-forth dialog, but that does not sound like the situation you are describing. A simple way of handling what your application is doing is to send the length of the data in the first few bytes (e.g., a 4-byte integer). The receiving end reads those 4 bytes to find the expected length and then receives exactly that amount.
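That read-the-length-then-read-the-payload loop can be sketched like this (hypothetical helper names; written against a Stream such as NetworkStream, and assuming a 4-byte little-endian length prefix):

```csharp
using System;
using System.IO;

static class LengthPrefix
{
    // Reads exactly 'count' bytes, looping because a single Read
    // may return fewer bytes than requested.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Connection closed mid-message.");
            offset += read;
        }
        return buffer;
    }

    // One message = 4-byte length prefix, then exactly that many payload bytes.
    public static byte[] ReadMessage(Stream stream)
    {
        int length = BitConverter.ToInt32(ReadExactly(stream, 4), 0);
        return ReadExactly(stream, length);
    }

    public static void WriteMessage(Stream stream, byte[] payload)
    {
        stream.Write(BitConverter.GetBytes(payload.Length), 0, 4);
        stream.Write(payload, 0, payload.Length);
    }
}
```

Because it loops until the full count has arrived, two back-to-back messages come out cleanly separated even if the underlying reads return arbitrary chunks.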

Mark Wilkins
+1  A: 

You are right: the data the socket receives can either be truncated (e.g., by a forceful disconnect) or, as you have seen, contain additional data from another send operation.

You can play around with the socket settings to request immediate sends (disabling the Nagle algorithm; in .NET that is the Socket.NoDelay property).

In the work I've done with TCP, I have one communication protocol that's used for binary data and another that's used for ASCII. When I'm sending binary data, all communication must start with a header of known length: BlockType (2 bytes), BlockLength (4 bytes), Data (n bytes).

The receiving end then knows to read 6 bytes to work out the type and length. If I receive fewer than 6 bytes, I keep trying (buffering the previous values) until I do; then, once the size is known, I read until BlockLength bytes of Data have arrived (again buffering as required).
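A sketch of that buffering logic (hypothetical class and names; assumes little-endian fields and the 2+4 byte header described above):

```csharp
using System;
using System.Collections.Generic;

// Incremental parser for the layout: BlockType (2 bytes),
// BlockLength (4 bytes), Data (BlockLength bytes).
class BlockFramer
{
    private readonly List<byte> pending = new List<byte>();

    // Feed whatever bytes the socket produced; returns every block
    // that is now complete, keeping any leftover bytes buffered.
    public List<(ushort Type, byte[] Data)> Feed(byte[] chunk)
    {
        pending.AddRange(chunk);
        var complete = new List<(ushort Type, byte[] Data)>();
        while (pending.Count >= 6)
        {
            byte[] header = new byte[6];
            pending.CopyTo(0, header, 0, 6);
            ushort type = BitConverter.ToUInt16(header, 0);
            int length = BitConverter.ToInt32(header, 2);
            if (pending.Count < 6 + length)
                break;                            // incomplete block: wait for more data

            byte[] data = new byte[length];
            pending.CopyTo(6, data, 0, length);
            pending.RemoveRange(0, 6 + length);
            complete.Add((type, data));
        }
        return complete;
    }
}
```

Feeding the framer from the receive callback means partial headers and partial payloads are handled uniformly: nothing is surfaced until a whole block is present.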

If the socket disconnects, you also need to deal with resuming or restarting the transfer.

If I am working with ASCII data, I use a unique wrapping around the block of data being sent: (~~~start~~~) ....DATA.... (~~~end~~~). Then just buffer the content into a StringBuilder or similar until you hit (~~~end~~~), and continue with your operations.
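That delimiter buffering might look like this (a hypothetical sketch; the marker strings and class name are illustrative, and the payload is assumed not to contain the markers):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Delimiter framing: messages wrapped in ~~~start~~~ ... ~~~end~~~.
// Partial text is buffered until the end marker arrives.
class DelimiterFramer
{
    private const string Start = "~~~start~~~";
    private const string End = "~~~end~~~";
    private readonly StringBuilder buffer = new StringBuilder();

    // Feed received text; returns every message completed so far.
    public List<string> Feed(string text)
    {
        buffer.Append(text);
        var messages = new List<string>();
        while (true)
        {
            string s = buffer.ToString();
            int start = s.IndexOf(Start, StringComparison.Ordinal);
            if (start < 0) break;                 // no start marker yet
            int end = s.IndexOf(End, start + Start.Length, StringComparison.Ordinal);
            if (end < 0) break;                   // message still incomplete

            messages.Add(s.Substring(start + Start.Length, end - start - Start.Length));
            buffer.Clear();
            buffer.Append(s.Substring(end + End.Length));
        }
        return messages;
    }
}
```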

Hope this is of some help.

Paul Farry
Yes - it is called the Nagle algorithm. If it is enabled, then (typically) the underlying stack will "hold" on to data for a few milliseconds before sending the data. If more data is given in that timeframe, it will send all the data in a single send on the wire.
Mark Wilkins
It's important to *not* disable Nagle unless you know *exactly* what you're doing.
Stephen Cleary
+2  A: 

TCP/IP socket connections consist of two independent streams: one incoming and one outgoing.

This is one of the key concepts of TCP/IP that is often missed. From the perspective of the application, TCP/IP does not operate on packets; it operates on streams!

There is no method to send a packet. The API simply does not exist. When you send data, you just place those bytes in the outgoing stream. They are then read from the incoming stream on the other side.

As an example, one side can send 5 bytes and then send another 5 bytes. The receiving side can receive two batches of 5 bytes, or one at a time, or all 10 in a single read...
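The loss of write boundaries can be illustrated deterministically with a MemoryStream stand-in (real TCP chunking depends on timing, so the exact split on the wire varies):

```csharp
using System.IO;

static class StreamDemo
{
    // Two separate writes, then one read: the write boundaries vanish.
    public static int WriteTwiceReadOnce()
    {
        var stream = new MemoryStream();
        stream.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);   // first "send"
        stream.Write(new byte[] { 6, 7, 8, 9, 10 }, 0, 5);  // second "send"
        stream.Position = 0;

        var buffer = new byte[16];
        return stream.Read(buffer, 0, buffer.Length);       // 10: both writes merged
    }
}
```

On a real socket the single read could just as well return 1 to 10 of those bytes; nothing in the stream marks where one send ended and the next began.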

To split the incoming stream of bytes into messages, you need message framing. One of two solutions is commonly used. The one you suggested is the delimiter solution, where SOT/EOT bytes are used to designate message boundaries. Another one (which I prefer) is the length prefix solution, where the length of the message is prefixed to the message itself.

A more thorough discussion is on my blog, along with sample code for length prefixing.

Stephen Cleary
Thanks for the concise answer. My current solution uses length prefixing, but it is not yet able to distinguish between messages. What approaches can be taken to gracefully handle parts of a message not arriving? Is a timeout-based approach the best route to pursue?
Qua
Furthermore, what if half of a message - say the middle - gets lost? All of a sudden I will be parsing half of the next message instead, causing a guaranteed crash. Can I blindly trust TCP to ensure this doesn't happen?
Qua
TCP presents you with a reliable stream. It handles retransmission of lost pieces for you, any bytes that you send from one peer will ALWAYS arrive (in the correct order) at the other peer unless the connection is reset (which you'll know about).
Len Holgate
@Qua: The sample code link I posted will accept partial messages and buffer them until the rest of the message arrives. As Len stated, TCP streams are reliable, so you don't have to worry about losing data. I do recommend that you send a "heartbeat" message from each end on a timer; this is [also covered](http://nitoprograms.blogspot.com/2009/05/detection-of-half-open-dropped.html) on my blog.
Stephen Cleary