Imagine that you and I are sending a fairly long sentence (say, 1024000 bytes) over TCP.

If you write a 1024000-byte sentence to me, you use a NetworkStream to write those bytes.

When I receive, should I know in advance the size of the sentence you sent?

If not, how can I check when I should stop the stream.read?

If yes, should the program have facilities that embed the data size in the head of the data? So I receive 4 bytes first to see how many total I should read?

Does .NET have anything that automatically embeds the data size in the transfer?

+1  A: 

When I receive, should I know in advance the size of the sentence you sent?

That can be helpful (for things like rendering progress bars), but it's not necessarily required.

If not, how can I check when I should stop the stream.read?

The contents of your stream define this. For example, many messages encode some information that tells you the message is over (e.g., a null byte to represent the end of a string, or </html> to represent the end of an HTML document).

John Feminella
+1  A: 

There are two ways you could do this: one is the way you described, placing the size of the message in the header, and the other is to put some sort of terminating marker on the stream. For example, if your message is guaranteed not to have embedded NUL characters, you could terminate with a NUL.

nichromium
A: 

Since TCP is a reliable protocol you could either structure your protocol to indicate the number of bytes coming or use some sort of terminator to indicate the end of transmission. If you were using UDP, which is not guaranteed to be reliable, it would be much more important to either build a protocol that will withstand dropped bytes or indicate how many bytes are expected (and have a retransmission mechanism) since the packet containing the termination may be lost. Maximum data transmission times and timeouts may also be useful, but only if you can determine a reasonable maximum.

tvanfosson
+4  A: 

Neither .NET nor TCP has anything built in to define the size of the message to come in advance. TCP only specifies that all data will be transferred to the receiving end point (or at least that the best effort will be employed to do so).

You are solely responsible for defining a way to let the receiver know how much data to read. The details of how you do this are, as others have pointed out, dependent on the nature of what you're transferring: you could send the length first like you mentioned, you could encode special sequences called terminators, you could use predefined data chunks so all messages have the same size, etc.
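The "send the length first" option can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name LengthPrefixedFraming, the 4-byte big-endian header, and the MaxMessageBytes limit are all hypothetical choices, not anything defined by .NET or TCP. It works on any Stream, so a NetworkStream could be passed in.

```csharp
using System;
using System.IO;

public static class LengthPrefixedFraming
{
    // Hypothetical upper bound so a bad header cannot force a huge allocation.
    public const int MaxMessageBytes = 16 * 1024 * 1024;

    // Writes a 4-byte big-endian length followed by the payload.
    public static void WriteMessage(Stream stream, byte[] payload)
    {
        int n = payload.Length;
        byte[] header = { (byte)(n >> 24), (byte)(n >> 16), (byte)(n >> 8), (byte)n };
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads one message; throws if the stream ends early or the length is out of range.
    public static byte[] ReadMessage(Stream stream)
    {
        byte[] header = ReadExact(stream, 4);
        int n = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (n < 0 || n > MaxMessageBytes)
            throw new InvalidDataException("Declared length out of range: " + n);
        return ReadExact(stream, n);
    }

    // Stream.Read may return fewer bytes than requested, so loop until all have arrived.
    private static byte[] ReadExact(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before the full message arrived.");
            offset += read;
        }
        return buffer;
    }
}
```

The receiver never has to guess: it reads exactly 4 bytes, then exactly as many bytes as the header declares, and whatever follows belongs to the next message.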

EDIT

This started out as a comment but there's more to it than fits that limit.

To add NULL to the stream simply means appending a character which has the binary value 0 (not to be confused with the character 0). Depending on the encoding you're using for your transfer (e.g. ASCII, UTF-8, UTF-16) that may translate into sending one or more 0 bytes, but if you're using the appropriate translation you simply need to put something like \0 in your string. Here's an example:

using System.Text;
// "\0" appends a NUL character; in UTF-8 it encodes as a single zero byte.
string textToSend = "This is a NULL Terminated text\0";
byte[] bufferToSend = Encoding.UTF8.GetBytes(textToSend);

Of course all of the above assumes that all of the rest of the data you're sending does not contain any other NULLs. That means that it's text, and not arbitrary binary data (such as the contents of a file). That's very important! Otherwise you can't use NULL as a message terminator and you have to come up with another scheme.
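On the receiving side, a NUL-terminated scheme might look something like the sketch below. The class name NulTerminatedReader, the 4096-byte chunk size, and the MaxMessageBytes limit are illustrative assumptions, and the code assumes UTF-8 text with no embedded NUL characters. It reads from the stream in chunks and keeps any bytes that arrive after a terminator for the next call, so it never needs to request one byte at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public sealed class NulTerminatedReader
{
    // Hypothetical limit so a sender that never sends a terminator cannot exhaust memory.
    public const int MaxMessageBytes = 1024 * 1024;

    private readonly Stream _stream;
    private readonly byte[] _chunk = new byte[4096];
    private readonly List<byte> _pending = new List<byte>(); // message being assembled
    private int _chunkLength; // number of valid bytes in _chunk
    private int _chunkPos;    // next unread byte in _chunk

    public NulTerminatedReader(Stream stream)
    {
        _stream = stream;
    }

    // Returns the next message, or null when the stream ends without another terminator.
    public string ReadMessage()
    {
        while (true)
        {
            // Consume what is already buffered before asking the stream for more.
            while (_chunkPos < _chunkLength)
            {
                byte b = _chunk[_chunkPos++];
                if (b == 0)
                {
                    string message = Encoding.UTF8.GetString(_pending.ToArray());
                    _pending.Clear();
                    return message;
                }
                if (_pending.Count >= MaxMessageBytes)
                    throw new InvalidDataException("Message exceeds the allowed size.");
                _pending.Add(b);
            }

            // Read whatever is available, up to one chunk; 0 means the stream has ended.
            _chunkLength = _stream.Read(_chunk, 0, _chunk.Length);
            _chunkPos = 0;
            if (_chunkLength == 0)
                return null;
        }
    }
}
```

Scanning raw bytes for zero is safe here because UTF-8 only produces a zero byte for the NUL character itself; with UTF-16 the terminator check would have to work on decoded characters instead.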

Miky Dinescu
could you please tell me how to add NULL at the end of the stream?
Jack
Yes. Put another way, a TCP socket provides a reliable stream of octets. Any structure or boundaries must be imposed by the sending and receiving applications.
bob quinn
The disadvantage of this is that you can always only request one single byte from the stream if you want to avoid reading beyond the end of the message. This can become pretty slow.
x4u
Reading one byte at a time is a really bad idea and totally unnecessary. In fact, you can't even "request" a single byte at a time; that's not how TCP works, and certainly not how it's implemented in .NET. You always read data in chunks, and then you may iterate over the data you receive one byte at a time to interpret it.
Miky Dinescu
+1  A: 

If you know or can easily find out the total length of the message, I'd suggest transmitting it in advance. If it is impossible or very expensive to determine, you could use something similar to chunked transfer encoding in HTTP.
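A chunked scheme in that spirit could be sketched like this. It is not HTTP's actual wire format (which uses hexadecimal ASCII lengths and CRLF separators); the class name ChunkedFraming and the 4-byte binary chunk header are assumptions made for illustration. The sender can emit each chunk as soon as it is ready and finishes with a zero-length chunk.

```csharp
using System;
using System.IO;

public static class ChunkedFraming
{
    // Writes one chunk: a 4-byte big-endian length, then that many bytes of data.
    public static void WriteChunk(Stream stream, byte[] data, int count)
    {
        if (count <= 0)
            throw new ArgumentOutOfRangeException(nameof(count), "Use WriteEnd to finish a message.");
        WriteLength(stream, count);
        stream.Write(data, 0, count);
    }

    // A zero-length chunk marks the end of the message.
    public static void WriteEnd(Stream stream)
    {
        WriteLength(stream, 0);
    }

    // Reads chunks until the zero-length end marker and returns the reassembled message.
    public static byte[] ReadMessage(Stream stream)
    {
        var message = new MemoryStream();
        while (true)
        {
            byte[] header = ReadExact(stream, 4);
            int length = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
            if (length < 0)
                throw new InvalidDataException("Negative chunk length.");
            if (length == 0)
                return message.ToArray();
            byte[] chunk = ReadExact(stream, length);
            message.Write(chunk, 0, chunk.Length);
        }
    }

    private static void WriteLength(Stream stream, int n)
    {
        byte[] header = { (byte)(n >> 24), (byte)(n >> 16), (byte)(n >> 8), (byte)n };
        stream.Write(header, 0, header.Length);
    }

    // Stream.Read may return fewer bytes than requested, so loop until all have arrived.
    private static byte[] ReadExact(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended in the middle of a chunk.");
            offset += read;
        }
        return buffer;
    }
}
```

A production receiver would probably also cap the chunk size and the total message size rather than trusting the sender.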

x4u
+2  A: 

Generally speaking, it's better to use a header with the data size than a terminating character. The terminating character method is susceptible to a denial of service attack. I can just keep sending data to your service, and as long as I don't include the terminator, you need to keep processing (and possibly allocating memory) until you crash.

Using a header that contains the total size, if a transmission is too big for you to handle, you can ignore it, or send back an error. If a malicious party tries to send more data than what is declared in the header, you'll notice a corrupt header at the start of the next stream and ignore it.

Joe Doyle
Good point, but that assumes a naive implementation of the receiver. You can, and should, put limits on how much data you receive and process; these can be chosen based on the typical sizes of data the application generally expects.
Miky Dinescu
+1  A: 

The main point is that with TCP there is no correspondence between the number and size of socket writes on the sending side and the number and size of socket reads on the receiving side.

If the stream of data has some kind of structure to it you'll have to add some kind of meta/wrapper data around the payload.

Anytime I have had to solve this problem I have used some combination of:

a) use a magic number to indicate the start or end of your data msg (or both)

b) use a checksum at the end of the msg to verify the contents are correct (I know that TCP performs error checking & retransmission, but the checksum is useful in the case where the receiver picks up an incidental occurrence of the start/end magic number/sequence in the stream)

c) use a length field after the initial magic number (provided the transmitting side knows the length of the data before transmission begins)
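Combining (a), (b) and (c), a frame might be laid out as in the sketch below. Everything specific here is an assumption for illustration: the class name MagicFrame, the magic value, the field order, and the simple multiplicative checksum (a real protocol would more likely use something like CRC-32).

```csharp
using System;

public static class MagicFrame
{
    // Hypothetical 4-byte start marker; any value both sides agree on would do.
    private static readonly byte[] Magic = { 0xCA, 0xFE, 0xF0, 0x0D };

    // Layout: magic (4) | payload length (4, big-endian) | payload | checksum (4, big-endian).
    public static byte[] Build(byte[] payload)
    {
        var frame = new byte[12 + payload.Length];
        Array.Copy(Magic, 0, frame, 0, 4);
        WriteUInt32(frame, 4, (uint)payload.Length);
        Array.Copy(payload, 0, frame, 8, payload.Length);
        WriteUInt32(frame, 8 + payload.Length, Checksum(payload));
        return frame;
    }

    // Returns the payload, or null if the magic, length, or checksum does not match.
    public static byte[] Parse(byte[] frame)
    {
        if (frame.Length < 12) return null;
        for (int i = 0; i < 4; i++)
            if (frame[i] != Magic[i]) return null;

        uint length = ReadUInt32(frame, 4);
        if (length != (uint)(frame.Length - 12)) return null;

        var payload = new byte[length];
        Array.Copy(frame, 8, payload, 0, (int)length);
        if (ReadUInt32(frame, 8 + (int)length) != Checksum(payload)) return null;
        return payload;
    }

    // Illustrative checksum only; it is not a substitute for a real CRC.
    private static uint Checksum(byte[] data)
    {
        uint sum = 0;
        foreach (byte b in data) sum = unchecked(sum * 31 + b);
        return sum;
    }

    private static void WriteUInt32(byte[] buffer, int offset, uint value)
    {
        buffer[offset] = (byte)(value >> 24);
        buffer[offset + 1] = (byte)(value >> 16);
        buffer[offset + 2] = (byte)(value >> 8);
        buffer[offset + 3] = (byte)value;
    }

    private static uint ReadUInt32(byte[] buffer, int offset)
    {
        return ((uint)buffer[offset] << 24) | ((uint)buffer[offset + 1] << 16)
             | ((uint)buffer[offset + 2] << 8) | buffer[offset + 3];
    }
}
```

Parse works on a complete frame that has already been pulled off the stream; a receiver that lost sync could scan forward for the magic bytes and let the length and checksum reject false matches.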

However, before going DIY, have a good look at which higher-level protocol libraries are available for the language/platform you are using. NetworkStream? Is that Windows API/MFC or something?

For instance, I recently had to set up a client/server system. The client & server functionality was already written in Python, so simply using Python's xmlrpclib/server made it completely easy to join the two programs together: I literally copied the example and was done in 30 minutes. If I'd coded some made-up protocol myself directly on TCP it would've been 5 days!

tullaman
A: 

My answer would be no. Especially for large data sets. The reason is that sending the size first adds latency in your system.

If you want to send the size first, you need to compute the whole answer before starting to send it.

On the other hand, if you use a termination marker, you can start sending the first bits of data as soon as they are ready, while computing the following data.

Didier Trosset
A: 

You may also want to investigate the BinaryReader/BinaryWriter classes which can be wrapped around any stream, TCP or otherwise.

These support, among other functions, reading/writing strings (in an encoding of your choice) while taking care of including the length of the string too.
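As a small sketch of that behaviour, the following round-trips a string through BinaryWriter and BinaryReader. The class name BinaryStringDemo is hypothetical, and a MemoryStream stands in for the network here; with a real connection you would wrap the NetworkStream returned by TcpClient.GetStream() instead.

```csharp
using System;
using System.IO;
using System.Text;

public static class BinaryStringDemo
{
    // Encodes a string the way BinaryWriter does: a length prefix, then the encoded bytes.
    public static byte[] Encode(string text)
    {
        var buffer = new MemoryStream();
        using (var writer = new BinaryWriter(buffer, Encoding.UTF8))
        {
            writer.Write(text);
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return buffer.ToArray();
    }

    // Reads the length prefix and then exactly that many bytes, returning the decoded string.
    public static string Decode(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            return reader.ReadString();
        }
    }
}
```

The length prefix is written as a variable-length (7-bit encoded) integer counting bytes, so both ends need to use BinaryWriter/BinaryReader or reimplement that format.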

Gareth Wilson