I am writing a client for a server that typically sends data as strings of 500 bytes or fewer. Occasionally the data exceeds that: as far as the client knows, a single set of data could contain 200,000 bytes (on initialization or after significant events). I would rather not have each client running with a 50 MB socket buffer, if that is even possible.

Each set of data is delimited by a null (\0) character. What kind of structure should I look at for storing partially received data sets?

For example, the server may send ABCDEFGHIJKLMNOPQRSTUV\0WXYZ\0123!\0. I would want to process ABCDEFGHIJKLMNOPQRSTUV, WXYZ, and 123! independently. Also, the server could send ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890LOL123HAHATHISISREALLYLONG without the terminating character. I would want that data set stored somewhere for later appending and processing.

Also, I'm using asynchronous socket methods (BeginSend, EndSend, BeginReceive, EndReceive) if that matters.

Currently I'm debating between List<Byte> and StringBuilder. Any comparison of the two for this situation would be very helpful.

+3  A: 

Read the data from the socket into a buffer. When you get the terminating character, turn it into a message and send it on its way to the rest of your code.

Also, remember that TCP is a stream protocol, not a packet protocol. A single read may return only part of a message, or several messages at once, so never assume that everything sent in one call arrives in one read.

As far as buffers go, you should probably only need one per connection at most. I'd probably start with the max size that you reasonably expect to receive, and if that fills, create a new buffer of a larger size - a typical strategy is to double the size when you run out to avoid churning through too many allocations.
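The doubling strategy described above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name `GrowableMessageBuffer`, the 512-byte default, and the `Append` method are all hypothetical, and it assumes you call `Append` from your `EndReceive` callback with the number of bytes actually read.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: accumulates raw bytes for one connection and
// returns every complete '\0'-terminated message found so far.
public sealed class GrowableMessageBuffer
{
    private byte[] _buffer;
    private int _count;   // number of valid bytes currently stored

    public GrowableMessageBuffer(int initialCapacity = 512)
    {
        // Guard against 0 so that doubling can always make progress.
        _buffer = new byte[Math.Max(1, initialCapacity)];
    }

    public int Capacity => _buffer.Length;

    // Append `length` bytes from `data` and return all messages
    // completed by this chunk (without their terminators).
    public List<byte[]> Append(byte[] data, int length)
    {
        EnsureCapacity(_count + length);
        Buffer.BlockCopy(data, 0, _buffer, _count, length);
        _count += length;

        var messages = new List<byte[]>();
        int start = 0;
        int end;
        while ((end = Array.IndexOf(_buffer, (byte)0, start, _count - start)) >= 0)
        {
            var message = new byte[end - start];
            Buffer.BlockCopy(_buffer, start, message, 0, message.Length);
            messages.Add(message);
            start = end + 1;   // skip the terminator
        }

        // Slide any trailing partial message to the front of the buffer.
        _count -= start;
        Buffer.BlockCopy(_buffer, start, _buffer, 0, _count);
        return messages;
    }

    // Double the backing array until it can hold `needed` bytes.
    private void EnsureCapacity(int needed)
    {
        if (needed <= _buffer.Length) return;
        int newSize = _buffer.Length;
        while (newSize < needed) newSize *= 2;
        Array.Resize(ref _buffer, newSize);
    }
}
```

With this shape the buffer stays small for typical 500-byte messages and only grows when a large data set arrives. A real client would probably also want an upper limit on growth.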

If you have multiple incoming connections, you may want to do something like create a pool of buffers, and just return "big" ones to the pool when done with them.

kyoryu
+1  A: 

I would just use a StringBuilder and read in one character at a time, copying and emptying the builder whenever I hit a null terminator.
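A minimal sketch of that idea, under some assumptions: the class name `StringMessageAssembler` is made up, and rather than reading the socket one character at a time it decodes each received chunk and then walks the decoded characters one by one. It uses a `Decoder` from the chosen `Encoding` so that a multi-byte character split across two reads is still decoded correctly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: decodes received bytes and collects characters
// in a StringBuilder until a '\0' terminator is seen.
public sealed class StringMessageAssembler
{
    private readonly StringBuilder _pending = new StringBuilder();
    private readonly Decoder _decoder;   // keeps state between chunks

    public StringMessageAssembler(Encoding encoding)
    {
        _decoder = encoding.GetDecoder();
    }

    // Feed `length` bytes from `data`; returns the messages completed
    // by this chunk. Incomplete text stays in the builder.
    public List<string> Append(byte[] data, int length)
    {
        var chars = new char[_decoder.GetCharCount(data, 0, length)];
        int charCount = _decoder.GetChars(data, 0, length, chars, 0);

        var messages = new List<string>();
        for (int i = 0; i < charCount; i++)
        {
            if (chars[i] == '\0')
            {
                // Terminator: copy the message out and empty the builder.
                messages.Add(_pending.ToString());
                _pending.Clear();
            }
            else
            {
                _pending.Append(chars[i]);
            }
        }
        return messages;
    }
}
```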

Ben S
That was what I was thinking, but I worried it would be an efficiency-killer.
Benjamin Manns
It shouldn't be; StringBuilder was designed to handle appending arbitrary strings efficiently.
Ben S
A: 

I wrote this answer regarding Java sockets but the concept is the same.

http://stackoverflow.com/questions/453609/whats-the-best-way-to-monitor-a-socket-for-new-data-and-then-process-that-data/453951#453951

Spencer Ruport
+1  A: 

You could just use a List<byte> as your buffer, so the .NET framework takes care of automatically expanding it as needed. When you find a null terminator you can use List.RemoveRange() to remove that message from the buffer and pass it to the next layer up.

You'd probably want to add a check and throw an exception if it exceeds a certain length, rather than just wait until the client runs out of memory.

(This is very similar to Ben S's answer, but I think a byte array is a bit more robust than a StringBuilder in the face of encoding issues. Decoding bytes to a string is best done higher up, once you have a complete message.)
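Something along these lines, as a rough sketch rather than a finished implementation. The class name `ListMessageBuffer`, the `maxPendingBytes` limit, and the choice of `InvalidDataException` are assumptions made for illustration; pick whatever limit and exception type suit your protocol.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keeps unprocessed bytes in a List<byte> and
// removes each complete message with RemoveRange.
public sealed class ListMessageBuffer
{
    private readonly List<byte> _buffer = new List<byte>();
    private readonly int _maxPendingBytes;

    public ListMessageBuffer(int maxPendingBytes)
    {
        _maxPendingBytes = maxPendingBytes;
    }

    // Append `length` bytes from `data`; returns every complete message
    // as raw bytes, leaving decoding to the caller.
    public List<byte[]> Append(byte[] data, int length)
    {
        // Bytes already buffered contain no terminator, so start the
        // search at the first newly added byte.
        int searchFrom = _buffer.Count;
        _buffer.AddRange(new ArraySegment<byte>(data, 0, length));

        var messages = new List<byte[]>();
        int end;
        while ((end = _buffer.IndexOf((byte)0, searchFrom)) >= 0)
        {
            messages.Add(_buffer.GetRange(0, end).ToArray());
            _buffer.RemoveRange(0, end + 1);   // drop message and terminator
            searchFrom = 0;
        }

        // Fail loudly instead of growing until memory runs out.
        if (_buffer.Count > _maxPendingBytes)
            throw new InvalidDataException("Pending message exceeds the allowed size.");

        return messages;
    }
}
```

The caller can then decode each returned array with whatever `Encoding` the server uses, once the message is known to be complete.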

Evgeny
Maybe a `MemoryStream`, rather than a `List<byte>`? And `stream.Seek(0, SeekOrigin.Begin)` instead of `RemoveRange`.
Matthew Flaschen
That's a possibility, too. I suppose it depends on how the code is structured. If it processes the message as soon as a null terminator is encountered, then seeking to 0 will work fine. However, if it reads all the pending socket data first and then looks for the null terminator, seeking to 0 would lose everything after the first message. I was assuming the latter.
Evgeny