views: 752
answers: 5

I'm learning C# asynchronous socket programming, and I've learned that it's a good idea to reuse byte buffers in some sort of pool, and then just check one out as needed when receiving data from a socket.

However, I have seen two different methods of doing a byte array pool: one uses a simple queue system, adding and removing buffers from the queue as needed. If one is requested and there are none left in the queue, a new byte array is created.

The other method that I've seen uses one big byte array for the entire program. The idea of a queue still applies, but instead it's a queue of integers which determine the slice (offset) of the byte array to use. If an offset is requested and there are none left in the queue, the array must be resized.

Which one of these is a better solution for a highly scalable server? My instinct is it would be cheaper to just use many byte arrays because I'd imagine resizing the array as needed (even if we allocate it in large chunks) would be pretty costly, especially when it gets big. Using multiple arrays seems more intuitive too - is there some advantage to using one massive array that I'm not thinking of?

+4  A: 

You are correct in your gut feeling. Every time you need to make the array bigger, you will be recreating the array and copying the existing bytes over. Since we are talking about bytes here, the size of the array may get large very quickly. So you will be asking for a contiguous piece of memory each time, which, depending on how your program uses memory, may or may not be viable. The single array also becomes, in effect, only a virtual pool, so to speak: a pool, by definition, is a set of multiple items that are managed and shared by various clients.

The one-array solution is also far more complex to implement. The good thing is that it allows you to hand out variable-sized chunks, but this comes at the cost of essentially reimplementing malloc: dealing with fragmentation and so on, which you shouldn't get into.

A multiple-array solution allows you to initialize a pool with N buffers and easily manage them in a straightforward fashion. Definitely the approach I'd recommend.
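
To make that concrete, a pool of pre-allocated, fixed-size buffers could look roughly like the sketch below (the BufferPool name, the CheckOut/CheckIn methods, and the lock-based synchronization are illustrative choices, not something taken from the answer itself):

    using System.Collections.Generic;

    // Illustrative fixed-size buffer pool (names are hypothetical).
    public class BufferPool
    {
        private readonly Queue<byte[]> _buffers = new Queue<byte[]>();
        private readonly object _lock = new object();
        private readonly int _bufferSize;

        public BufferPool(int initialCount, int bufferSize)
        {
            _bufferSize = bufferSize;
            for (int i = 0; i < initialCount; i++)
                _buffers.Enqueue(new byte[bufferSize]);
        }

        // Hand out a buffer; fall back to a fresh allocation if the pool is empty.
        public byte[] CheckOut()
        {
            lock (_lock)
            {
                return _buffers.Count > 0 ? _buffers.Dequeue() : new byte[_bufferSize];
            }
        }

        // Return a buffer so the next caller can reuse it.
        public void CheckIn(byte[] buffer)
        {
            lock (_lock)
            {
                _buffers.Enqueue(buffer);
            }
        }
    }

Because CheckOut falls back to a plain allocation when the pool runs dry, callers never block, and the pool simply grows toward the working set the server actually needs.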

siz
+2  A: 

I wouldn't suggest the resizing option. Start simple and work your way up. A queue of byte buffers, with a new buffer added to the end whenever the queue is exhausted, would be a good start. You will probably have to pay attention to threading issues, so my advice would be to use somebody else's thread-safe queue implementation.

Next you could look at the more complex "pointers" into a big byte array chunk, though my advice would be to have a queue of 4k/16k blocks (some power-of-two multiple of the page size) that you index into, adding another big chunk to the queue when it fills up. Actually, I don't recommend this at all due to the complexity and the dubious gain in performance.

Start simple, work your way up. Pool of buffers, make it thread safe, see if you need anything more.
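
For example, checking a buffer out of a pool like the one sketched above, handing it to BeginReceive, and returning it when the receive completes might look roughly like this (error handling omitted; BufferPool is the hypothetical class from the earlier sketch, and the pool and buffer sizes are arbitrary):

    using System;
    using System.Net.Sockets;

    public static class Receiver
    {
        // Hypothetical shared pool instance (see the BufferPool sketch above).
        private static readonly BufferPool Pool = new BufferPool(64, 4096);

        public static void StartReceive(Socket socket)
        {
            byte[] buffer = Pool.CheckOut();
            socket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, ar =>
            {
                int bytesRead = socket.EndReceive(ar);
                // ... process buffer[0 .. bytesRead) here ...
                Pool.CheckIn(buffer);       // return the buffer for reuse
                if (bytesRead > 0)
                    StartReceive(socket);   // keep reading
            }, null);
        }
    }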

sixlettervariables
+2  A: 

One more vote for multiple buffers, but with the addition that since you're doing things asynchronously you need to make sure your queue is threadsafe. The default Queue<T> collection is definitely not threadsafe.

SO user and MS employee JaredPar has a good threadsafe queue implementation here:
http://blogs.msdn.com/jaredpar/archive/2009/02/16/a-more-usable-thread-safe-collection.aspx

Joel Coehoorn
Good call, with the requisite warning not to do things like "if (queue.Count > 0) { ... dequeue ... }".
sixlettervariables
Read the link: allowing that specific scenario is one of the design goals of the ThreadsafeQueue implementation presented.
Joel Coehoorn
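
To illustrate what this comment exchange is getting at: with a plain Queue<T> shared between threads, checking Count and then calling Dequeue is a race, while an atomic try-dequeue (as in the linked implementation, or the ConcurrentQueue<T> that later shipped in .NET 4) is not. A rough sketch, assuming a queue of pooled byte[] buffers:

    using System.Collections.Concurrent;
    using System.Collections.Generic;

    public static class QueueRaceExample
    {
        // Racy: another thread can empty the queue between the Count check
        // and Dequeue, so Dequeue may still throw InvalidOperationException.
        public static byte[] RacyCheckOut(Queue<byte[]> queue)
        {
            if (queue.Count > 0)
                return queue.Dequeue();
            return new byte[4096];
        }

        // Safe: the check and the removal happen as a single atomic operation.
        public static byte[] SafeCheckOut(ConcurrentQueue<byte[]> queue)
        {
            byte[] buffer;
            if (queue.TryDequeue(out buffer))
                return buffer;
            return new byte[4096];
        }
    }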
+1  A: 

If you use the single buffer you need a strategy for how fast it should grow when needed. If you grow it by small increments, you may have to do it often and copy all the data often. If you grow it by large increments (say, the next size is 1.5 times the previous one), you risk hitting an "Out of memory" error simply by trying to grow the buffer. It's a lose-lose choice for a scalable system. This is why reusing small buffers is preferable.

sharptooth
+1  A: 

With a garbage-collected heap, you should always favor small, right-sized buffers that have a short lifetime. The .NET heap allocator is very fast, and generation #0 collections are very cheap.

When you keep a static buffer around, you'll use up system resources for the life of the program. The worst-case scenario is when it gets big enough to end up on the Large Object Heap, where it will be a permanent obstacle that cannot be moved.

Hans Passant