Two questions:

I need to make a server that handles potentially thousands of simultaneous requests for:

  • Hashing of files
  • Compression of files
  • Decompression of files
  • Possibly some file copy / moves as well

I can't control a customer's hardware (RAID configurations, etc.), so I assume all I can do is issue hundreds of file operations and let the OS and disk controller apply whatever optimizations they can. Correct?

Next question: I would like to maximize use of I/O completion threads (instead of worker threads). The only ones I believe are available to me, in .NET 3.5 anyway, are offered via "BeginRead/BeginWrite" in:

  • System.IO.Compression.DeflateStream
  • System.IO.Compression.GZipStream
  • System.IO.FileStream
  • System.IO.Stream

Is there something I'm missing that would give me the ability to use an I/O completion thread for hashing files? Does the 7Zip SDK use I/O completion threads?

A: 

First, while .NET is pretty good performance-wise, if very high performance is a hard requirement, I would turn to a natively compiled, unmanaged language like C++. JIT compilation and the other overheads of the CLR add some cost to any algorithm written in .NET.

I think that thousands of truly simultaneous requests point to a highly distributed model; right now, the best server hardware on the market (dual quad-core Xeon CPUs with hyperthreading) will only run 16 threads at once, and listening for requests, talking to the hardware layer, and other general OS/runtime overhead will take up a couple of those. I would analyze the real traffic you expect this server to handle concurrently, and scale the number of boxes working on it to match.

Second, I think what you mean by "I/O completion threads" is the threads that the asynchronous Begin/End calls use to do their job, as opposed to threads from the ThreadPool (avoid these in really thread-heavy apps) or user-created threads (no problem with these, just watch your thread count). Really, except for a few special cases, a thread is a thread, and exactly where it's spawned doesn't make much difference at the hardware level. If you really wanted to, you could spawn worker threads that use the synchronous calls and get pretty much the same result (but it's generally better to use the tools you have than to forge new ones).

Now, to your real question. No, there is no asynchronous model for hashing; if you want to multithread a hashing operation, the thread must be spawned separately. However, hashing requires a stream or byte buffer, which can be obtained asynchronously using Stream.BeginRead(), and the callback method passed to BeginRead() can perform the hashing on the thread that the asynchronous call completes on.
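To make the BeginRead() idea concrete, here is a minimal sketch in C# that should compile against .NET 3.5. The class name AsyncFileHasher, the 64 KB buffer size, and the choice of SHA1 are my own assumptions for illustration; nothing in the question prescribes them. It opens the file with FileOptions.Asynchronous, feeds each completed read into HashAlgorithm.TransformBlock() inside the callback, and issues the next read from there.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;

// Hypothetical helper (not a framework type): hashes one file with async reads.
public sealed class AsyncFileHasher
{
    private readonly FileStream _stream;
    private readonly HashAlgorithm _hash = SHA1.Create();   // assumed algorithm
    private readonly byte[] _buffer = new byte[64 * 1024];  // assumed chunk size
    private readonly Action<byte[], Exception> _onDone;

    private AsyncFileHasher(string path, Action<byte[], Exception> onDone)
    {
        // FileOptions.Asynchronous requests overlapped I/O, so read
        // completions can be dispatched by the I/O completion port.
        _stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                 FileShare.Read, 4096, FileOptions.Asynchronous);
        _onDone = onDone;
    }

    // Starts hashing; onDone receives (digest, null) or (null, error).
    public static void Start(string path, Action<byte[], Exception> onDone)
    {
        AsyncFileHasher h = new AsyncFileHasher(path, onDone);
        h._stream.BeginRead(h._buffer, 0, h._buffer.Length, h.OnRead, null);
    }

    private void OnRead(IAsyncResult ar)
    {
        byte[] digest = null;
        Exception error = null;
        try
        {
            int n = _stream.EndRead(ar);
            if (n > 0)
            {
                // Hash this chunk on the completing thread, then read the next one.
                _hash.TransformBlock(_buffer, 0, n, null, 0);
                _stream.BeginRead(_buffer, 0, _buffer.Length, OnRead, null);
                return;
            }
            // End of file: finalize with an empty block and copy out the digest.
            _hash.TransformFinalBlock(_buffer, 0, 0);
            digest = _hash.Hash;
        }
        catch (Exception ex)
        {
            error = ex;
        }
        _stream.Close();
        _hash.Clear();
        _onDone(digest, error);
    }
}

public static class Demo
{
    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[200000]);

        ManualResetEvent done = new ManualResetEvent(false);
        AsyncFileHasher.Start(path, delegate(byte[] digest, Exception error)
        {
            Console.WriteLine(error == null ? BitConverter.ToString(digest)
                                            : error.Message);
            done.Set();
        });
        done.WaitOne();
        File.Delete(path);
    }
}
```

Note that the hashing itself is still CPU work done inside the callback, so a slow hash keeps that completion thread busy; for heavy algorithms it may be worth handing each filled buffer to a worker instead. The sketch also uses a single buffer, so the next read does not start until the current chunk has been hashed.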

KeithS
This misses the point of the question; threading and CPU time are not the bottleneck here. I/O completion threads are definitely a different kind of beast, with special support from the kernel for wake-up on completion of I/O operations.
Dan Bryant
A: 

I would recommend looking into the new async programming model in F#. There's an excellent video from MS TechEd 2010 in New Orleans by Luke Hoban on this very topic:

http://www.msteched.com/2010/NorthAmerica/DEV307

http://blogs.msdn.com/b/lukeh/archive/2010/06/13/f-scaling-from-explorative-to-net-component-f-talk-teched-2010.aspx

GregC
If you could influence hardware decisions, think SSD.
GregC