views: 303

answers: 2

I'm building a tool that transfers very large streaming data sets (possibly on the order of terabytes in a single stream; routinely in the tens of gigabytes) from one server to another. The client portion of the tool will read blocks from the source disk, and send them over the network. The server side will read these blocks off the network and write them to a file on the server disk.

Right now I'm trying to decide which transport to use. The options are raw TCP and HTTP.

I really, REALLY want to be able to use HTTP. The HttpListener class (or WCF, if I want to go that route) makes it easy to plug into the HTTP Server API (http.sys), and I can get things like authentication and SSL for free. The problem right now is performance.

I wrote a simple test harness that sends 128K blocks of NULL bytes using the BeginWrite/EndWrite async I/O idiom, with async BeginRead/EndRead on the server side. I've modified this test harness so I can do this with either HTTP PUT operations via HttpWebRequest/HttpListener, or plain old socket writes using TcpClient/TcpListener. To rule out issues with network cards or network pathways, both the client and server are on one machine and communicate over localhost.
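For the curious, the TCP side of the harness boils down to something like the following sketch (host, port, and the lack of error handling are simplifications rather than the actual harness code; the server side mirrors it with BeginRead/EndRead):

```csharp
using System;
using System.Net.Sockets;

class TcpPushClient
{
    const int BlockSize = 128 * 1024;   // 128K blocks of NULL bytes, as in the test

    static void Main()
    {
        // Both ends run on one machine, so this connects over loopback.
        var client = new TcpClient("localhost", 9000);
        NetworkStream stream = client.GetStream();
        byte[] block = new byte[BlockSize];   // stays all zeroes

        // Classic APM idiom: each completion callback ends the previous write
        // and immediately begins the next one.
        AsyncCallback onWrite = null;
        onWrite = ar =>
        {
            stream.EndWrite(ar);
            stream.BeginWrite(block, 0, block.Length, onWrite, null);
        };
        stream.BeginWrite(block, 0, block.Length, onWrite, null);

        Console.ReadLine();   // keep the process alive while the write loop runs
    }
}
```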

On my 12-core Windows 2008 R2 test server, the TCP version of this test harness can push bytes at 450MB/s, with minimal CPU usage. On the same box, the HTTP version of the test harness runs between 130MB/s and 200MB/s depending upon how I tweak it.

In both cases CPU usage is low, and most of the CPU time that is used is kernel time, so I'm pretty sure my usage of C# and the .NET runtime is not the bottleneck. The box has two 6-core Xeon X5650 processors, 24GB of single-ranked DDR3 RAM, and is used exclusively by me for my own performance testing.

I already know about HTTP client tweaks like ServicePointManager.MaxServicePointIdleTime, ServicePointManager.DefaultConnectionLimit, ServicePointManager.Expect100Continue, and HttpWebRequest.AllowWriteStreamBuffering.
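For reference, those knobs get applied roughly like this (the specific values are illustrative, not recommendations):

```csharp
using System;
using System.Net;

class HttpPutSetup
{
    static HttpWebRequest CreatePut(Uri target)
    {
        // Client-side settings commonly tried when pushing large bodies through HttpWebRequest.
        ServicePointManager.DefaultConnectionLimit = 16;        // connections allowed per host
        ServicePointManager.MaxServicePointIdleTime = 100000;   // ms before an idle connection is closed
        ServicePointManager.Expect100Continue = false;          // skip the Expect: 100-continue round trip

        var request = (HttpWebRequest)WebRequest.Create(target);
        request.Method = "PUT";
        request.AllowWriteStreamBuffering = false;   // stream the body rather than buffering it in memory
        request.SendChunked = true;                  // needed when the content length isn't set up front
        return request;
    }
}
```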

Does anyone have any ideas for how I can get HTTP.sys performance beyond 200MB/s? Has anyone seen it perform this well in any environment?

UPDATE:

Here's a bit more detail on the performance I'm seeing with TcpListener vs HttpListener:

First, I wrote a TcpClient/TcpListener test. On my test box that was able to push 450MB/s.

Then, using Reflector, I figured out how to get at the raw Socket object underlying HttpWebRequest, and modified my HTTP client test to use that. Still no joy; barely 200MB/s.

My current theory is that http.sys is optimized for the typical IIS use case: lots of concurrent small requests, and lots of concurrent and possibly large responses. I hypothesize that MSFT achieved this optimization at the expense of the case I'm trying to handle, which is very high throughput on a single very large request, with a very small response.

For what it's worth, I also tried up to 32 concurrent HTTP PUT operations to see if it could scale out, but there was still no joy; about 200MB/s.
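That scale-out test amounted to something like the sketch below; `SendOnePut` is a hypothetical stand-in for the single-stream PUT loop, not a real method in the harness:

```csharp
using System.Threading;

class ConcurrentPutTest
{
    static void RunConcurrentPuts(int count)
    {
        // Fan out N independent PUT streams and measure aggregate throughput.
        var threads = new Thread[count];
        for (int i = 0; i < count; i++)
        {
            threads[i] = new Thread(SendOnePut);
            threads[i].Start();
        }
        foreach (var t in threads)
            t.Join();
    }

    static void SendOnePut()
    {
        // Issue one large HTTP PUT, exactly as in the single-stream test.
    }
}
```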

Interestingly, on my development workstation, which is a quad-core Xeon-based Precision T7400 running 64-bit Windows 7, my TcpClient implementation is about 200MB/s, and the HTTP version is also about 200MB/s. Once I take it to a higher-end server-class machine running Server 2008 R2, the TcpClient code gets up to 450MB/s, while the HTTP.sys code stays around 200MB/s.

At this point I've sadly concluded that HTTP.sys is not the right tool for the job I need done, and will have to continue to use the hand-rolled socket protocol we've been using all along.

+1  A: 

I can't see too much of interest except for this Tech Note. It might be worth having a fiddle with `MaxBytesPerSend`.
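If it helps, poking that value programmatically would look something like the sketch below. This assumes `MaxBytesPerSend` lives under the usual http.sys Parameters registry key; verify the key and the valid range first, and the HTTP service needs a restart before the change takes effect:

```csharp
using Microsoft.Win32;

class HttpSysTweak
{
    static void Main()
    {
        // Sketch only: requires admin rights, and the key/value location should be
        // double-checked against the Tech Note before changing anything.
        Registry.SetValue(
            @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters",
            "MaxBytesPerSend",
            256 * 1024,                  // illustrative value; the default is 64K
            RegistryValueKind.DWord);
    }
}
```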

spender
Yeah, I'd actually seen that. The reason I didn't meddle with `MaxBytesPerSend` was that in my case, HTTP.sys isn't doing the sending; I'm doing an HTTP PUT with a large payload, which HTTP.sys receives and streams to my HttpListener-based server. Also, a larger TCP window is supposed to benefit links with high bandwidth combined with high latency; since my test is being done over the loopback interface, latency is effectively zero. Thanks for the suggestion though. Keep 'em coming!
anelson
A: 

If you're going to send files over the LAN then UDP is the way to go, because TCP's overhead is a waste in that case. TCP provides rate limiting to avoid too many lost packets, whereas with UDP the application has to sort that out by itself. NFS would do the job, were it not that you're stuck with Windows; but I'm sure there must be ready-made UDP stuff. Also, use the tool "iperf" (available on Linux, probably also Windows) to benchmark the network link irrespective of the protocol. Some network cards are plain crap and rely on the CPU too much, which will limit your speed to 200Mbit. You want a proper network card with its own processors (I don't know the exact term for this).
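To make the "application has to sort that out by itself" point concrete: a raw UDP sender has nothing throttling it, so it has to pace itself (or implement its own ack/retransmit scheme). A minimal sketch, with an illustrative port, datagram size, and pacing:

```csharp
using System.Net.Sockets;
using System.Threading;

class UdpSendSketch
{
    static void Main()
    {
        var udp = new UdpClient();
        udp.Connect("localhost", 9000);
        byte[] datagram = new byte[60 * 1024];   // stays under the ~64K UDP datagram limit

        while (true)
        {
            udp.Send(datagram, datagram.Length);
            // Unlike TCP there is no congestion control here: without some pacing
            // the receiver (or any switch in between) will simply drop packets.
            Thread.Sleep(1);
        }
    }
}
```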

Andreas
Thanks for the suggestion, but for various reasons I'm stuck with HTTP. I'm pretty familiar with the overhead introduced by TCP, but that alone does not account for the performance penalty I'm paying here. Using raw TCP I was able to move 450MB/s, compared to barely over 200MB/s with HTTP.sys, so clearly there's additional overhead specific to the HTTP implementation. I'm using a PCI-Express x4 Intel PRO/1000 dual-port GigE NIC with TCP offload, but that's not even relevant, since the test I'm running is over the loopback interface.
anelson
I find it strange that you would be stuck with HTTP; something like TFTP would be much better. How about experimenting with some command-line HTTP implementations, e.g. curl plus some trivial HTTP server in C? From what you're saying it seems obvious that it's a software limitation, so try some other implementations. You probably want a server that mmaps the files?
Andreas