I'm building a tool that transfers very large streaming data sets (possibly on the order of terabytes in a single stream; routinely in the tens of gigabytes) from one server to another. The client portion of the tool will read blocks from the source disk, and send them over the network. The server side will read these blocks off the network and write them to a file on the server disk.
Right now I'm trying to decide which transport to use. The options are raw TCP and HTTP.
I really, REALLY want to be able to use HTTP. HttpListener (or WCF, if I want to go that route) makes it easy to plug into the HTTP Server API (http.sys), and I get things like authentication and SSL for free. The problem right now is performance.
I wrote a simple test harness that sends 128K blocks of NULL bytes using the BeginWrite/EndWrite async I/O idiom, with async BeginRead/EndRead on the server side. I've modified this test harness so I can do this with either HTTP PUT operations via HttpWebRequest/HttpListener, or plain old socket writes using TcpClient/TcpListener. To rule out issues with network cards or network pathways, both the client and server are on one machine and communicate over localhost.
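For reference, here is a minimal sketch of what the TCP half of such a loopback test could look like. It is not the actual harness: it uses blocking Read/Write calls instead of the BeginWrite/EndWrite idiom to stay short, and the class name, method name, and block count parameter are assumptions made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;
using System.Threading;

static class TcpLoopbackTest
{
    const int BlockSize = 128 * 1024; // 128K blocks, matching the harness described above

    // Sends blockCount zero-filled blocks over loopback, prints the throughput,
    // and returns the number of bytes the server side actually received.
    public static long Run(int blockCount)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        long received = 0;
        var server = new Thread(() =>
        {
            using (TcpClient conn = listener.AcceptTcpClient())
            using (NetworkStream input = conn.GetStream())
            {
                var buffer = new byte[BlockSize];
                int n;
                // Read returns 0 once the client has closed its end of the connection.
                while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
                    received += n;
            }
        });
        server.Start();

        var block = new byte[BlockSize]; // zero-filled by default
        var timer = Stopwatch.StartNew();
        using (var client = new TcpClient())
        {
            client.Connect(IPAddress.Loopback, port);
            using (NetworkStream output = client.GetStream())
            {
                for (int i = 0; i < blockCount; i++)
                    output.Write(block, 0, block.Length);
            }
        }
        server.Join(); // wait until the server has drained the stream
        timer.Stop();
        listener.Stop();

        double megabytes = received / (1024.0 * 1024.0);
        Console.WriteLine("{0:F0} MB/s", megabytes / timer.Elapsed.TotalSeconds);
        return received;
    }
}
```

Calling TcpLoopbackTest.Run(8192) would push 1 GiB through the loopback interface. The printed figure depends entirely on the machine, so no expected number is given here.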
On my 12-core Windows 2008 R2 test server, the TCP version of this test harness can push bytes at 450MB/s, with minimal CPU usage. On the same box, the HTTP version of the test harness runs between 130MB/s and 200MB/s depending upon how I tweak it.
In both cases CPU usage is low, and the vast majority of what CPU usage there is is kernel time, so I'm pretty sure my usage of C# and the .NET runtime is not the bottleneck. The box has two 6-core Xeon X5650 processors, 24GB of single-ranked DDR3 RAM, and is used exclusively by me for my own performance testing.
I already know about HTTP client tweaks like ServicePointManager.MaxServicePointIdleTime, ServicePointManager.DefaultConnectionLimit, ServicePointManager.Expect100Continue, and HttpWebRequest.AllowWriteStreamBuffering.
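To make the list above concrete, this is roughly how those settings are applied to a streaming PUT. It is only a sketch: the helper name and the numeric values are illustrative assumptions, not the values used in the tests described here.

```csharp
using System;
using System.Net;

static class HttpPutClientSetup
{
    // Hypothetical helper: builds a streaming PUT request with the tweaks listed above applied.
    public static HttpWebRequest CreatePut(string url)
    {
        ServicePointManager.Expect100Continue = false;        // skip the "Expect: 100-continue" round trip
        ServicePointManager.DefaultConnectionLimit = 32;       // illustrative value (assumption)
        ServicePointManager.MaxServicePointIdleTime = 10000;   // milliseconds; illustrative value (assumption)

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "PUT";
        request.AllowWriteStreamBuffering = false; // stream the body instead of buffering it in memory
        request.SendChunked = true;                // with buffering off, either this or ContentLength must be set
        return request;
    }
}
```

The body is then written to the stream returned by request.GetRequestStream(). With write buffering disabled, the request needs either chunked encoding or an explicit ContentLength, which is why SendChunked is set in this sketch.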
Does anyone have any ideas for how I can get HTTP.sys performance beyond 200MB/s? Has anyone seen it perform this well in any environment?
UPDATE:
Here's a bit more detail on the performance I'm seeing with TcpListener vs HttpListener:
First, I wrote a TcpClient/TcpListener test. On my test box that was able to push 450MB/s.
Then, using Reflector, I figured out how to get at the raw Socket object underlying HttpWebRequest, and modified my HTTP client test to write to that directly. Still no joy; barely 200MB/s.
My current theory is that http.sys is optimized for the typical IIS use case, which is lots of concurrent small requests, and lots of concurrent and possibly large responses. I hypothesize that in order to achieve this optimization, MSFT had to do so at the expense of what I'm trying to accomplish, which is very high throughput on a single very large request, with a very small response.
For what it's worth, I also tried up to 32 concurrent HTTP PUT operations to see if it could scale out, but there was still no joy; about 200MB/s.
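A concurrent PUT test along these lines can be sketched as follows. This is an approximation under stated assumptions, not the harness that produced the numbers above: it uses Task.Run (which needs .NET 4.5 or later) and blocking stream calls, and the class name, method name, and URL prefix are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class ConcurrentPutTest
{
    const int BlockSize = 128 * 1024;

    // Runs `uploads` concurrent PUTs of `blocks` zero-filled blocks each against an
    // in-process HttpListener and returns the total number of body bytes received.
    public static long Run(string prefix, int uploads, int blocks)
    {
        ServicePointManager.DefaultConnectionLimit = uploads; // allow that many parallel connections
        long received = 0;

        using (var listener = new HttpListener())
        {
            listener.Prefixes.Add(prefix); // must end with '/', e.g. "http://localhost:8089/upload/"
            listener.Start();

            // Server side: accept each request and drain its body on a separate task.
            Task server = Task.Run(() =>
            {
                var handlers = new Task[uploads];
                for (int i = 0; i < uploads; i++)
                {
                    HttpListenerContext ctx = listener.GetContext();
                    handlers[i] = Task.Run(() =>
                    {
                        var buffer = new byte[BlockSize];
                        int n;
                        while ((n = ctx.Request.InputStream.Read(buffer, 0, buffer.Length)) > 0)
                            Interlocked.Add(ref received, n);
                        ctx.Response.StatusCode = 204; // very small response: no body
                        ctx.Response.Close();
                    });
                }
                Task.WaitAll(handlers);
            });

            // Client side: each task streams one chunked PUT.
            var clients = new Task[uploads];
            for (int i = 0; i < uploads; i++)
            {
                clients[i] = Task.Run(() =>
                {
                    var request = (HttpWebRequest)WebRequest.Create(prefix);
                    request.Method = "PUT";
                    request.AllowWriteStreamBuffering = false;
                    request.SendChunked = true;
                    var block = new byte[BlockSize];
                    using (Stream body = request.GetRequestStream())
                    {
                        for (int b = 0; b < blocks; b++)
                            body.Write(block, 0, block.Length);
                    }
                    using (request.GetResponse()) { } // wait for the server's reply
                });
            }

            Task.WaitAll(clients);
            server.Wait();
            listener.Stop();
        }
        return Interlocked.Read(ref received);
    }
}
```

Wrapping a call such as ConcurrentPutTest.Run("http://localhost:8089/upload/", 32, 8192) in a Stopwatch gives an aggregate throughput figure to compare against the single-stream case.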
Interestingly, on my development workstation, a Precision T7400 with a quad-core Xeon running 64-bit Windows 7, my TcpClient implementation manages about 200MB/s, and the HTTP version is also about 200MB/s. Once I move to a higher-end server-class machine running Server 2008 R2, the TcpClient code gets up to 450MB/s, while the HTTP.sys code stays around 200MB/s.
At this point I've sadly concluded that HTTP.sys is not the right tool for the job I need done, and will have to continue to use the hand-rolled socket protocol we've been using all along.