views:

493

answers:

4

Hello, I have written an HTTP proxy that does some stuff that's not relevant here, but it is increasing the client's time-to-serve by a huge amount (600us without proxy vs 60000us with it). I think I have found where the bulk of that time is coming from - between my proxy finishing sending back to the client and the client finishing receiving it. For now, server, proxy and client are running on the same host, using localhost as the addresses.

Once the proxy has finished sending (once it has returned from send() at least), I print the result of gettimeofday which gives an absolute time. When my client has received, it prints the result of gettimeofday. Since they're both on the same host, this should be accurate. All send() calls are with no flags, so they are blocking. The difference between the two is about 40000us.

The proxy's socket on which it listens for client connections is set up with the hints AF_UNSPEC, SOCK_STREAM and AI_PASSIVE. Presumably a socket from accept()ing on that will have the same parameters?

If I'm understanding all this correctly, Apache manages to do everything in 600us (including the equivalent of whatever is causing this 40000us delay). Can anybody suggest what might be causing this? I have tried setting the TCP_NODELAY option (I know I shouldn't, it's just to see if it made a difference) and the delay between finishing sending and finishing receiving went right down, I forget the number but <1000us.

This is all on Ubuntu Linux 2.6.31-19. Thanks for any help

+1  A: 

From e.g. the RedHat documentation:

Applications that require lower latency on every packet sent should be run on sockets with TCP_NODELAY enabled. It can be enabled through the setsockopt command with the sockets API:

int one = 1;
setsockopt(descriptor, SOL_TCP, TCP_NODELAY, &one, sizeof(one));

For this to be used effectively, applications must avoid doing small, logically related buffer writes. Because TCP_NODELAY is enabled, these small writes will make TCP send these multiple buffers as individual packets, which can result in poor overall performance.

As an aside, it is possible to have your cake and eat it too in this case: you can leave Nagle's enabled on your socket, and then when you've written all the data to the socket that you're likely to write for a while, disable Nagle's, send() 0 bytes on the socket, then re-enable Nagle's again. The send() will force any pending data to be sent immediately.
Jeremy Friesner
+4  A: 

You can't really do meaningful performance measurements on a proxy with the client, proxy and origin server on the same host.

Place them all on different hosts on a network. Use real hardware machines for them all, or specialised hardware test systems (e.g. Spirent).

Your methodology makes no sense. Nobody has 600us of latency to their origin server in practice anyway. Running all the tasks on the same host creates contention and a wholly unreaslistic network environment.

MarkR
Many thanks for your answer. I didn't think it would make so much difference. I've done what you suggested and ran each on a different host and the time-to-serves are now negligibly different with vs. without proxy.
Ray2k
+1  A: 

In your case, that 40ms is probably just a scheduler time quantum. In other words, that's how long it takes your system to get back round to the other tasks. Try it on a real network, you'll get a completely different picture. If you have a multi-core machine, using virtual OS instances in Virtualbox or some other VM would give you a much better idea of what is really going to happen.

Andrew McGregor
Thanks, I've done what you suggested (real network) and am now getting expected results.
Ray2k
A: 

40ms is the TCP ACK delay on Linux, which indicates that you are likely encountering a bad interaction between delayed acks and the Nagle algorithm. The best way to address this is to send all of your data using a single call to send() or sendmsg(), before waiting for a response. If that is not possible then certain TCP socket options including TCP_QUICKACK (on the receiving side), TCP_CORK (sending side), and TCP_NODELAY (sending side) can help, but can also hurt if used improperly. TCP_NODELAY simply disables the Nagle algorithm and is a one-time setting on the socket, whereas the other two must be set at the appropriate times during the life of the connection and can therefore be trickier to use.

mark4o