I have tried writing a simple program in several languages (C#, Java, C++, PHP) that connects to a server, and all of them behave the same way, so I believe this is more of an OS-level problem.

Basically I want the program to connect to the server over a TCP socket, send 1 byte, and then close the socket. This needs to be done thousands of times per second and sustained over a period of time. This is for the purpose of benchmarking the server.
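
Each iteration looks roughly like this C sketch (the address, port, and function name are placeholders, not our real code):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* One benchmark iteration: connect, send a single byte, close.
 * 192.0.2.1:5000 is a placeholder for the server under test. */
int send_one_byte(void)
{
   struct sockaddr_in srv = { 0 };
   int s = socket(AF_INET, SOCK_STREAM, 0);
   if (s < 0)
      return -1;
   srv.sin_family = AF_INET;
   srv.sin_port = htons(5000);
   inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);
   if (connect(s, (struct sockaddr *)&srv, sizeof(srv)) == 0)
      send(s, "x", 1, 0);
   close(s);   /* socket is closed after every transmission */
   return 0;
}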

So far, after a few thousand client sockets, the system comes to a complete halt. It can only start creating sockets again after a minute or so of cool-down. I have made sure that each socket is closed after transmission.

Now, this kind of problem is familiar from benchmarking servers like Apache, which is why utilities such as ab and siege are advised to use keep-alive: they create a small number of TCP connections and make many requests through each of them. That is not possible in our case, however, because our proprietary server does not serve HTTP and does not support the HTTP 1.1 keep-alive model.

So how can this be achieved? I have checked the following limits:

  1. ulimit is set to a very high number (see the sketch after this list for how I check it from inside the client)
  2. TCP TIME_WAIT is eliminated by setting /proc/sys/net/ipv4/tcp_tw_recycle and /proc/sys/net/ipv4/tcp_tw_reuse to 1. (I have confirmed with netstat that there are no TIME_WAIT sockets.)
  3. This is not related to limits on the number of threads/processes. I have tried restarting my client app and it is the same. Once the OS refuses new sockets, nothing helps.
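
For item 1, this is roughly how I double-check the descriptor limit from inside the client (a sketch; the function name is just for illustration):

#include <stdio.h>
#include <sys/resource.h>

/* The open-file-descriptor limit is what ulimit -n controls; every
 * client socket consumes one descriptor, so it is worth confirming
 * from inside the benchmark process itself. */
void show_fd_limit(void)
{
   struct rlimit rl;
   if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
      printf("fd limit: soft=%llu hard=%llu\n",
             (unsigned long long)rl.rlim_cur,
             (unsigned long long)rl.rlim_max);
}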

PS. This is NOT a server-side limitation. We tested this by buying another box and running the same client code on it once the first client box refused to make new sockets; the server handled it fine. We don't want to buy 5-10 boxes and rotate between them to get around this problem.

OS: Fedora 10 Linux 2.6.24-23-xen #1 SMP

+7  A: 

Old joke: Man goes to doctor, says "Doctor, it hurts when I do this," twisting his arm into a strange position.

Doctor replies, "Well, don't do that!"

Look, what you're doing is a very unnatural process. Establishing a TCP connection requires a handshake, which transmits far more bytes than the one byte per message you send. Setup and teardown time are going to be significant. It's very probable that you're using up kernel resources associated with that handshake; sure enough, if you then leave it alone and stop slapping it, it eventually catches up.

So, what are you really trying to measure? What are you really trying to do? If you really must send a single byte at a time -- gods forbid -- at least think about using UDP; there's no awful setup/teardown. The payload is still tiny compared to the overhead -- even a UDP packet carries something like 28 bytes of IP/UDP framing -- but it's better.
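
For illustration only, a rough sketch of the UDP version (the address, port, and function name are made up); one unconnected socket carries every message, so there is no per-message setup at all:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fire count one-byte datagrams at a placeholder address:
 * no handshake, no teardown, no TIME_WAIT. */
void blast_udp(int count)
{
   struct sockaddr_in srv = { 0 };
   int s = socket(AF_INET, SOCK_DGRAM, 0);
   if (s < 0)
      return;
   srv.sin_family = AF_INET;
   srv.sin_port = htons(5000);
   inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);
   for (int i = 0; i < count; i++)
      sendto(s, "x", 1, 0, (struct sockaddr *)&srv, sizeof(srv));
   close(s);
}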

Charlie Martin
+1  A: 

Connecting and then sending 1 byte is not a benchmark of anything except maybe the TCP protocol itself. As Charlie Martin said above, most of the time is spent connecting and disconnecting the socket.

I understand you WANT to benchmark, but is this really a good representation of what your app does? Are you really going to be setting up a connection just to send 1 byte?

ryeguy
It seems like he wants to benchmark the connection establishment time--could be a valid benchmark if combined with others that checked throughput, etc.
Drew Hall
+2  A: 

Is it possible you ran out of local ports? By default you only get the 1024-5000 ephemeral range, roughly 4,000 ports, unless you're willing to call bind() in a loop to find the next free port yourself.

bind() with port 0 assigns a free port from the 1024-5000 ephemeral range; bind() with a specific port gets that port if it is available.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Bind s to the next free local port, cycling through 1024-65535.
 * Returns 0 once a port is bound, 1 if every port was tried. */
int bindnextport(int s, struct sockaddr_in *sa)
{
   static int nextport = 1025;
   int lastport = nextport;
   do {
      sa->sin_port = htons(nextport);
      if (!bind(s, (struct sockaddr *)sa, sizeof(*sa)))
         return 0;
      ++nextport;
      if (nextport >= 65536)
         nextport = 1024;
   } while (lastport != nextport);
   return 1;
}
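
Hypothetical usage, assuming the includes and bindnextport() above:

#include <unistd.h>

/* Create a TCP socket bound to the next free local port; the caller
 * then connect()s and sends as before. Returns -1 if no port is free. */
int make_bound_socket(void)
{
   struct sockaddr_in local = { 0 };
   int s = socket(AF_INET, SOCK_STREAM, 0);
   if (s < 0)
      return -1;
   local.sin_family = AF_INET;
   local.sin_addr.s_addr = htonl(INADDR_ANY);
   if (bindnextport(s, &local) != 0) {
      close(s);   /* every local port in the cycle was taken */
      return -1;
   }
   return s;
}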
Joshua
I feel like there's got to be some sysctl knob to change that 5000 to a different value, but I don't know what it could be...
ephemient
/proc/sys/net/ipv4/ip_local_port_range, see http://www.faqs.org/docs/securing/chap6sec70.html
Hasturkun
By "5000", you mean "65535", or what?
unwind
+1  A: 

The nginx HTTP server claims to be able to hold 10,000 inactive HTTP keep-alive connections. You might take a look at how they do it.

Chas. Owens
+3  A: 

Have you tried setting the flag SO_REUSEADDR on the socket?
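
Something like this minimal sketch (the helper name is made up), called before bind()/connect():

#include <sys/socket.h>

/* Let the local address/port be reused while old sockets sit in
 * TIME_WAIT; must be set before bind()/connect().
 * Returns 0 on success, -1 on error. */
int enable_reuseaddr(int s)
{
   int one = 1;
   return setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
}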

justinhj
I suspect this is the right answer... +1.
Drew Hall
+3  A: 

Take a look at Richard Jones' article, A Million-user Comet Application with Mochiweb, Part 3. It's about implementing a Comet app in Erlang, but the section "Turning it up to 1 Million" describes how he benchmarked his server; it opens with the statement "Creating a million tcp connections from one host is non-trivial." That should give you some idea of what you're in for.

kquinn
Wow, what a great article. Thanks!
Kyle W. Cartmell