I have tried to create a simple program in different languages (C#, Java, C++, PHP) to connect to a server, and all behaved in the same way. So I believe this problem is more of an OS-level issue.
Basically I want the program to connect to the server with a TCP socket, send 1 byte, and then close the socket. This needs to be done thousands of times per second and sustained over a period of time. This is for the purposes of benchmarking the server.
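For concreteness, a minimal sketch of that connect/send-1-byte/close loop in Java (the class name `ConnectBench` and the throwaway loopback listener are mine, added only so the sketch runs standalone; a real run would point `oneShot` at the server under test):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectBench {

    // One benchmark iteration: connect, send a single byte, close.
    static void oneShot(String host, int port) throws IOException {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            out.write(0x01); // the single payload byte
            out.flush();
        } // try-with-resources closes the socket here
    }

    // Runs n one-shot connections against a throwaway local listener so the
    // sketch is self-contained; each iteration opens a brand-new TCP socket,
    // which is exactly the pattern that exhausts client-side resources.
    static int runBench(int n) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        try (Socket c = server.accept()) {
                            c.getInputStream().read(); // consume the byte
                        }
                    }
                } catch (IOException ignored) {
                    // listener closed: exit the thread
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            int done = 0;
            for (int i = 0; i < n; i++) {
                oneShot("127.0.0.1", server.getLocalPort());
                done++;
            }
            return done;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("completed " + runBench(100) + " one-shot connections");
    }
}
```

Each closed connection leaves its local port unusable until the kernel finishes tearing it down, which is why this pattern eventually stalls even though every socket is explicitly closed.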
So far, after a few thousand client sockets, the system comes to a complete halt. It can only start creating sockets again after a minute or so of cool-down. I have made sure that I close each socket after transmission.
Now, this kind of problem is familiar from servers like Apache, where benchmarking utilities (like ab/siege) are advised to use the keep-alive protocol, i.e., create a small number of TCP connections but make multiple requests over them. This is however not possible in our case, as our proprietary server does not serve HTTP and does not support the HTTP 1.1 keep-alive model.
So how can this be achieved? I have checked the following limits:

- `ulimit` is set to a very high number.
- TCP `TIME_WAIT` is eliminated by setting `/proc/sys/net/ipv4/tcp_tw_recycle` and `/proc/sys/net/ipv4/tcp_tw_reuse` to 1. (I have indeed confirmed with `netstat` that there are no `TIME_WAIT` sockets.)
- This is not related to limits on the number of threads/processes. I have tried restarting my client app and it behaves the same. Once the OS refuses new sockets, nothing helps.
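The checks above can be reproduced from a shell (a sketch assuming a 2.6-era kernel as described below; note that `tcp_tw_recycle` was removed entirely in Linux 4.12):

```shell
# Per-process open-file limit (sockets count against it)
ulimit -n

# Enable fast reuse/recycling of TIME_WAIT sockets (requires root)
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

# Confirm no sockets are stuck in TIME_WAIT
netstat -ant | grep -c TIME_WAIT
```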
PS: This is NOT a server-side limitation. We tested this by buying another box and running the same client code on it once the first client box refused to make new sockets; the server handled it fine. We don't want to buy 5-10 boxes and rotate between them to work around this problem.
OS: Fedora 10 Linux 2.6.24-23-xen #1 SMP