views:

921

answers:

6

I have a very simple program, written in five minutes, that opens a server socket and loops over incoming requests, printing the bytes sent to it to the screen.
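
Roughly, the server does something like this (a simplified sketch, shown here in C for illustration; not the exact code):

    /* Illustrative sketch only -- not the actual code or necessarily the actual language. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);            /* port number is arbitrary for this sketch */

        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listen_fd, 128);

        for (;;) {                              /* error handling omitted */
            int client_fd = accept(listen_fd, NULL, NULL);
            char buf[256];
            ssize_t n;
            while ((n = read(client_fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, (size_t)n, stdout);   /* print the bytes sent to us */
            close(client_fd);
        }
    }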

I then tried to benchmark how many connections I could hammer it with, to find out how many concurrent users this program could support.

On another machine (where the network between them is not saturated) I created a simple program that loops, connecting to the server machine and sending the bytes "hello world".
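
The client loop is roughly this (again a simplified C sketch, not the exact code; the server address and port below are placeholders):

    /* Illustrative sketch of the client loop -- address, port, and loop count are placeholders. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9000);                          /* must match the server */
        inet_pton(AF_INET, "192.168.0.10", &addr.sin_addr);   /* hypothetical server address */

        for (int i = 0; i < 5000; i++) {                      /* the loop count under test */
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            connect(fd, (struct sockaddr *)&addr, sizeof(addr));
            write(fd, "hello world", 11);
            close(fd);                                        /* socket is closed every iteration */
        }
        return 0;
    }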

When the loop runs 1000-3000 times, the client finishes with all requests sent. When the loop goes beyond 5000, it starts to hit timeouts after finishing the first X requests. Why is this? I have made sure to close my socket in the loop.

Can you only create so many connections within a certain period of time?

Is this limit only applicable between the same two machines, so that I need not worry about it in production, where 5000+ requests will be coming from different machines?

+7  A: 

There is a limit, yes. See ulimit.

Also you need to consider the TIME_WAIT state. Once a TCP socket is closed, by default the connection sits in TIME_WAIT for a couple of minutes, keeping that local port tied up (the exact value is OS-dependent and tunable). This will also "run you out of sockets" even though they are closed.

Run netstat to see the TIME_WAIT sockets in action.

P.S. The reason for TIME_WAIT is to handle packets that arrive after the socket is closed. This can happen because packets are delayed, or because the other side just doesn't know yet that the socket has been closed. It allows the OS to silently drop those packets without any chance of them "infecting" a different, unrelated socket connection.
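
On Linux you can also count them without netstat: /proc/net/tcp lists every IPv4 socket with its state in hex, and 06 is TIME_WAIT. A rough sketch (IPv6 sockets live in /proc/net/tcp6):

    /* Rough sketch: count TIME_WAIT sockets by reading /proc/net/tcp (Linux, IPv4 only).
       The "st" column holds the TCP state in hex; 06 is TIME_WAIT. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/net/tcp", "r");
        if (!f) {
            perror("/proc/net/tcp");
            return 1;
        }

        char line[512];
        int time_wait = 0;
        fgets(line, sizeof(line), f);                 /* skip the header line */
        while (fgets(line, sizeof(line), f)) {
            char local[64], remote[64], state[8];
            /* columns: sl local_address rem_address st ... */
            if (sscanf(line, "%*s %63s %63s %7s", local, remote, state) == 3
                && strcmp(state, "06") == 0)
                time_wait++;
        }
        fclose(f);

        printf("TIME_WAIT sockets: %d\n", time_wait);
        return 0;
    }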

Jason Cohen
I just checked with netstat on both machines, and there are indeed a ton of TIME_WAIT sockets on the client side but none on the server side. Is this the behaviour you are describing? Assuming yes: 1) Does this mean it won't be an issue in production, because the limit seems to be coming from the client side (running out of sockets) and not the server side (where no sockets are created)? 2) Is there a way to get around this so I can test my server with load similar to production?
erotsppa
The behavior of TIME_WAIT is OS-specific. Yes, you can get around it -- it's possible to change the TIME_WAIT timeout, e.g. from 120 seconds down to 30 or even less.
Jason Cohen
+1  A: 

Yes. Check into ulimit for information on how to change the limits.
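
A process can also inspect and raise its own open-file limit programmatically; a minimal sketch using getrlimit/setrlimit (without extra privileges, the soft limit can only be raised up to the hard limit):

    /* Minimal sketch: inspect and raise this process's open-file limit (RLIMIT_NOFILE). */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("soft limit: %llu, hard limit: %llu\n",
               (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

        rl.rlim_cur = rl.rlim_max;              /* bump the soft limit up to the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");
        return 0;
    }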

unwind
A: 

You might want to check out /etc/security/limits.conf.
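
For example, raising the per-user open-file limit there might look like this (illustrative values only):

    # /etc/security/limits.conf -- raise the open-file limit (values are illustrative)
    *    soft    nofile    65536
    *    hard    nofile    65536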

Ian
A: 

Yep, the limit is set by the kernel; check out this thread on Stack Overflow for more details: http://stackoverflow.com/questions/410616/increasing-the-maximum-number-of-tcp-ip-connections-in-linux

Don Werve
+2  A: 

When looking for maximum performance you run into a lot of issues and potential bottlenecks. Running a simple hello-world test is not necessarily going to find them all.

Possible limitations include:

  • Kernel socket limitations: look in /proc/sys/net for lots of kernel tuning (see the sketch after this list).
  • Process limits: check out ulimit, as others have stated here.
  • As your application grows in complexity, it may not have enough CPU power to keep up with the number of incoming connections. Use top to see if your CPU is maxed out.
  • Number of threads: I'm not experienced with threading, but this may come into play in conjunction with the previous items.
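
As a starting point for the first two items, a quick sketch that prints a few of the relevant values (paths assume a Linux /proc layout):

    /* Quick sketch: print a few of the kernel tunables mentioned above (Linux paths assumed). */
    #include <stdio.h>

    static void show(const char *path)
    {
        char buf[128];
        FILE *f = fopen(path, "r");
        if (f && fgets(buf, sizeof(buf), f))
            printf("%-42s %s", path, buf);      /* the file contents already end in a newline */
        if (f)
            fclose(f);
    }

    int main(void)
    {
        show("/proc/sys/net/core/somaxconn");             /* cap on the listen() backlog */
        show("/proc/sys/net/ipv4/ip_local_port_range");   /* ephemeral ports for outgoing connections */
        show("/proc/sys/net/ipv4/tcp_fin_timeout");       /* FIN-WAIT-2 timeout, often tuned alongside TIME_WAIT settings */
        return 0;
    }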
DGM
A: 

Is your server single-threaded? If so, what polling / multiplexing function are you using?

Using select() does not work beyond the maximum file descriptor limit hard-coded at compile time (FD_SETSIZE, typically 1024 on Linux and as low as 256 on some systems), which is hopeless.

poll() is better, but you still end up with a scalability problem: with a large number of FDs you have to repopulate and scan the whole set each time around the loop.

epoll() should work well until you hit some other limit.

10k connections should be easy enough to achieve. Use a recent(ish) 2.6 kernel.
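
For example, a skeletal epoll()-based version of the accept/read loop described in the question might look like this (a sketch, Linux-specific, with error handling mostly omitted):

    /* Sketch of an epoll-based accept/read loop (Linux). Error handling mostly omitted. */
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    int main(void)
    {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);            /* port number is arbitrary for this sketch */
        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(listen_fd, 1024);

        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event events[MAX_EVENTS];
            int n = epoll_wait(ep, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    /* New connection: accept it and register it with epoll. */
                    int client = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
                } else {
                    /* Data (or EOF) on an existing connection. */
                    char buf[256];
                    ssize_t r = read(fd, buf, sizeof(buf));
                    if (r <= 0) {
                        epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                        close(fd);
                    } else {
                        fwrite(buf, 1, (size_t)r, stdout);
                    }
                }
            }
        }
    }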

How many client machines did you use? Are you sure you didn't hit a client-side limit?

MarkR