views:

402

answers:

4

There appear to be several options available to programs that handle large numbers of socket connections (such as web services, P2P systems, etc.).

  1. Spawn a separate thread to handle I/O for each socket.
  2. Use the select system call to multiplex the I/O into a single thread.
  3. Use the poll system call to multiplex the I/O into a single thread (replacing select).
  4. Use the epoll system calls to avoid repeatedly passing socket fds across the user/kernel boundary.
  5. Spawn a number of I/O threads that each multiplex a relatively small set of the total number of connections using the poll API.
  6. As per #5, except using the epoll API, with a separate epoll object for each independent I/O thread.

On a multicore CPU I would expect #5 or #6 to have the best performance, but I don't have any hard data backing this up. Searching the web turned up this page describing the author's experiences testing approaches #2, #3 and #4 above. Unfortunately, this page appears to be around 7 years old, with no obvious recent updates to be found.

So my question is: which of these approaches have people found to be most efficient, and/or is there another approach that works better than any of those listed above? References to real-life graphs, whitepapers and/or web-available write-ups would be appreciated.

+2  A: 

I think this is a solved problem and the answer is here - http://www.kegel.com/c10k.html

No it's not. He wants "hard data"
Seun Osewa
A: 

I use epoll() extensively, and it performs well. I routinely have thousands of sockets active, and have tested with up to 131,072 sockets; epoll() handles it every time.

I use multiple threads, each of which polls on a subset of the sockets. This complicates the code, but takes full advantage of multi-core CPUs.

Martin Del Vecchio
+1  A: 

In my experience, you'll get the best performance with #6.

I also recommend you look into libevent to deal with abstracting some of these details away. At the very least, you'll be able to see some of their benchmark results.

Also, how many sockets are you talking about? Your choice of approach probably doesn't matter much until you get to at least a few hundred sockets.

twk
+2  A: 

Speaking from my experience running large IRC servers, we used to use select() and poll() (because epoll()/kqueue() weren't available). At around 700 simultaneous clients, the server would be using 100% of a CPU (the IRC server wasn't multithreaded). Interestingly, though, the server would still perform well. At around 4,000 clients, it would start to lag.

The reason was that at around 700 clients, each return from select() typically found only one client ready for processing, so the for() loops scanning the whole fd range to find which client it was ate most of the CPU. As we got more clients, each call to select() returned more and more ready clients, so the scanning cost was amortized over more work and we became more efficient.

Moving to epoll()/kqueue(), similarly specced machines would trivially deal with 10,000 clients, and some (admittedly more powerful machines, but still machines that would be considered tiny by today's standards) have held 30,000 clients without breaking a sweat.

Experiments I've seen with SIGIO suggest it works well for applications where latency is extremely important and there are only a few active clients doing very little individual work.

I'd recommend using epoll()/kqueue() over select()/poll() in almost any situation. I've not experimented with splitting clients between threads. To be honest, I've never found a service that needed enough optimisation work on the front-end client processing to justify experimenting with threads.