Programs that handle large numbers of socket connections (web services, P2P systems, etc.) appear to have several options available:
1. Spawn a separate thread to handle I/O for each socket.
2. Use the select system call to multiplex the I/O into a single thread.
3. Use the poll system call to multiplex the I/O (replacing select).
4. Use the epoll system calls to avoid repeatedly passing the socket file descriptors across the user/kernel boundary.
5. Spawn a number of I/O threads that each multiplex a relatively small subset of the total connections using the poll API.
6. As per #5, except using the epoll API to create a separate epoll object for each independent I/O thread.
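To make the difference between the select, poll, and epoll options concrete, here is a minimal sketch in Python (my choice for brevity; a real server would more likely use the C API directly). It assumes Linux, since `select.epoll` is Linux-only, and uses `socket.socketpair()` as a stand-in for real client connections. The helper names `ready_select`, `ready_poll`, `ready_epoll`, and `demo` are made up for illustration.

```python
import select
import socket

def ready_select(socks, timeout=1.0):
    # select: the whole socket list is handed to the kernel on every call.
    readable, _, _ = select.select(socks, [], [], timeout)
    return sorted(socks.index(s) for s in readable)

def ready_poll(socks, timeout=1.0):
    # poll: same idea as select, but takes an array of fds instead of a bitmask.
    index_of = {s.fileno(): i for i, s in enumerate(socks)}
    p = select.poll()
    for s in socks:
        p.register(s.fileno(), select.POLLIN)
    # poll() takes its timeout in milliseconds.
    return sorted(index_of[fd] for fd, _ in p.poll(int(timeout * 1000)))

def ready_epoll(socks, timeout=1.0):
    # epoll: fds are registered with a kernel object; a real server would
    # create that object once and reuse it for every wait.
    index_of = {s.fileno(): i for i, s in enumerate(socks)}
    ep = select.epoll()
    try:
        for s in socks:
            ep.register(s.fileno(), select.EPOLLIN)
        # epoll.poll() takes its timeout in seconds.
        return sorted(index_of[fd] for fd, _ in ep.poll(timeout))
    finally:
        ep.close()

def demo():
    # Three connected pairs; only the middle one gets data written to it.
    pairs = [socket.socketpair() for _ in range(3)]
    servers = [a for a, _ in pairs]
    pairs[1][1].sendall(b"ping")
    result = {
        "select": ready_select(servers),
        "poll": ready_poll(servers),
        "epoll": ready_epoll(servers),
    }
    for a, b in pairs:
        a.close()
        b.close()
    return result
```

All three report the same ready socket here; the differences that matter for this question are in how much per-call work each one does as the number of idle connections grows, which this sketch does not measure.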
On a multicore CPU I would expect #5 or #6 to perform best, but I have no hard data to back this up. Searching the web turned up this page, in which the author describes testing approaches #2, #3 and #4 above. Unfortunately the page appears to be around 7 years old, and I could not find any obvious recent updates.
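For reference, the structure I have in mind for #6 looks roughly like the sketch below: each I/O thread owns a private epoll object and a fixed shard of the connections. It is Python again and Linux-only; `io_worker` and `run_demo` are hypothetical names, the socket pairs stand in for accepted connections, and the upper-casing echo is a toy protocol. Note that CPython threads will not show real multicore scaling, so this illustrates the layout only, not the performance.

```python
import select
import socket
import threading

def io_worker(conns):
    """Serve one shard of connections using a private epoll object."""
    ep = select.epoll()
    by_fd = {c.fileno(): c for c in conns}
    for fd in by_fd:
        ep.register(fd, select.EPOLLIN)
    while by_fd:
        for fd, _events in ep.poll(1.0):
            conn = by_fd[fd]
            data = conn.recv(4096)
            if data:
                conn.sendall(data.upper())  # toy protocol: echo upper-cased
            else:
                # Empty read means the peer closed the connection.
                ep.unregister(fd)
                conn.close()
                del by_fd[fd]
    ep.close()

def run_demo(num_threads=2, conns_per_thread=3):
    clients, workers = [], []
    for _ in range(num_threads):
        pairs = [socket.socketpair() for _ in range(conns_per_thread)]
        clients += [c for _, c in pairs]
        t = threading.Thread(target=io_worker, args=([s for s, _ in pairs],))
        t.start()
        workers.append(t)
    replies = []
    for i, c in enumerate(clients):
        c.sendall(b"msg%d" % i)
        replies.append(c.recv(4096))
        c.close()  # lets the owning worker drop this connection
    for t in workers:
        t.join()
    return replies
```

The variant in #5 would have the same shape with `select.poll()` in place of `select.epoll()` inside each worker.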
So my question is: which of these approaches have people found to be most efficient, and is there another approach that works better than any of those listed above? References to real-life graphs, whitepapers and/or web-available writeups would be appreciated.