Tornadoweb and Nginx are currently popular web servers, and many benchmarks show that they perform better than Apache under certain circumstances. So my question is:

Is epoll the main reason they are so fast? And what can I learn from that if I want to write a good socket server?

+2  A: 

Yes and no. They both use epoll, but technically the reason is that they both use an event loop to handle requests. You can find more information about what event loops are and how they're used on Wikipedia.
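To make the idea concrete, here is a minimal sketch of a callback-based event loop in Python, using the standard `selectors` module, which picks epoll on Linux and kqueue on BSD/macOS. The `EventLoop` class and its `add_reader`/`remove_reader`/`run` methods are hypothetical names for illustration, not Tornado's or libevent's actual API; two socket pairs stand in for client connections.

```python
import selectors
import socket
from typing import Callable, Dict, List


class EventLoop:
    """Toy callback-based event loop (hypothetical API, for illustration only)."""

    def __init__(self) -> None:
        # DefaultSelector uses epoll on Linux, kqueue on BSD/macOS, else poll/select.
        self._selector = selectors.DefaultSelector()

    def add_reader(self, sock: socket.socket,
                   callback: Callable[[socket.socket], None]) -> None:
        # Ask the OS to report when `sock` becomes readable; remember the callback.
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def remove_reader(self, sock: socket.socket) -> None:
        self._selector.unregister(sock)

    def run(self) -> None:
        # Loop until nothing is registered. select() blocks until at least one
        # socket is ready, then we dispatch to that socket's callback.
        # Simplification: a callback should only unregister its own socket.
        while self._selector.get_map():
            for key, _events in self._selector.select():
                key.data(key.fileobj)
        self._selector.close()


def demo() -> List[bytes]:
    loop = EventLoop()
    received: Dict[socket.socket, bytes] = {}
    # Two independent "connections", simulated with connected socket pairs.
    a_out, a_in = socket.socketpair()
    b_out, b_in = socket.socketpair()

    def on_readable(sock: socket.socket) -> None:
        data = sock.recv(1024)
        if data:
            received[sock] = received.get(sock, b"") + data
        else:
            # Empty read means the peer closed: stop watching and clean up.
            loop.remove_reader(sock)
            sock.close()

    loop.add_reader(a_in, on_readable)
    loop.add_reader(b_in, on_readable)
    a_out.sendall(b"hello from a")
    b_out.sendall(b"hello from b")
    a_out.close()
    b_out.close()
    loop.run()  # returns once both connections have been closed
    return sorted(received.values())
```

Tornado's IOLoop and libevent are far more complete (timers, write readiness, error handling), but the shape is the same: register interest in sockets, block in one system call, then dispatch callbacks for whatever is ready.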

Check out libevent (used by gevent, which is generally faster and more stable than Tornado) or libev for implementations.

digitala
And could you please be more specific? Where can I find more details about all this stuff? Is libevent enough?
Mickey Shine
Here's a good book on libevent by one of the maintainers: http://www.wangafu.net/~nickm/libevent-book/
Denis Bilenko
+10  A: 

If you're looking to write a socket server, a good starting point is Dan Kegel's C10k article from a few years back:

http://www.kegel.com/c10k.html

I also found Beej's Guide to Network Programming to be pretty handy:

http://beej.us/guide/bgnet/

Finally, if you need a great reference, there's UNIX Network Programming by W. Richard Stevens et al.:

http://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551/ref=dp_ob_title_bk

Anyway, to answer your question, the main difference between Apache and Nginx is that Apache uses one thread per client with blocking I/O, whereas Nginx is single-threaded with non-blocking I/O. Apache's worker pool does reduce the overhead of starting and destroying processes, but the CPU still has to switch between several threads when serving multiple clients. Nginx, on the other hand, handles all requests in one thread. When one request needs to make a network request (say, to a backend), Nginx attaches a callback to the backend request and then works on another active client request. In practice, this means it returns to the event loop (epoll, kqueue, or select) and asks for the file descriptors that have something to report. Note that the system call in the main event loop is actually a blocking operation, because there's nothing to do until one of the file descriptors is ready for reading or writing.
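Here is a minimal sketch of the single-threaded, non-blocking model described above: one thread accepts connections and echoes data back to any number of clients, switching between them only when the OS reports a socket as ready. `serve_echo` and its `max_conns` stopping rule are hypothetical, added only so the example terminates; this is not how Nginx is structured internally.

```python
import selectors
import socket


def serve_echo(listener: socket.socket, max_conns: int) -> int:
    """Echo data back to every client from a single thread.

    Hypothetical helper: stops after `max_conns` clients have connected and
    disconnected, and returns the total number of bytes echoed.
    """
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)  # None marks the listener
    accepted = finished = echoed = 0
    while finished < max_conns:
        # The only blocking call: wait until some socket has something to report.
        for key, _events in sel.select():
            sock = key.fileobj
            if key.data is None:
                # Listener is readable: a new client is waiting to be accepted.
                conn, _addr = sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
                accepted += 1
                if accepted == max_conns:
                    sel.unregister(sock)  # stop accepting new clients
            else:
                data = sock.recv(4096)
                if data:
                    # Simplification: assumes the reply fits in the kernel send
                    # buffer. A real server would queue unsent bytes and wait
                    # for EVENT_WRITE instead of calling sendall().
                    sock.sendall(data)
                    echoed += len(data)
                else:
                    # Client closed its end of the connection.
                    sel.unregister(sock)
                    sock.close()
                    finished += 1
    sel.close()
    return echoed
```

To try it, bind a listening socket (for example to 127.0.0.1 on port 0), run `serve_echo` in one thread, and connect several clients: every one of them is served by that same thread, with no per-client thread to switch to.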

So that's the main reason Nginx and Tornado are efficient at serving many simultaneous clients: there's only ever one process (thus saving RAM) and only one thread (thus saving the CPU from context switches). As for epoll, it's just a more efficient version of select. With select, every call hands the kernel all N open file descriptors (sockets) and the kernel scans all of them, which is O(N) per call; epoll keeps the set of watched descriptors in the kernel and returns only the ones that are ready, so the cost of each wait no longer grows with N. In fact, Nginx can use select instead of epoll if you compile it with the --with-select_module option, and I bet it will still be more efficient than Apache. I'm not as familiar with Apache internals, but a quick grep shows it does use select and epoll, probably when the server is listening to multiple ports/interfaces, or if it does simultaneous backend requests for a single client.
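A minimal sketch of that readiness difference, assuming Linux (Python's `select.epoll` is only available there): it registers many idle socket pairs with epoll once, writes to a few of them, and shows that `poll()` reports only those few. `ready_after_writes` is a hypothetical helper, and 200 pairs is an arbitrary choice that stays below the common 1024 file-descriptor limit.

```python
import select
import socket
from typing import Set


def ready_after_writes(num_pairs: int, written: Set[int]) -> Set[int]:
    """Watch `num_pairs` connections with epoll, write to the indices in
    `written`, and return the indices epoll reports as readable."""
    ep = select.epoll()
    pairs = [socket.socketpair() for _ in range(num_pairs)]
    fd_to_index = {}
    for i, (_writer, reader) in enumerate(pairs):
        # Registration happens once; the kernel keeps this interest list.
        ep.register(reader.fileno(), select.EPOLLIN)
        fd_to_index[reader.fileno()] = i
    for i in written:
        pairs[i][0].sendall(b"x")
    # epoll_wait returns only the ready descriptors, so we never loop over all
    # num_pairs sockets (select() would need the full list on every call).
    events = ep.poll(1.0)
    ready = {fd_to_index[fd] for fd, mask in events if mask & select.EPOLLIN}
    ep.close()
    for writer, reader in pairs:
        writer.close()
        reader.close()
    return ready
```

With select, the equivalent loop would pass all 200 reader sockets into every `select.select()` call and then test each one for membership in the result.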

Incidentally, I got started with this stuff trying to write a basic socket server and wanted to figure out how Nginx was so freaking efficient. After poring through the Nginx source code and reading those guides/books I linked to above, I discovered it'd be easier to write Nginx modules instead of my own server. Thus was born the now-semi-legendary Emiller's Guide to Nginx Module Development:

http://www.evanmiller.org/nginx-modules-guide.html

(Warning: the Guide was written against Nginx 0.5-0.6 and APIs may have changed.) If you're doing anything with HTTP, I'd say give Nginx a shot because it's worked out all the hairy details of dealing with stupid clients. For example, the small socket server that I wrote for fun worked great with all clients -- except Safari, and I never figured out why. Even for other protocols, Nginx might be the right way to go; the eventing is pretty well abstracted from the protocols, which is why it can proxy HTTP as well as IMAP. The Nginx code base is extremely well-organized and very well-written, with one exception that bears mentioning. I wouldn't follow its lead when it comes to hand-rolling a protocol parser; instead, use a parser generator. I've written some stuff about using a parser generator (Ragel) with Nginx here:

http://www.evanmiller.org/nginx-modules-guide-advanced.html#parsing

All of this was probably more information than you wanted, but hopefully you'll find some of it useful.

Emiller
That's exactly what I need, thank you very much!
Mickey Shine
Omg, thank you man!
2x2p1p