I was recently reading this document which lists a number of strategies that could be employed to implement a socket server. Namely, they are:

  1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
  2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification
  3. Serve many clients with each server thread, and use asynchronous I/O
  4. Serve one client with each server thread, and use blocking I/O
  5. Build the server code into the kernel

Now, I would appreciate a hint on which of these should be used in CPython, which we know has some good points and some bad points. I am mostly interested in performance under high concurrency, and yes, a number of the current implementations are too slow.

So if I may start with the easy one, "5" is out, as I am not going to be hacking anything into the kernel.

"4" Also looks like it must be out because of the GIL. Of course, you could use multiprocessing in place of threads here, and that does give a significant boost. Blocking IO also has the advantage of being easier to understand.

And here my knowledge wanes a bit:

"1" is traditional select or poll which could be trivially combined with multiprocessing.

"2" is the readiness-change notification, used by the newer epoll and kqueue

"3" I am not sure there are any kernel implementations for this that have Python wrappers.

So, in Python we have a bag of great tools like Twisted. Perhaps they are a better approach, though I have benchmarked Twisted and found it too slow on a multi-processor machine. Perhaps running four Twisted instances behind a load balancer would do it; I don't know. Any advice would be appreciated.

+1  A: 

http://docs.python.org/library/socketserver.html#asynchronous-mixins

As for multi-processor (multi-core) machines: with CPython, due to the GIL, you'll need at least one process per core to scale. Since you say you need CPython, you might try benchmarking that with ForkingMixIn. On Linux 2.6 it might give some interesting results.
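For reference, a minimal sketch of what that looks like with the stdlib SocketServer module (Python 2 naming; the port and the echo handler are placeholders, not anything from the linked docs):

    import SocketServer

    class EchoHandler(SocketServer.StreamRequestHandler):
        def handle(self):
            # Plain blocking I/O: each request runs in its own forked
            # child, so one slow client never stalls the others.
            for line in self.rfile:
                self.wfile.write(line)

    class ForkingEchoServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
        allow_reuse_address = True

    if __name__ == "__main__":
        ForkingEchoServer(("0.0.0.0", 8000), EchoHandler).serve_forever()

Since ForkingMixIn forks a child per request, the children can run on all cores in parallel, GIL notwithstanding.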

Another way is to use Stackless Python. That's how EVE Online solved it. But I understand that it's not always possible.

vartec
Thanks, but have you benchmarked those things? They are slow. I wouldn't want to reinvent the wheel if I didn't have to.
Ali A
+1 stackless/EVE, but I did say CPython
Ali A
+7  A: 

asyncore is basically "1" - it uses select internally, and you just have one thread handling all requests. According to the docs it can also use poll. (EDIT: Removed Twisted reference; I thought it used asyncore, but I was wrong.)
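To illustrate, a bare-bones asyncore echo server looks roughly like this (a sketch; the port and buffer size are arbitrary):

    import asyncore
    import socket

    class EchoHandler(asyncore.dispatcher_with_send):
        def handle_read(self):
            data = self.recv(4096)
            if data:
                self.send(data)

    class EchoServer(asyncore.dispatcher):
        def __init__(self, host, port):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)

        def handle_accept(self):
            pair = self.accept()
            if pair is not None:
                sock, addr = pair
                EchoHandler(sock)

    EchoServer("0.0.0.0", 8000)
    asyncore.loop()  # single thread; select() (or poll) under the hood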

"2" might be implemented with python-epoll (Just googled it - never seen it before). EDIT: (from the comments) In python 2.6 the select module has epoll, kqueue and kevent build-in (on supported platforms). So you don't need any external libraries to do edge-triggered serving.

Don't rule out "4", as the GIL will be dropped while a thread is actually doing or waiting for I/O operations (which is probably most of the time). It doesn't make sense if you've got huge numbers of connections, of course. If you've got lots of processing to do, then Python may not make sense with any of these schemes.
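For completeness, "4" in its simplest thread-per-connection form (a sketch; fine for modest connection counts precisely because the blocking calls release the GIL):

    import socket
    import threading

    def handle(conn):
        # recv()/sendall() release the GIL while blocked in the kernel,
        # so the other threads keep running during the waits.
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)
        conn.close()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8000))
    listener.listen(128)

    while True:
        conn, addr = listener.accept()
        t = threading.Thread(target=handle, args=(conn,))
        t.daemon = True
        t.start()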

For flexibility maybe look at Twisted?

In practice your problem boils down to how much processing you are going to do per request. If you've got a lot of processing and need to take advantage of multi-core parallel operation, then you'll probably need multiple processes. On the other hand, if you just need to listen on lots of connections, then select or epoll with a small number of threads should work.

Douglas Leeder
I think epoll is in the stdlib in 2.6+, and easy_installable for 2.5. The package is called select-something. Sorry for vagueness.
Ali A
Twisted can also use epoll. In fact, Twisted turns all supported event-notification APIs into a uniform API which it presents to you. So if the best the platform can do is select, your app uses select. If it has epoll, your app uses epoll. All transparently to you.
Jean-Paul Calderone
+1  A: 

I like Douglas' answer, but as an aside...

You could use a centralized dispatch thread/process that listens for readiness notifications using select and delegates to a pool of worker threads/processes to help accomplish your parallelism goals.
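A rough sketch of that shape, assuming a select() dispatcher handing readable sockets to a small thread pool through a Queue (the pool size and echo handler are illustrative; swap in multiprocessing if the work is CPU-bound):

    import Queue  # "queue" on Python 3
    import select
    import socket
    import threading

    tasks = Queue.Queue()

    def worker():
        # Workers do the (potentially slow) per-request processing,
        # keeping the dispatcher loop responsive.
        while True:
            conn = tasks.get()
            data = conn.recv(4096)
            if data:
                conn.sendall(data)
            conn.close()  # one request per connection keeps the sketch simple

    for _ in range(4):
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8000))
    listener.listen(128)

    watched = [listener]
    while True:
        readable, _, _ = select.select(watched, [], [])
        for sock in readable:
            if sock is listener:
                conn, addr = listener.accept()
                watched.append(conn)
            else:
                # Readiness notification arrived: delegate to the pool
                # and stop watching (the worker owns the socket now).
                watched.remove(sock)
                tasks.put(sock)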

As Douglas mentioned, however, the GIL won't be held during most lengthy I/O operations (since no Python-API things are happening), so if it's response latency you're concerned about, you can try moving the critical portions of your code into a C extension via the CPython API.

cdleary
+2  A: 

Can I suggest additional links?

cogen is a cross-platform library for network-oriented, coroutine-based programming using the enhanced generators from Python 2.5. On the main page of the cogen project there are links to several projects with a similar purpose.

Eugene Morozov
+3  A: 

How about "fork"? (I assume that is what the ForkingMixIn does.) If the requests are handled in a "shared nothing" (other than DB or file system) architecture, fork() starts pretty quickly on most *nixes, and you don't have to worry about all the silly bugs and complications of threading.
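A bare os.fork() accept loop, to make that concrete (a sketch; the SIGCHLD trick auto-reaps children so zombies don't pile up):

    import os
    import signal
    import socket

    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # auto-reap exited children

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8000))
    listener.listen(128)

    while True:
        conn, addr = listener.accept()
        if os.fork() == 0:
            # Child: shared-nothing request handling, then exit.
            listener.close()
            data = conn.recv(4096)
            if data:
                conn.sendall(data)
            conn.close()
            os._exit(0)
        else:
            # Parent: close its copy and go back to accepting.
            conn.close()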

Threads are a design illness forced on us by OSes with too-heavy-weight processes, IMHO. Cloning a page table with copy-on-write attributes seems a small price, especially if you are running an interpreter anyway.

Sorry I can't be more specific, but I'm more of a Perl-transitioning-to-Ruby programmer (when I'm not slaving over masses of Java at work).


Update: I finally did some timings on thread vs fork in my "spare time". Check it out:

http://roboprogs.com/devel/2009.04.html

Expanded: http://roboprogs.com/devel/2009.12.html

Roboprog
In addition, you could pull in the code for other modules that you know you will use before you start the child processes. This will prevent them from being tokenized (JIT-ed, whatever) over and over. Second, keep data in the parent small, and use "exit" as a super garbage collector.
Roboprog
+2  A: 

One solution is gevent. Gevent marries libevent-based event polling with lightweight cooperative task switching implemented by greenlet.

What you get is all the performance and scalability of an event system with the elegance and straightforward model of blocking I/O programming.
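A minimal sketch of what that looks like with gevent's StreamServer (the echo handler and port are placeholders; each connection gets its own greenlet, so the blocking-looking code scales):

    from gevent.server import StreamServer

    def handle(sock, address):
        # Reads look blocking, but gevent switches to another greenlet
        # whenever this one would wait on the socket.
        f = sock.makefile()
        for line in f:
            sock.sendall(line)

    server = StreamServer(("0.0.0.0", 8000), handle)
    server.serve_forever()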

(I don't know what the SO convention about answering really old questions is, but decided I'd still add my 2 cents)

gdamjan