views: 3909 · answers: 11

Hi,

I am looking for a Python webserver which is multithreaded instead of multi-process (as in the case of mod_python for Apache). I want it to be multithreaded because I want to have an in-memory object cache that will be used by various HTTP threads. My webserver does a lot of expensive work and computes some large arrays which need to be cached in memory for future use to avoid recomputing them. This is not possible in a multi-process webserver environment. Storing this information in memcached is also not a good idea: the arrays are large, every read would mean deserializing the data coming back from memcached, and there is the additional overhead of IPC.

I implemented a simple webserver using BaseHTTPServer; it gives good performance, but it gets stuck after a few hours. I need a more mature webserver. Is it possible to configure Apache to use mod_python under a threaded model so that I can do some object caching?

+2  A: 

Not multithreaded, but Twisted might serve your needs.

fivebells
If it's not multithreaded, then how will I be able to store objects in the cache and use them across multiple HTTP requests?
NeoAnderson
It's an asynchronous programming framework which uses select. http://twistedmatrix.com/projects/core/documentation/howto/async.html
fivebells
Actually, it is multithreaded. See my answer below.
Glyph
+1  A: 

Perhaps you have a problem with your implementation in Python using BaseHTTPServer. There's no reason for it to "get stuck", and implementing a simple threaded server using BaseHTTPServer and threading shouldn't be difficult.

Also, see http://blog.doughellmann.com/2007/12/pymotw-basehttpserver.html for an example of implementing a simple multi-threaded server with HTTPServer and ThreadingMixIn.
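
For reference, a minimal sketch of the approach that article describes: HTTPServer combined with ThreadingMixIn so each request is handled in its own thread. The handler and port here are just placeholders (Python 2 module names, matching the question; in Python 3 these live in http.server and socketserver).

    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
    from SocketServer import ThreadingMixIn

    class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
        """Handle each request in a separate thread."""
        daemon_threads = True  # don't let hung handler threads block shutdown

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.end_headers()
            self.wfile.write('hello from a worker thread\n')

    if __name__ == '__main__':
        ThreadedHTTPServer(('0.0.0.0', 8000), Handler).serve_forever()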

Eli Bendersky
A: 

I use CherryPy both personally and professionally, and I'm extremely happy with it. I even do the kinds of things you're describing, such as having global object caches, running other threads in the background, etc. And it integrates well with Apache: simply run CherryPy as a standalone server bound to localhost, then use Apache's mod_proxy and mod_rewrite to have Apache transparently forward your requests to CherryPy.

The CherryPy website is http://cherrypy.org/
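
For illustration, a minimal sketch of that setup: CherryPy running standalone, bound to localhost, with a module-level dict acting as the shared in-memory cache (every handler thread lives in the same process). The port, URL, and expensive_computation helper are assumptions, not part of the answer.

    import cherrypy

    CACHE = {}  # shared by all worker threads in CherryPy's thread pool

    class Root(object):
        @cherrypy.expose
        def compute(self, key):
            if key not in CACHE:
                CACHE[key] = expensive_computation(key)  # hypothetical helper
            return str(CACHE[key])

    cherrypy.config.update({
        'server.socket_host': '127.0.0.1',  # only the local Apache proxy talks to us
        'server.socket_port': 8080,
    })
    cherrypy.quickstart(Root())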

Eli Courtwright
+2  A: 

You could instead use a distributed cache that is accessible from each process, memcached being the example that springs to mind.
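
As a rough sketch, using the python-memcached client (the module name, key, and compute_large_array helper are assumptions): every process talks to the same memcached daemon, so the cached value is visible across processes, at the cost of pickling/unpickling on each round trip.

    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def get_array(key):
        value = mc.get(key)
        if value is None:
            value = compute_large_array(key)  # hypothetical expensive computation
            mc.set(key, value, time=3600)     # values are pickled on the way in
        return value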

johnstok
Even better: the Python app can seed memcached with full HTML pages and have a frontend server (like nginx) pull from there, calling the webapp (via FastCGI) only when the cache misses or on POST requests.
Javier
+10  A: 

CherryPy. Features, as listed from the website:

  • A fast, HTTP/1.1-compliant, WSGI thread-pooled webserver. Typically, CherryPy itself takes only 1-2ms per page!
  • Support for any other WSGI-enabled webserver or adapter, including Apache, IIS, lighttpd, mod_python, FastCGI, SCGI, and mod_wsgi
  • Easy to run multiple HTTP servers (e.g. on multiple ports) at once
  • A powerful configuration system for developers and deployers alike
  • A flexible plugin system
  • Built-in tools for caching, encoding, sessions, authorization, static content, and many more
  • A native mod_python adapter
  • A complete test suite
  • Swappable and customizable...everything.
  • Built-in profiling, coverage, and testing support.
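
Two of the listed features can be shown in a few lines. A brief sketch, assuming CherryPy 3.x configuration keys, that sizes the worker-thread pool and switches on the built-in response-caching tool (the pool size and cache delay are arbitrary):

    import cherrypy

    class App(object):
        @cherrypy.expose
        def index(self):
            return "served from CherryPy's thread pool"

    cherrypy.config.update({
        'server.thread_pool': 30,    # number of worker threads
        'tools.caching.on': True,    # built-in in-process response cache
        'tools.caching.delay': 300,  # seconds before a cached response expires
    })
    cherrypy.quickstart(App())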
David Eyk
+5  A: 

Consider reconsidering your design. Maintaining that much state in your webserver is probably a bad idea. Multi-process is a much better way to go for stability.

Is there another way to share state between separate processes? What about a service? Database? Index?
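
As one illustration of the database option, a small sketch using the standard library's sqlite3 (the table name, schema, and pickling are assumptions): every web process opens the same file, so precomputed results are shared across processes and survive restarts.

    import pickle
    import sqlite3

    def open_store(path='cache.db'):
        conn = sqlite3.connect(path)
        conn.execute('CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, value BLOB)')
        return conn

    def get_or_compute(conn, key, compute):
        row = conn.execute('SELECT value FROM results WHERE key = ?', (key,)).fetchone()
        if row is not None:
            return pickle.loads(bytes(row[0]))
        value = compute(key)                      # the expensive array computation
        conn.execute('INSERT OR REPLACE INTO results (key, value) VALUES (?, ?)',
                     (key, sqlite3.Binary(pickle.dumps(value))))
        conn.commit()
        return value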

It seems unlikely that maintaining a huge array of data in memory and relying on a single multi-threaded process to serve all your requests is the best design or architecture for your app.

Troy Howard
A database-like back-end is much better than shared data among threads. Simple RESTful interaction between web transactions and "some large arrays" might be easier to manage.
S.Lott
A: 

Just to point out something different from the usual suspects...

Some years ago, while I was using Zope 2.x, I read about Medusa, as it was the web server used by that platform. It was advertised to work well under heavy load, and it can provide the functionality you're asking for.

Toni Ruža
Medusa is an old and probably defunct project. Twisted is the better choice here.
mhawke
+2  A: 

It's hard to give a definitive answer without knowing what kind of site you are working on and what kind of load you are expecting. Sub-second performance may be a serious requirement or it may not. If you really need to save that last millisecond, then you absolutely need to keep your arrays in memory. However, as others have suggested, it is more than likely that you don't and could get by with something else.

Your usage pattern of the data in the array may affect what choices you make. You probably don't need access to the entire set of data from the array all at once, so you could break your data up into smaller chunks and put those chunks in the cache instead of one big lump. Depending on how often your array data needs to be updated, you might choose between memcached, a local db (Berkeley DB, SQLite, a small MySQL installation, etc.) or a remote db. I'd say memcached for fairly frequent updates, a local db for something on the order of hourly, and a remote db for daily.

One other thing to consider is what happens after a cache miss. If 50 clients all get a cache miss at the same time and all of them decide to start regenerating those expensive arrays, your box(es) will quickly be reduced to 8086s. So you have to take into consideration how you will handle that. Many articles out there cover how to recover from cache misses. Hope this is helpful.
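
One simple way to handle that stampede on a cache miss, sketched with a per-key lock so that only the first thread regenerates an expensive entry while the rest wait for the result (the names and the compute callable are illustrative; this covers the threaded in-process case, not memcached):

    import threading

    _cache = {}
    _locks = {}
    _locks_guard = threading.Lock()

    def _lock_for(key):
        with _locks_guard:
            return _locks.setdefault(key, threading.Lock())

    def get_or_compute(key, compute):
        value = _cache.get(key)
        if value is not None:
            return value
        with _lock_for(key):             # only one thread regenerates per key
            value = _cache.get(key)      # re-check: another thread may have filled it
            if value is None:
                value = compute(key)     # the expensive array computation
                _cache[key] = value
            return value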

Sam Corder
+4  A: 

Twisted can serve as such a web server. While not multithreaded itself, there is a (not yet released) multithreaded WSGI container present in the current trunk. You can check out the SVN repository and then run:

twistd web --wsgi=your.wsgi.application
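
The value passed to --wsgi is a dotted Python name resolving to a standard WSGI callable. A minimal sketch of such a module, mirroring the placeholder path above:

    # your/wsgi.py  (placeholder module path, matching the dotted name in the command)
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'Hello from the threaded WSGI container\n']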
Glyph
+1  A: 

I actually had the same issue recently. Namely: we wrote a simple server using BaseHTTPServer and found that the fact that it's not multi-threaded was a big drawback.

My solution was to port the server to Pylons (http://pylonshq.com/). The port was fairly easy, and one benefit is that it's very easy to create a GUI with Pylons, so I was able to throw a status page on top of what is basically a daemon process.

I would summarize Pylons this way:

  • it's similar to Ruby on Rails in that it aims to make it very easy to deploy web apps
  • its default templating language, Mako, is very nice to work with (see the short sketch after this list)
  • it uses a system for routing URLs that's very convenient
  • for us performance is not an issue, so I can't guarantee that Pylons would perform adequately for your needs
  • you can use it with Apache & Lighttpd, though I've not tried this
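
A tiny taste of Mako, for those who haven't seen it (the template string is made up):

    from mako.template import Template

    tmpl = Template("Hello, ${name}! You have ${len(items)} items.")
    print(tmpl.render(name="world", items=[1, 2, 3]))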

We also run an app with Twisted and are happy with it. Twisted has good performance, but I find Twisted's single-threaded/defer-to-thread programming model fairly complicated. It has lots of advantages, but would not be my choice for a simple app.

Good luck.

+2  A: 

web.py has made me happy in the past. Consider checking it out.
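
For a sense of its style, a minimal web.py application (the URL pattern and handler name are illustrative):

    import web

    urls = ('/hello/(.*)', 'Hello')

    class Hello(object):
        def GET(self, name):
            return 'Hello, %s!' % (name or 'world')

    if __name__ == '__main__':
        web.application(urls, globals()).run()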

But it does sound like an architectural redesign might be the proper, though more expensive, solution.