We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg
stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still w/ Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processses) on the server under real world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the mean-time, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.