views: 88

answers: 4
Hi,

I need to build a web service that is very computationally intensive, and I'm trying to get my bearings on how best to proceed.

I expect users to connect to my service, at which point some computation is done for some amount of time, typically less than 60s. The user knows that they need to wait, so this is not really a problem. My question is, what's the best way to structure a service like this and leave me with the least amount of headache? Can I use Node.js, web.py, CherryPy, etc.? Do I need a load balancer sitting in front of these pieces if used? I don't expect huge numbers of users, perhaps hundreds or into the thousands. I'll need a number of machines to host this number of users, of course, but this is uncharted territory for me, and if someone can give me a few pointers or things to read, that would be great.

Thanks.

+1  A: 

I think you can build it however you like, as long as you can make it an asynchronous service so that the users don't have to wait.

Unless, of course, the users don't mind waiting in this context.

Robert Harvey
So, suppose I go with a Node.js-based approach, but later decide that I need to throw more machines at the task to handle more users. How do I scale up?
By abstracting the actual service call from the machines performing the computations. If you can do that using node.js, cool. If not, you may need to choose a different mechanism.
Robert Harvey
+1  A: 

I'd recommend using nginx, as it can handle rewriting, load balancing, SSL, etc. with a minimum of fuss.

gnibbler
+2  A: 

Can I use Node.js, web.py, CherryPy, etc.?

Yes. Pick one. Django is nice, also.

Do I need a load balancer sitting in front of these pieces if used?

Almost never.

I'll need a number of machines to host this number of users,

Doubtful.

Remember that each web transaction has several distinct (and almost unrelated) parts.

  1. A front-end (Apache HTTPD or NGINX or similar) accepts the initial web request. It can handle serving static files (.css, .js, images, etc.) so your main web application is uncluttered by this.

  2. A reasonably efficient middleware like mod_wsgi can manage dozens (or hundreds) of backend processes.

  3. If you choose a clever backend processing component like celery, you should be able to distribute the "real work" to the minimal number of processors to get the job done.

  4. The results are fed back into Apache HTTPD (or NGINX) via mod_wsgi to the user's browser.

Now the backend processes (managed by celery) are divorced from the essential web server. Apache HTTPD, mod_wsgi, and celery together give you a great deal of parallelism, letting you use every scrap of processor resource.
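The hand-off can be sketched as a tiny WSGI app that only enqueues the job and answers immediately. This is a hypothetical sketch: a `queue.Queue` stands in for the celery broker, and the names here are made up for illustration.

```python
import json
import queue

job_queue = queue.Queue()  # stand-in for the celery broker

def application(environ, start_response):
    # The web process does no heavy lifting: it enqueues the job
    # (a separate worker pool would consume job_queue) and returns.
    job_id = job_queue.qsize() + 1
    job_queue.put({"id": job_id, "params": environ.get("QUERY_STRING", "")})
    body = json.dumps({"job": job_id, "status": "queued"}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]
```

Under mod_wsgi this `application` callable is all the web tier needs; the expensive computation lives entirely in the workers.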

Further, you may be able to decompose your "computationally intensive" process into parallel processes -- a Unix pipeline is remarkably efficient and makes use of all available resources. Decompose your problem into step1 | step2 | step3 and have celery manage those pipelines.
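The decomposition above can be sketched with plain functions; under celery each step would become an `@app.task` and `chain(step1.s(x), step2.s(), step3.s())` would pipe one result into the next. The steps here are placeholders, not your actual computation.

```python
def step1(data):
    # e.g. parse/normalize the input
    return data * 2

def step2(data):
    # e.g. run the expensive transform
    return data + 1

def step3(data):
    # e.g. format the result
    return data ** 2

def pipeline(data):
    # Equivalent of the Unix pipeline: step1 | step2 | step3.
    return step3(step2(step1(data)))

print(pipeline(3))  # 3 -> 6 -> 7 -> 49
```

Each stage being a separate task is what lets celery spread the stages across however many worker processes (or machines) you have.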

You may find that this kind of decomposition leads to serving a far larger workload than you might have originally imagined.

Many Python web frameworks will keep the user's session information in a single common database. This means that all of your backends can -- without any real work -- move the user's session from web server to web server, making "load balancing" seamless and automatic. Just have lots of HTTPD/NGINX front-ends that spawn Django (or web.py or whatever) which all share a common database. It works remarkably well.

S.Lott
I'd agree about using celery or similar for backend queuing of the jobs which perform the actual work. I.e., don't do the work in the web server/application processes. If, however, there are thousands of pending requests, I would investigate a web interface style that triggers the job but returns a response to the browser straight away, with the web interface then making further requests to check progress. This is because a blocking WSGI-style interface is not good for long operations, given the process/thread resources required. If you want the request to wait, you're better off using an asynchronous web/application server.
Graham Dumpleton
@Graham Dumpleton: Precisely. Further, I understand that celery does that.
S.Lott
A: 

If you want to make your web services asynchronous, you can try Twisted. It is a framework oriented toward asynchronous tasks and implements many network protocols. It is easy to offer these services via XML-RPC (just put xmlrpc_ as the prefix of your method names). It also scales very well to hundreds or thousands of users.
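The xmlrpc_ naming convention works like this. With Twisted itself you would subclass twisted.web.xmlrpc.XMLRPC and serve it under the reactor; the tiny stand-in dispatcher below (hypothetical names, no Twisted dependency) just illustrates the prefix lookup.

```python
class ComputeService:
    # With Twisted this would subclass twisted.web.xmlrpc.XMLRPC;
    # any method named xmlrpc_<name> is exposed as RPC call <name>.
    def xmlrpc_square(self, x):
        return x * x

    def xmlrpc_add(self, a, b):
        return a + b

def dispatch(service, method, *args):
    # Twisted resolves the call by looking up "xmlrpc_" + method name,
    # the same way this stand-in does.
    handler = getattr(service, "xmlrpc_" + method)
    return handler(*args)

svc = ComputeService()
print(dispatch(svc, "square", 7))  # 49
```

In real Twisted code you would wrap the instance in `server.Site(...)`, call `reactor.listenTCP(port, site)`, and run the reactor.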

Celery is also a good option for making the most computationally intensive tasks asynchronous. It integrates very well with Django.

cues7a