views:

186

answers:

6

Is it just me or is having to run multiple instances of a web server to scale a hack?

Am I wrong in this?

Clarification

I am referring to how I read people run multiple instances of a web service on a single server. I am not talking about a cluster of servers.

+3  A: 

Not really, people were running multiple frontends across a cluster of servers before multicore cpus became widespread

So there has been all the infrastructure for supporting sessions properly across multiple frontends for quite some time before it became really advantageous to run a bunch of threads on one machine.

Infact using asynchronous style frontends gives better performance on the same hardware than a multithreaded approach, so I would say that not running multiple instances in favour of a multithreaded monster is a hack

gnibbler
just to clarify, i'm referring to multiple web servers running on a single server.
Blankman
@Blankman, I am trying to say that it is not a big deal to treat a single multicore machine the same as a cluster. I think there are advantages - you can use the load balancer to manage seamless upgrades for example, and when you _do_ need to add a second server as the site grows, it's a piece of cake.
gnibbler
+4  A: 

Since we are now moving towards more cores, rather than faster processors - in order to scale more and more, you will need to be running more instances.

So yes, I reckon you are wrong.

This does not by any means condone brain-dead programming with the excuse that you can just scale it horizontally, that just seems retarded.

Omar Qureshi
+1  A: 

With no details, it is very difficult to see what you are getting at. That being said, it is quite possible that you are simply not using the right approach for your problem.

Sometimes multiple separate instances are better. Sometimes, your Python services are actually better deployed behind a single Apache instance (using mod_wsgi) which may elect to use more than a single process. I don't know about Ruby to opinionate there.

In short, if you want to make your service scalable then the way to do so depends heavily on additional details. Is it scaling up or scaling out? What is the operating system and available or possibly installable server software? Is the service itself easily parallelized and how much is it database dependent? How is the database deployed?

Muhammad Alkarouri
+1  A: 

Even if Ruby/Python interpreters were perfect, and could utilize all avail CPU with single process, you would still reach maximal capability of single server sooner or later and have to scale across several machines, going back to running several instances of your app.

Daniel Kluev
+1  A: 

I would hesitate to say that the issue is a "hack". Or indeed that threaded solutions are necessarily superior.

The situation is a result of design decisions used in the interpreters of languages like Ruby and Python.

I work with Ruby, so the details may be different for other languages.

BUT ... essentially, Ruby uses a Global Interpreter Lock to prevent threading issues: http://en.wikipedia.org/wiki/Global_Interpreter_Lock

The side-effect of this is that to achieve concurrency with frameworks like Rails, rather than relying on multiple threads within the VM, we use multiple processes, each with its own interpreter and instance of your framework and application code

Each instance of the app handles a single request at a time. To achieve concurrency we have to spin up multiple instances.

In the olden days (2-3 years ago) we would run multiple mongrel (or similar) instances behind a proxy (generally apache). Passenger changed some of this because it is smart enough to manage the processes itself, rather than requiring manual setup. You tell Passenger how many processes it can use and off it goes.

The whole structure is actually not as bad as the thread-orthodoxy would have you believe. For a start, it's pretty easy to make this type of architecture work in a multicore environment. Any modern database is designed to handle highly concurrent loads, so having multiple processes has very little if any effect at that level.

If you use a language like JRuby you can deploy into a threaded app server like Tomcat and have a deployment that looks much more "java-like". However, this is not as big a win as you might think, because now your application needs to be much more thread-aware and you can see side effects and strangeness from threading issues.

Toby Hede
A: 

Your assumption that Tomcat's and IIS's single process per server is superior is flawed. The choice of a multi-threaded server and a multi-process server depends on a lot of variables.

One main thing is the underlying operating system. Unix systems have always had great support for multi-processing because of the copy-on-write nature of the fork system call. This makes multi-processes a really attractive option because web-serving is usually very shared-nothing and you don't have to worry about locking. Windows on the other hand had much heavier processes and lighter threads so programs like IIS would gravitate to a multi-threading model.

As for the question to wether it's a hack to run multiple servers really depends on your perspective. If you look at Apache, it comes with a variety of pluggable engines to choose from. The MPM-prefork one is the default because it allows the programmer to easily use non-thread-safe C/Perl/database libraries without having to throw locks and semaphores all over the place. To some that might be a hack to work around poorly implemented libraries. To me it's a brilliant way of leaving it to the OS to handle the problems and letting me get back to work.

Also a multi-process model comes with a few features that would be very difficult to implement in a multi-threaded server. Because they are just processes, zero-downtime rolling-updates are trivial. You can do it with a bash script.

It also has it's short-comings. In a single-server model setting up a singleton that holds some global state is trivial, while on a multi-process model you have to serialize that state to a database or Redis server. (Of course if your single-process server outgrows a single server you'll have to do that anyway.)

Is it a hack? Yes and no. Both original implementations (MRI, and CPython) have Global Interpreter Locks that will prevent a multi-core server from operating at it's 100% potential. On the other hand multi-process has it's advantages (especially on the Unix-side of the fence).

There's also nothing inherent in the languages themselves that makes them require a GIL, so you can run your application with Jython, JRuby, IronPython or IronRuby if you really want to share state inside a single process.

wm_eddie