I need to run a Linux-Apache-PHP-MySQL application (the Moodle e-learning platform) for a large number of concurrent users; I am aiming for 5000. By concurrent I mean that 5000 people should be able to work with the application at the same time, and "work" means not only database reads but writes as well.

The application is not very typical: it does a lot of inserts/updates on the database, so caching techniques do not help much. We are using the InnoDB storage engine. In addition, the application was not written with performance in mind; for instance, one Apache thread usually occupies about 30-50 MB of RAM.
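
To put the 30-50 MB per Apache thread in perspective, here is a rough back-of-envelope sketch. It assumes the worst case, where every one of the 5000 users has a request in flight at the same moment and each request holds one Apache/PHP worker. In practice users pause between clicks, so the number of simultaneous requests is usually much lower. The function names and the 32 GB / 4 GB server figures are purely illustrative assumptions.

```python
# Back-of-envelope RAM sizing for the Apache/PHP worker pool.
# Per-worker figures (30-50 MB) come from the question; everything else is
# an illustrative assumption, not a measurement.


def apache_ram_gb(concurrent_workers: int, mb_per_worker: float) -> float:
    """RAM (GB, 1 GB = 1024 MB) needed by the worker pool alone."""
    return concurrent_workers * mb_per_worker / 1024


def workers_per_server(server_ram_gb: float, reserved_gb: float, mb_per_worker: float) -> int:
    """How many workers fit on one server after reserving RAM for the OS, etc."""
    usable_mb = (server_ram_gb - reserved_gb) * 1024
    return int(usable_mb // mb_per_worker)


if __name__ == "__main__":
    for mb in (30, 50):
        # Worst case: all 5000 users have a request in flight at the same moment.
        print(f"5000 workers x {mb} MB = {apache_ram_gb(5000, mb):.1f} GB")
    # Hypothetical web server with 32 GB RAM, 4 GB kept for the OS and caches.
    print("workers per 32 GB box at 40 MB each:", workers_per_server(32, 4, 40))
```

Under these assumptions the worst case needs roughly 146-244 GB of RAM for Apache workers alone, which points towards several web servers behind a load balancer rather than one big box; measuring the real number of simultaneous requests would refine the estimate.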

I would be grateful for information on what hardware is needed to build a scalable configuration that can handle this kind of load.

We are currently using two HP DLG 380 servers, each with two 4-core processors, which can handle a much lower load (typically 300-500 concurrent users). Is it reasonable to invest in more boxes of this kind and build a cluster from them, or is it better to go with more high-end hardware?

I am particularly curious about:

  • how many servers are needed and how powerful they should be (number of processors/cores, size of RAM)
  • what network equipment should be used (what kind of switches and network cards)
  • any other hardware that is needed, such as particular disk storage solutions

Another question is how to put everything together, that is, what the optimal architecture is. Clustering with MySQL is rather hard (people complain about MySQL Cluster, even here on Stack Overflow).

A: 

I'm not so sure about the hardware, but from a software point of view:

With an efficient data layer that caches objects and collections returned from the database, I'd say a standard master-slave configuration would work fine. Route all writes to a beefy master and all reads to slaves, adding more slaves as required.
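
A minimal sketch of this read/write split, assuming the application (or a thin wrapper around its database layer) can choose a connection per statement. The `ReadWriteRouter` class and the host names are hypothetical. It sends writes and locking reads (`SELECT ... FOR UPDATE`) to the master and spreads plain reads round-robin across the slaves, so adding a slave just means adding a host to the list.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Chooses a database host per SQL statement (hypothetical helper).

    Writes go to the master; plain reads rotate round-robin over the slaves.
    """

    READ_PREFIXES = ("SELECT", "SHOW", "DESCRIBE", "EXPLAIN")

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        self.master = master
        # With no slaves configured, every statement falls back to the master.
        self._slaves = itertools.cycle(list(slaves)) if slaves else None

    def is_read(self, sql: str) -> bool:
        text = sql.lstrip().upper()
        # SELECT ... FOR UPDATE takes row locks, so it must run on the master.
        return text.startswith(self.READ_PREFIXES) and "FOR UPDATE" not in text

    def host_for(self, sql: str) -> str:
        if self._slaves is not None and self.is_read(sql):
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
    for stmt in (
        "SELECT * FROM course WHERE id = 7",
        "UPDATE course SET fullname = 'Physics' WHERE id = 7",
        "SELECT * FROM course WHERE id = 8",
        "SELECT * FROM grade WHERE id = 1 FOR UPDATE",
    ):
        print(router.host_for(stmt), "<-", stmt)
```

Because MySQL replication is asynchronous by default, a slave can briefly lag behind the master; a common refinement is to send a user's reads to the master for a short while after that user writes.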

Cache data as objects returned from your data mapper/ORM, not as HTML, and use Memcached as your caching layer. When you update an object, write it to the database and update it in Memcached; the Identity Map pattern works best for this. You'll probably need quite a few Memcached instances, although you could get away with running them on your web servers.
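
The caching scheme above (objects rather than HTML in Memcached, plus an Identity Map) might look roughly like the following sketch. To keep it self-contained, `FakeCache` and `FakeDb` are hypothetical in-memory stand-ins for a Memcached client and the MySQL tables; they do not mirror any real client API. `UserMapper` is a data mapper that checks a per-request identity map, then the shared cache, then the database, and on save writes to the database before refreshing the cached copy.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class User:
    id: int
    name: str


class FakeCache:
    """In-memory stand-in for a Memcached client (hypothetical API)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value


class FakeDb:
    """Stand-in for MySQL; counts reads so cache hits are visible."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.reads = 0

    def load(self, user_id: int) -> Optional[str]:
        self.reads += 1
        return self.rows.get(user_id)

    def save(self, user: User) -> None:
        self.rows[user.id] = user.name


class UserMapper:
    """Data mapper: identity map (per request) -> shared cache -> database."""

    def __init__(self, db: FakeDb, cache: FakeCache) -> None:
        self.db = db
        self.cache = cache
        self.identity_map: Dict[int, User] = {}  # one instance per id per request

    @staticmethod
    def _key(user_id: int) -> str:
        return f"user:{user_id}"

    def find(self, user_id: int) -> Optional[User]:
        if user_id in self.identity_map:
            return self.identity_map[user_id]
        data = self.cache.get(self._key(user_id))
        if data is not None:
            user = User(**data)  # rebuild the object from its cached fields
        else:
            name = self.db.load(user_id)
            if name is None:
                return None
            user = User(user_id, name)
            self.cache.set(self._key(user_id), asdict(user))
        self.identity_map[user_id] = user
        return user

    def save(self, user: User) -> None:
        # Write to the database first, then refresh the cached copy.
        self.db.save(user)
        self.cache.set(self._key(user.id), asdict(user))
        self.identity_map[user.id] = user


if __name__ == "__main__":
    db, cache = FakeDb(), FakeCache()
    db.rows[1] = "alice"
    first_request = UserMapper(db, cache)
    user = first_request.find(1)  # cache miss: loaded from the database
    user.name = "bob"
    first_request.save(user)  # database and cache both updated
    second_request = UserMapper(db, cache)
    print(second_request.find(1), "db reads:", db.reads)
```

Updating the cache on write, as suggested above, keeps frequently used objects warm; deleting the key instead and letting the next read repopulate it is a common, simpler alternative that avoids some races between concurrent writers.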

We could never get MySQL clustering to work properly.

Be careful with the SQL queries you write and you should be fine.

SlappyTheFish
A:

Once a couple of physical machines can no longer handle the peak load you need, you probably want to start virtualising.

EC2 is probably the most flexible solution at the moment for the LAMP stack. You can set up their VMs as if they were physical machines, cluster them, spin them up as you need more compute-time, switch them off during off-peak times, create machine images so it's easy to system test...

There are various solutions available for load-balancing and automated spin-up.

If you can make your app fit, you can make use of their non-relational database engine as well. At very high loads, relational databases (and MySQL in particular) don't scale effectively, whereas the peak load that SimpleDB, BigTable and similar non-relational databases can handle scales almost linearly as you add hardware.

Moving away from a relational database is a huge step, though; I can't say I've ever needed to do it myself.

Iain Galloway
EC2 seems to be a good option
Piotr Kochański
A: 

Piotr, have you tried asking this question on moodle.org yet? There are a couple of similarly scoped installations whose staff members currently answer questions like this there.

Also, depending on your deployment timeframe, you might want to look at the Moodle 2.0 line rather than the Moodle 1.9 line; that version appears to include a number of good fixes for some of the issues with Moodle's architecture.

Also: memcached rocks for this, and so does PHP acceleration (an opcode cache). Server Fault is probably the better Stack Exchange site for this question, though.

corprew