views: 46
answers: 1

Hi. Consider a somewhat process-heavy PHP application that works heavily with complex DB transactions and has no choice but to use fairly memory-hungry algorithms: up to 4 MB of memory per request (1 MB on average for those particular requests, and less than 200 KB of variable data for regular requests). It's obvious that we're going to use something like Amazon S3 to host our static data, but it's apparent that most of the load is on the dynamic parts.

The app will run on multiple servers, but how does that work? Do we build it like an ordinary application that runs on a single server, and the additional servers just make it act like one big server with huge amounts of memory and processing power?

Our problem is that, although the app is written in an extremely modular fashion, the process forces us to keep everything in the same environment. We can put the database on a machine that's optimized for serving it, but there is almost no way to put different modules of the application code itself on different servers.

So, what are the common solutions for this?

A: 

Out of the MANY scalability solutions, one thing you can do is load balancing of requests. If you can't put different parts of your application on different servers, replicate the whole application across multiple servers.

Now, an end-user request doesn't go directly to your application server. You need a load balancer (for example, F5 BIG-IP) in between. The load balancer picks the most suitable server from the server pool and forwards the request to it. This way, adding or removing a server takes only 10-15 minutes of configuration on the load balancer.
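
For a rough picture, here is a minimal sketch of such a pool using nginx as a software balancer instead of a hardware appliance; the backend addresses are placeholders, and least_conn is just one possible selection strategy:

    # Hypothetical nginx load-balancing config: incoming requests are
    # spread across two identical PHP application servers.
    upstream app_pool {
        least_conn;              # send each request to the least-busy backend
        server 10.0.0.1:80;      # app server 1 (placeholder address)
        server 10.0.0.2:80;      # app server 2 (placeholder address)
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app_pool;   # forward to the pool
        }
    }

Adding a server is then one extra line in the upstream block plus a reload, which is what makes the 10-15 minute estimate plausible.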

How does the load balancer know which server to select?

A load-balancer agent script runs on each server and reports that server's load, memory usage, process count, etc.; based on those reports, the balancer selects a server.
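
A minimal sketch of such an agent in PHP might look like this; the JSON fields, the Linux-specific metrics, and the endpoint itself are assumptions, since a real balancer like F5 has its own monitoring protocol:

    <?php
    // Hypothetical health-report script, polled periodically by the
    // load balancer to judge how busy this server is.

    $load = sys_getloadavg();   // 1-, 5- and 15-minute CPU load averages

    // Free memory from /proc/meminfo (Linux-specific).
    preg_match('/MemFree:\s+(\d+)/', file_get_contents('/proc/meminfo'), $m);

    $status = array(
        'load_avg_1min' => $load[0],
        'mem_free_kb'   => (int) $m[1],
        'process_count' => (int) trim(shell_exec('ps ax | wc -l')),  // rough count
    );

    header('Content-Type: application/json');
    echo json_encode($status);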

Stewie
Yeah, thanks. Now what about the database? All the instances of the application, regardless of which server they are running on, need to access a single database, and a rather large share of the processing time is spent on database interactions. Any automated solutions for that?
Cg Alive
What queries take the most time, DML or the SELECT/worktable queries? For SELECT queries you can use read servers. These are database slaves that should be used ONLY for reading. Also, the most important thing I forgot to mention is CACHING: use APC / memcached.
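A rough sketch of the caching idea in PHP, assuming the Memcached extension and an existing PDO handle $db (the key, query, and TTL are placeholders):

    <?php
    // Read-through cache sketch: check memcached before hitting the DB.

    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    $key  = 'report:monthly_totals';   // hypothetical cache key
    $data = $cache->get($key);

    if ($data === false) {
        // Cache miss: run the expensive query once and store the result.
        $stmt = $db->query('SELECT month, SUM(amount) FROM orders GROUP BY month');
        $data = $stmt->fetchAll();
        $cache->set($key, $data, 300);   // keep for 5 minutes
    }
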
Stewie
Thanks, can you give me a little more information about these read servers?
Cg Alive
They are clones of your primary database server. All INSERTs / UPDATEs go to your primary server (we'll call it the master), and the master replicates to one or more slaves in the background, so each slave stays updated with fresh data. There will be a delay of anywhere from a few seconds to a few minutes depending on network traffic and data size. These slave servers are then ready to take any SELECT / worktable queries and serve them, so your primary DB isn't loaded with those queries.
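On the application side, a minimal read/write split might look like this sketch; the hostnames, credentials, and queries are placeholders:

    <?php
    // Read/write splitting sketch: writes go to the master, reads to a slave.

    $master = new PDO('mysql:host=db-master;dbname=app', 'user', 'pass');
    $slave  = new PDO('mysql:host=db-slave1;dbname=app', 'user', 'pass');

    // All INSERTs / UPDATEs hit the master...
    $stmt = $master->prepare('UPDATE accounts SET balance = balance - ? WHERE id = ?');
    $stmt->execute(array(100, 42));

    // ...while SELECT / reporting queries are served by the slave.
    // Because of replication lag, this read may briefly see stale data.
    $rows = $slave->query('SELECT id, balance FROM accounts')->fetchAll();
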
Cg Alive
Depending on your requirements, you can increase or decrease the number of slaves (which, again, is not hard to configure).
Stewie
Well, the problem is that all of my process-heavy requests need a lot of DB transactions, and once each request finishes, another one is instantly initiated, and the new request needs the updated data from the first request's transactions. So although the read servers would be helpful, in this situation I need something more advanced. Like, what does Facebook do with their database?
Cg Alive