views:

228

answers:

4

I want to scale an e-commerce portal based on LAMP. Recently we've seen a huge traffic surge.

What would be the steps (please mention them in order) in scaling it:

  1. Should I consider moving to Amazon EC2 or similar? What could be the potential problems in switching servers?

  2. Do we need to redesign the database? I read that Facebook switched from MySQL to Cassandra. What kind of code changes are required if we switch to Cassandra? Would Cassandra be a better option than MySQL?

  3. Is Hadoop a possibility? I'm not even sure it applies here.

  4. Are there any other things which need to be thought of?

I found this post helpful, and this blog has nice articles as well. What I want to know is the list of steps I should consider when scaling this app.

+1  A: 

Find out where the issues are happening (or where they are likely to happen if you don't have any now). Knowing what your biggest resource usage is matters when evaluating any solution. Stick to the solutions that will give you the biggest improvement.

Consider:

  - Higher-than-needed bandwidth use per user is something you want to address regardless of whether you move to EC2. It will cost you money either way, so it's worth looking at things like this: http://developer.yahoo.com/yslow/

  - Don't invest in changing databases if that's a non-issue. Find out first whether that's really the problem; even if you are having issues with the database, it might be a code issue, i.e. hitting the database lots of times per request.

  - Unless we are talking about very big numbers, you shouldn't have high CPU usage issues. If you do, find out where they are happening; optimization is worth it where specific code has a high impact on your overall resource usage.

  - After making sure the above is reasonable, you might get big improvements with caching: in bandwidth (making sure browsers and proxies can play their part in caching) and in local resource usage (avoiding re-processing/re-retrieving the same information all the time).

I'm not saying you should go all out with the above, just far enough to make sure you won't hit the same issues elsewhere in a few months, and far enough to find out where your biggest gains are and whether you will get enough value from any of the scaling options. This will also allow you to come back and ask questions about specific problems, and about how these scaling options relate to them.
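For the "hitting the database lots of times per request" case above, a minimal sketch of caching a lookup, assuming the memcached and mysqli PHP extensions; the function, key and table names are illustrative, not from the question:

    <?php
    // Minimal sketch: cache an expensive product lookup so the same query is
    // not re-run on every request. Assumes the memcached and mysqli extensions;
    // function, key and table names are illustrative only.
    function get_product_cached(Memcached $mc, mysqli $db, int $id): ?array
    {
        $key = "product:$id";
        $product = $mc->get($key);
        if ($product !== false) {
            return $product;               // served from cache, no database hit
        }

        $stmt = $db->prepare("SELECT id, name, price FROM products WHERE id = ?");
        $stmt->bind_param("i", $id);
        $stmt->execute();
        $product = $stmt->get_result()->fetch_assoc();

        if ($product !== null) {
            $mc->set($key, $product, 300); // keep it for 5 minutes
        }
        return $product;
    }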

eglasius
+1  A: 

You should prepare by choosing a flexible framework and by accepting that things are going to change along the way. In some situations it's difficult to predict your users' behavior.

If you have seen an explosion of traffic recently, analyze which pages are the slowest.

  1. You can move to the cloud, but EC2 is not the best-performing option. Again, be sure there's no other optimization you can do first.

  2. The database might need to be redesigned, but I doubt all of it does. Again, look at the problem points.

  3. Both Hadoop and Cassandra are pretty nifty, but they might be overkill.

Tudorizer
+14  A: 

First, I would suggest making sure every resource served by your server sets appropriate cache control headers. The goal is to make sure truly dynamic content gets served fresh every time and any stable or static content gets served from somebody else's cache as much as possible. Why deliver a product image to every AOL customer when you can deliver it to the first and let AOL deliver it to all the others?
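For instance, a minimal PHP sketch of that split; the $is_static_asset flag is illustrative, and in practice the webserver usually adds these headers for static files itself:

    <?php
    // Minimal sketch of the header split described above. $is_static_asset is
    // an illustrative flag; real setups normally let the webserver set headers
    // for static files directly.
    $is_static_asset = false;   // set per resource type

    if ($is_static_asset) {
        // Stable content (product images, CSS, JS): let browsers and shared
        // proxies cache it for a day.
        header('Cache-Control: public, max-age=86400');
        header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 86400) . ' GMT');
    } else {
        // Truly dynamic content (cart, checkout): force revalidation every time.
        header('Cache-Control: no-cache, no-store, must-revalidate');
        header('Pragma: no-cache');
    }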

If you currently run your webserver and dbms on the same box, you can look into moving the dbms onto a dedicated database server.

Once you have done the above, you need to start measuring the specifics. What resource will hit its capacity first?

For example, if the webserver is running at or near capacity while the database server sits mostly idle, it makes no sense to switch databases or to implement replication etc.

If the webserver sits mostly idle while the dbms chugs away constantly, it makes no sense to look into switching to a cluster of load-balanced webservers.

Take care of the simple things first.

If the dbms is the likely bottle-neck, make sure your database has the right indexes so that lookups are fast and updates don't waste time. Make sure the dbms logs to a different physical medium from the tables themselves. Make sure the application isn't issuing any wasteful queries etc. Make sure you do not run any expensive analytical queries against your transactional database.
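As a quick way to check the "right indexes" point, a hedged sketch using mysqli and EXPLAIN; the connection details, table and column names are illustrative only:

    <?php
    // Minimal sketch: run EXPLAIN on a hot query to see whether it uses an
    // index or scans the whole table. Connection details, table and column
    // names are illustrative.
    $db = new mysqli('localhost', 'shop_user', 'secret', 'shop');

    $row = $db->query("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
              ->fetch_assoc();

    if (empty($row['key'])) {
        // key = NULL (often with type = ALL) means a full table scan; an index
        // such as: ALTER TABLE orders ADD INDEX idx_customer (customer_id);
        // would likely help.
        error_log('orders lookup by customer_id is not using an index');
    }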

If the webserver is the likely bottle-neck, profile it to see where it spends most of its time and reduce the work by changing your application or implementing new caching strategies etc. Make sure you are not doing anything that will prevent you from moving from a single server to multiple servers with a load balancer.
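One crude way to get that profile before reaching for a full profiler such as Xdebug or XHProf; the load_catalogue() and render_page() calls below are placeholders for your own code, not real APIs:

    <?php
    // Minimal sketch: time the suspect sections of a request with microtime()
    // to see where it spends most of its time. load_catalogue() and
    // render_page() are placeholders for the application's own code.
    $timings = [];

    $t = microtime(true);
    $products = load_catalogue();
    $timings['catalogue'] = microtime(true) - $t;

    $t = microtime(true);
    echo render_page($products);
    $timings['render'] = microtime(true) - $t;

    error_log('timings: ' . json_encode($timings));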

If you have taken care of the above, you will be much better prepared for making the move to multiple webservers or database servers. You will be much better informed for deciding whether to scale your database with replication or to switch to a completely different data model etc.

bbadour
Very good, *professional* answer bbadour!
GrandmasterB
@bbadour: Could you please explain this - "Make sure you are not doing anything that will prevent you from moving from a single server to multiple servers with a load balancer"?
understack
@understack: In general, examine whether your application is stateful and where you persist state information if it is.
bbadour
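One common source of such per-server state is the default file-based PHP session store. A minimal sketch of moving sessions to a shared memcached instance, assuming the memcached extension (which ships a session handler) is installed; the host name is illustrative:

    <?php
    // Minimal sketch: keep session state off the local filesystem so any
    // webserver behind a load balancer can handle any request. Assumes the
    // memcached extension; the host name is illustrative.
    ini_set('session.save_handler', 'memcached');
    ini_set('session.save_path', 'sessions.internal:11211');

    session_start();
    $_SESSION['cart'][] = 42;   // now stored in shared memcached, not in /tmp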
+3  A: 

1) First, measure how many requests per second your most-visited pages can serve. For well-written PHP sites on average hardware this should be in the 200-400 requests per second range. If you are not there, you have to optimize the code by reducing the number of database requests, caching rarely changed data in memcached or shared memory (see the sketch at the end of this answer), and using a PHP accelerator. If you are at some 10-20 requests per second, you need to get rid of your bulky framework.

2) Second, if you are still on plain Apache2, switch to lighttpd or nginx in front of Apache2. Personally, I like the second option.

3) Then move all your static data to a separate server or a CDN. Make sure it is served with "Expires" headers of at least 24 hours.

4) Only after all these things should you start thinking about going to EC2/Hadoop, building out multiple servers and balancing the load (nginx would also help you there).

After steps 1-3 you should be able to serve some 10'000'000 hits per day easily.

If you need just 1.5-3 times more, I would go for a single more powerful server (8-16 cores, lots of RAM for caching & the database).

With step 4 and multiple servers you are on your way to 0.1-1 billion hits per day (but with significantly larger hardware & support expenses).
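As an illustration of the caching mentioned in step 1, a minimal sketch using APCu (the older APC accelerator offered the same shared-memory pattern); the table, key and function names are illustrative:

    <?php
    // Minimal sketch for step 1: build a rarely changed lookup table once per
    // server and keep it in shared memory with APCu. Table, key and function
    // names are illustrative.
    function get_category_tree(mysqli $db): array
    {
        $tree = apcu_fetch('category_tree', $hit);
        if ($hit) {
            return $tree;                        // served from shared memory
        }

        $tree = [];
        $res = $db->query("SELECT id, parent_id, name FROM categories");
        while ($row = $res->fetch_assoc()) {
            $tree[$row['id']] = $row;
        }

        apcu_store('category_tree', $tree, 600); // rebuild at most every 10 minutes
        return $tree;
    }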

BarsMonster