Hi,

I'm currently building a website for a social project in Switzerland.

Before we get flooded with users, I want to prepare the application to scale.

I've answered many of my own questions, but a few remain.

Here is what I plan to do.


First

At the beginning (for a short time), the application will run on a single server with DNS, PHP, MySQL, data, and memcache.


Second

Then I will split them across two servers:

  1. DNS, MySQL, memcache
  2. Data, PHP

Third

This is where I'm stuck: I don't know exactly how to set this up while keeping the application running well.

I could do this:

  1. Front: load balancer, memcache, DNS
  2. Web 1: PHP, data
  3. Web 2: PHP, data
  4. MySQL

That would be the layout, with all PHP sessions stored in the database.
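A minimal sketch of why keeping sessions in one shared database lets either web server handle any request. It is in Python rather than PHP/CodeIgniter, and `sqlite3` stands in for the shared MySQL server; the class name `DbSessionStore` and the table layout are hypothetical, chosen for illustration only.

```python
import json
import sqlite3
import time


class DbSessionStore:
    """Session data kept in one shared table (sqlite3 stands in for MySQL)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            " id TEXT PRIMARY KEY, data TEXT NOT NULL, updated REAL NOT NULL)"
        )

    def read(self, session_id: str) -> dict:
        # One SELECT at the start of every request.
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def write(self, session_id: str, data: dict) -> None:
        # One write at the end of every request. INSERT OR REPLACE is SQLite
        # syntax; MySQL would use REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data, updated) VALUES (?, ?, ?)",
            (session_id, json.dumps(data), time.time()),
        )
        self.conn.commit()


# Two "web nodes" sharing the same database connection.
shared_db = sqlite3.connect(":memory:")
web1 = DbSessionStore(shared_db)
web2 = DbSessionStore(shared_db)
web1.write("abc123", {"user_id": 42})
print(web2.read("abc123"))  # {'user_id': 42}
```

The trade-off is that every page view now costs at least one read and one write on the single database server.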

But how do I sync the data? Do I run rsync to keep the servers up to date? Do I put the data on a separate network disk to be safe? And in that case, how do I handle user uploads?

And if the website becomes more successful and we have to move to a larger infrastructure, wouldn't that introduce latency on updates?

Or would it be better to go straight to Amazon Web Services?

Some more information: I use CodeIgniter as the framework, and Linux on the web servers (the distribution isn't chosen yet, but it will probably be Debian).

Thanks in advance for your answers.

+1  A: 

You should look into Hadoop.

matei
A: 

Remember you can mount/share folders.

What data would you be syncing?

You might consider putting the data on the database machine or on another machine. The DB machine is usually a good choice at first, since it is likely to have better I/O than a regular web server.

It is probably a good idea to set up a SAN or similar so your data stays in one place. Multiple copies of data are a pain to deal with. Going this route also means you can put the DB files there.
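A rough sketch of handling user uploads when every web node writes to the same shared mount (a SAN or network disk), so nothing needs to be rsynced afterwards. It assumes `shared_root` is mounted at the same path on every node; the function name and directory layout are hypothetical. The write-to-temp-then-rename pattern keeps other nodes from seeing half-written files, since `os.replace` is atomic on POSIX filesystems when source and target are on the same filesystem; whether a particular network or cluster filesystem gives the same guarantee is worth checking.

```python
import os
import tempfile
import uuid


def save_upload(shared_root: str, payload: bytes, ext: str) -> str:
    """Store an upload under <shared_root>/uploads and return its file name."""
    uploads = os.path.join(shared_root, "uploads")
    os.makedirs(uploads, exist_ok=True)
    # Random name, so two web nodes never write to the same file.
    name = f"{uuid.uuid4().hex}{ext}"
    fd, tmp_path = tempfile.mkstemp(dir=uploads, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # Readers on any node see either no file or the complete file.
        os.replace(tmp_path, os.path.join(uploads, name))
    except BaseException:
        os.unlink(tmp_path)
        raise
    return name


# A temporary directory stands in for the shared mount.
with tempfile.TemporaryDirectory() as mount:
    stored = save_upload(mount, b"hello", ".txt")
    with open(os.path.join(mount, "uploads", stored), "rb") as f:
        print(f.read())  # b'hello'
```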

Byron Whitlock
+6  A: 

According to Wikipedia, Switzerland has 4.6 million German speakers, 1.5 million French speakers, and 0.5 million speakers of Italian, Romansch and other languages. So I suspect you'll find that a single server will fit your needs. Guess what percentage of the population will visit your site every month or every day to get a sense of how big you can get before running into scaling issues.
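The back-of-the-envelope estimate suggested above can be written out. The population figure is the sum of the Wikipedia numbers quoted; the daily visitor share, pages per visit, and peak-hour factor are purely hypothetical guesses to be replaced with your own.

```python
def peak_page_views_per_second(population: int, daily_visit_share: float,
                               pages_per_visit: float, peak_factor: float) -> float:
    """Rough peak load in page views per second.

    peak_factor is how much busier the busiest hour is than the daily
    average (an assumption until real traffic can be measured).
    """
    daily_page_views = population * daily_visit_share * pages_per_visit
    average_per_second = daily_page_views / 86_400  # seconds per day
    return average_per_second * peak_factor


# 4.6M + 1.5M + 0.5M speakers, per the figures above.
population = 4_600_000 + 1_500_000 + 500_000
# Hypothetical: 1% visit daily, 10 pages each, busiest hour is 3x the average.
rps = peak_page_views_per_second(population, 0.01, 10, 3.0)
print(f"{rps:.1f} page views/s at peak")  # 22.9 page views/s at peak
```

Comparing a number like this with what one server actually sustains under a load test shows how much headroom there is before a second web server is needed.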

So, I don't think you need to worry about scaling yet! Bonus: The time you don't spend worrying about this problem, you can use to solve other problems for your users.

Summer
Agreed, you should always measure performance and then eliminate bottlenecks as you see them. That said, it would still be a good idea to plan for multiple web servers if this site is going to have to scale fast. Once you can scale to two, much of the plumbing will be in place to scale to more...
Justin Ethier
The OP needs to use a cluster FS and get sessions out of the database. How many queries does it take to produce a single page for a logged in user?
Tim Post
+1  A: 

There are a few common paths to scaling web services, roughly in the order that sites like Flickr and Facebook seem to have used them:

  • Split servers based on concepts (API, login, media files, ads, static pages, dynamic pages)
  • Split databases based on concepts that don't need to be JOINed (logins, long term reporting, page data, etc.)
  • Compile/optimize your PHP and other resources (sprites, compiled CSS, Zend)
  • Add caching (front end, back end)
  • Add delegation (round robin, etc.)
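A toy sketch of the "split servers by concept" and "round robin" ideas from the list above: pick a pool by concept, then rotate through the hosts in that pool. In practice this is normally done by a load balancer or DNS round robin rather than application code; the concept names and host names here are hypothetical placeholders.

```python
import itertools


class ConceptRouter:
    """Choose a pool by concept, then round-robin within that pool."""

    def __init__(self, pools: dict[str, list[str]]) -> None:
        # One independent rotating iterator per concept.
        self._cycles = {concept: itertools.cycle(hosts)
                        for concept, hosts in pools.items()}

    def pick(self, concept: str) -> str:
        return next(self._cycles[concept])


router = ConceptRouter({
    "dynamic": ["web1.internal", "web2.internal"],
    "media": ["static1.internal"],
})
print([router.pick("dynamic") for _ in range(3)])
# ['web1.internal', 'web2.internal', 'web1.internal']
print(router.pick("media"))  # static1.internal
```

Plain round robin ignores how busy each host is, which is usually fine when the hosts are identical and requests are short.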

But before scaling, measure: set up tests, calculate your capacity, and don't optimize before you need to.
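One simple way to start "calculating your capacity" is to time a page handler and turn the mean into a per-worker throughput. This is only a sketch: `fake_page` is a hypothetical stand-in for rendering a page, and serial in-process timing ignores concurrency and network effects, so a load-testing tool such as ApacheBench (`ab`) against the real stack gives more realistic numbers.

```python
import statistics
import time


def measure(handler, runs: int = 200) -> dict:
    """Time a handler serially and estimate single-worker throughput."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        handler()
        samples.append(time.perf_counter() - start)
    samples.sort()
    p95 = samples[int(0.95 * (len(samples) - 1))]  # 95th percentile latency
    mean = statistics.fmean(samples)
    return {
        "mean_ms": mean * 1000,
        "p95_ms": p95 * 1000,
        # A worker handling requests back-to-back serves about 1/mean per second.
        "max_rps_per_worker": 1 / mean,
    }


def fake_page() -> None:
    time.sleep(0.002)  # hypothetical stand-in for rendering one page


stats = measure(fake_page, runs=50)
print(f"{stats['mean_ms']:.1f} ms mean, "
      f"~{stats['max_rps_per_worker']:.0f} req/s per worker")
```

Dividing the expected peak load by `max_rps_per_worker` gives a rough lower bound on how many workers (PHP processes) you need.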

Bruce Alderson
A: 

I see some questionable things:

  • You have one SQL server, and you are storing sessions in a database on a site where you expect extremely high volume. How many queries does it take to produce a single page if someone is logged in, and what slowdown do you expect when you eventually use MySQL replication?

  • With a cluster FS, everything is simply kept in sync. You won't end up with build A on web server 1 while build B breaks on web server 2. If you really expect that much traffic, then in the time it takes to upload a change and sync all the nodes, you've just pissed off a thousand people.
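To put a number on the first point: if a DB-backed session handler reads the session at the start of each request and writes it back at the end (an assumption; some handlers skip unchanged writes), sessions add two queries to every page on the single MySQL server. The traffic and per-page query counts below are hypothetical.

```python
def db_queries_per_second(page_views_per_second: float,
                          app_queries_per_page: int,
                          session_queries_per_page: int = 2) -> float:
    """Total queries per second hitting the single MySQL server.

    session_queries_per_page defaults to 2: one session read plus one
    session write per request (an assumption about the session handler).
    """
    return page_views_per_second * (app_queries_per_page + session_queries_per_page)


# Hypothetical: 100 page views/s, 8 application queries per page.
with_db_sessions = db_queries_per_second(100, 8)
without_db_sessions = db_queries_per_second(100, 8, session_queries_per_page=0)
print(with_db_sessions, without_db_sessions)  # 1000 800
```

The session write is a write, so with classic MySQL master/slave replication it has to go to the master and then be replicated to every slave, which is part of the slowdown the question above asks about.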

I've deployed apps running on clusters using OCFS2 with over 40 nodes without issue, and OCFS2 is not exactly the 'best' cluster FS available. Check out Lustre and consider keeping sessions on disk.
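A minimal sketch of keeping sessions on disk instead of in the database: one file per session in a directory that, by assumption, lives on a cluster filesystem (such as OCFS2 or Lustre) mounted at the same path on every web node. PHP's own file-based session handler works along similar lines; this Python version, its class name, and its file naming are hypothetical and only illustrate the idea.

```python
import json
import os
import tempfile


class FileSessionStore:
    """One JSON file per session in a directory shared by all web nodes."""

    def __init__(self, session_dir: str) -> None:
        self.session_dir = session_dir
        os.makedirs(session_dir, exist_ok=True)

    def _path(self, session_id: str) -> str:
        # Reject IDs that could escape the session directory.
        if not session_id.isalnum():
            raise ValueError("invalid session id")
        return os.path.join(self.session_dir, f"sess_{session_id}.json")

    def read(self, session_id: str) -> dict:
        try:
            with open(self._path(session_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def write(self, session_id: str, data: dict) -> None:
        path = self._path(session_id)
        fd, tmp = tempfile.mkstemp(dir=self.session_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # readers see either the old or the new file


shared = tempfile.mkdtemp()  # stands in for the cluster FS mount
node_a = FileSessionStore(shared)
node_b = FileSessionStore(shared)
node_a.write("abc123", {"cart": [1, 2]})
print(node_b.read("abc123"))  # {'cart': [1, 2]}
```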

Tim Post