If you had been given the task to recommend an open source solution stack for a web site where scalable high performance is prioritized above everything else, what would you recommend?

Some attempts at definitions:

  • Scalability: when deployed, the solution stack should be able to scale from a handful of users to several millions by simply adding more hardware. The solution stack should never be the bottleneck.

  • Performance: imagine a web site that can handle millions of users, hundreds of terabytes of data (HTML, images, videos, etc.), tons of traffic every day, and at least 99.99% uptime per year.

The solution stack should at least specify:
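To put the 99.99% figure in perspective, the downtime it allows can be worked out directly. A quick sketch, assuming a 365-day year (the function name is just for illustration):

```python
# Downtime budget implied by an availability target (assumes a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

For 99.99% this comes to roughly 52.6 minutes per year, which leaves very little room for unplanned outages or maintenance windows.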

  • OS
  • web server
  • database or equivalent
  • programming language

You can also suggest other programs, libraries, tools, etc (e.g. language specific libraries, load balancers, caching) if needed.

+2  A: 

I probably wouldn't try to reinvent the wheel and build my own solution; I'd look at how other successful sites have done it. A great place to start would be High Scalability. They document how some of the most successful sites came to be and how they manage their massive infrastructure.

John T
+1  A: 

I have a recommendation for a webserver for you:

nginx [engine x]

You can find more information about it on its website and elsewhere on the web, including a list of popular websites that use it. A very good example is Hulu.com, which deals with a heavy load of video streaming. It is said to work better than lighttpd, the typical competitor to Apache when it comes to raw performance.

Another interesting aspect is that it already includes load balancing.

ypnos
nginx is not only really fast but also has a nice interface to memcached, the other cornerstone of high-performance open source web
Javier
Also, nginx is more of a front-end load balancer, but it can serve static pages really fast. Its backends can be either HTTP or FastCGI, so you can put any web app server behind it, be it written in PHP, Python, Perl, Java, Lua, etc.
Javier
+2  A: 

There are a number of key areas when designing a high performance and scalable site:

  1. Caching
  2. Disk IO and location
  3. Database locking
  4. Did I mention caching?

It will be impossible for anyone to give you a complete solution on Stack Overflow. You need to sit down and determine what content types you are going to deliver to users, how often that content is going to change, and where you can store it.

For caching content, look at Squid, Apache's mod_cache, and memcached.

Physical disks should be considered. If you scale your solution by having more than one web server, will you share one copy of your content (videos, images, etc.) or will you have one copy for each server? If you share one copy, beware of IO on that single disk. If you have one copy of the content for each server, you need to keep the copies in sync.
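If you go with one copy per server, you need some way to notice when the copies drift apart. A minimal sketch of such a check, assuming each copy is reachable as a local directory (the function names are made up for illustration, and tools like rsync do this far better in practice):

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def content_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, not for huge videos.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def out_of_sync(primary: Path, replica: Path) -> List[str]:
    """Return relative paths that are missing on one side or differ in content."""
    a = content_digests(primary)
    b = content_digests(replica)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `out_of_sync(Path("/srv/content"), Path("/mnt/web2/content"))` (both paths are hypothetical) would list the files that need to be copied again.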

Database usage should be kept to a minimum. Never, ever store graphics, or any other content that can be kept in a flat file on disk, in a database - web servers do a great job of serving files from disk, but databases aren't so good at that. Think about what you need to put in your database and how often that data is going to change and be read. When do you need to lock that database? 9 times out of 10 the database is the bottleneck in a system.
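One common way to follow this advice is to write the file to disk and keep only its path and some metadata in the database. A rough sketch using SQLite from the Python standard library (the `images` table, its columns, and the function names are assumptions for the example, not a recommended schema):

```python
import sqlite3
from pathlib import Path


def init_schema(db: sqlite3.Connection) -> None:
    """Create a small table that holds metadata only, never the file bytes."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "name TEXT PRIMARY KEY, path TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )


def save_image(db: sqlite3.Connection, media_root: Path, name: str, data: bytes) -> Path:
    """Write the bytes to disk and record only the relative path and size."""
    media_root.mkdir(parents=True, exist_ok=True)
    target = media_root / name
    target.write_bytes(data)
    db.execute(
        "INSERT INTO images (name, path, size_bytes) VALUES (?, ?, ?)",
        (name, name, len(data)),
    )
    db.commit()
    return target


def lookup_path(db: sqlite3.Connection, media_root: Path, name: str) -> Path:
    """Resolve the on-disk location; the web server serves the file itself."""
    row = db.execute("SELECT path FROM images WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return media_root / row[0]
```

The database row stays tiny, and the web server can serve the file straight from `media_root` without touching the database at all.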

Cache. Cache. Cache. Look at delivering as much static content as possible. Build your webpage HTML once and then store it as a cached file - either on disk or in memcached or similar.
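The "build once, then serve from cache" idea can be sketched in a few lines. This is only an illustration: `PageCache` is a hypothetical in-memory stand-in for memcached or cached files on disk, and the time-to-live expiry is an assumption about how you might keep content fresh:

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny in-memory page cache with a time-to-live per entry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        """Return cached HTML if it is still fresh, otherwise render and store it."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: skip the expensive render
        html = render()  # cache miss or expired entry: build the page once
        self._store[key] = (now, html)
        return html

    def invalidate(self, key: str) -> None:
        """Drop an entry, e.g. after the underlying content changes."""
        self._store.pop(key, None)


if __name__ == "__main__":
    cache = PageCache(ttl_seconds=60)
    page = cache.get_or_render("/home", lambda: "<html>expensive page</html>")
    print(page)
```

The render function, with its database queries and templating, runs only on a miss; every other request within the TTL is served from memory.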

To answer some of your technology questions directly:

Web server is a given: Apache httpd. Not the fastest out there, but bulletproof and highly configurable.

OS: Your OS is never likely to be your bottleneck so choose something stable and well supported - CentOS works well.

DB: Your obvious choices are MySQL and Postgres. Postgres has the better performance of the two, but as I said before, you must look to keep your DB activity to a minimum.

Language: It doesn't matter. Seriously. You can create a scalable, well performing site in any of Python, Ruby, PHP, Java, .NET, etc. Your language will not be your bottleneck.

Steve Claridge