views:

473

answers:

7

My question is of the chicken and egg variety -- I want to get a job working on high-traffic, highly scalable websites, but I don't have much experience with high-volume sites.

I've worked on dozens of web applications in my career spanning almost a decade. Most of these projects have been smaller intranet systems or public websites that get fewer than 1,000 hits a day.

How can I gain experience building high-volume, highly-scalable websites? I've read books and articles, but that goes only so far. I can't simulate a distributed, load-balanced environment at home because I don't have the resources to get a cluster of servers.

+1  A: 

For me, it was just a matter of happening to work someplace that had explosive growth. So it was just plain luck.

Diodeus
+2  A: 

You do have those resources; you probably just don't know it: Windows Server 2008 + Hyper-V or Windows Server 2003 with Virtual Server added on will allow you to simulate exactly that.

This is how I normally work (although I have spare physical machines I can call on when I need to). I've got an entire simulated network (complete with 3 web servers, a directory controller, a SharePoint server, an Exchange server, etc.) all running off of a single Mac mini.

As for how to start with high-volume websites... your best bet is probably to find a smaller client that needs one of these. They are not willing to pay as much money as larger clients, but they also don't expect developers with reams of experience either. Or, latch onto a client that's going to eventually want to scale something that you've built for them.

DannySmurf
With this setup, what do you use as a load balancer? i.e., How do you direct traffic to each of the three virtual web servers?
frankadelic
I normally let the operating system handle that in my own (Windows-based) web projects. Windows Server NLB is a pain, but it works well. If you're looking for hyper-scalable stuff (millions of concurrent users) this setup won't work for you. But for "normal" projects it all works well.
DannySmurf
+2  A: 

Roll your own load :)

Apache JMeter

http://jakarta.apache.org/jmeter/
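If you want to see what a tool like JMeter is doing under the hood, here is a minimal sketch of a home-made load generator using only the Python standard library. It is not how JMeter is implemented; the names `run_load` and `make_http_get` are made up for this example, and any URL you pass in is your own test server.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_http_get(url, timeout=5.0):
    """Return a zero-argument callable that fetches `url` once.

    urlopen raises on connection failures and on 4xx/5xx responses,
    so those end up counted as errors by run_load.
    """
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    return fetch


def run_load(request_fn, total_requests=100, concurrency=10):
    """Call request_fn total_requests times from `concurrency` threads.

    Returns (sorted latencies in seconds of successful calls, error count).
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for elapsed, ok in results if ok)
    errors = sum(1 for _, ok in results if not ok)
    return latencies, errors
```

Something like `run_load(make_http_get("http://localhost:8080/"), 500, 25)` (the address is a placeholder) gives you a list of latencies to take a median or 95th percentile from. A single machine driving threads like this tops out quickly, so treat it as a learning aid and not a replacement for a real load-testing tool.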

Paul Whelan
+4  A: 

HIGH VOLUME WEBSITES? JUST USE CAPS LOCK!

Robert Grant
Hehe, I knew I'd get some downvotes. Totally worth it, though.
Robert Grant
I thought it was funny :)
Nic Wise
+2  A: 

It's good to practice and test with artificial load, but the truth is it only goes so far. Web apps choke for many reasons:

  • A bottleneck at the database
  • Bad/sub-optimal server configuration
  • Memory stress at any tier (Ex: local session/cache on a web server)
  • Network latency between any tier
  • Huge payload to the client (external scripts, css, images, etc... as well as page size)
  • Processing stress at any tier (Ex: web server compression shrinks payload but adds demands to request processing)
  • I/O stress at any tier
  • etc...

A load tester will expose the first problem in the application you're testing, but the fixes you apply may do nothing for another app. Aside from the tips you've gathered from books and articles, scaling and performance tuning are more of a per-instance black art than a science. I'd say the most important skills/traits are persistence, curiosity, and endurance - in that order.
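One way to go looking for which tier is choking is to accumulate time spent per tier inside the request path. The sketch below is only an illustration of that idea; `TierTimer` and the tier labels are hypothetical names, not part of any framework.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TierTimer:
    """Accumulates wall-clock time per named tier (e.g. "db", "web")."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                 # injectable so it can be faked in tests
        self.totals = defaultdict(float)   # tier name -> total seconds

    @contextmanager
    def measure(self, tier):
        start = self.clock()
        try:
            yield
        finally:
            # Record the time even if the wrapped code raised.
            self.totals[tier] += self.clock() - start

    def slowest(self):
        """Return the tier with the largest accumulated time, or None."""
        if not self.totals:
            return None
        return max(self.totals, key=self.totals.get)
```

Wrap each stage of a request, for example `with timer.measure("db"): ...`, run your load test, then check `timer.slowest()`. Once you fix that tier, expect a different one to become the slowest.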

Corbin March
A: 

Contribute to a popular open source project? Either that or somehow get lucky with a personal project and hope it gets popular.

There is a different mindset on sites like this, and some things didn't seem important to me until I got bit in the ass by them, although I've only tasted a level of it. I myself am interested in other solutions posted here.

BPAndrew
+1  A: 

I think you have three choices:

  1. Join a big company which has one or more of them. E.g., I'm working at the BBC, which is a fairly large site :)

  2. Join a startup which has one of them. Digg would have been good, but they just laid off 10% of staff. But you get the idea.

  3. Start your own startup. Also, pray you have it right :)

I'd prefer 2, then 3, then 1, but hey - I'm here now :)

Other than that - VMs. If you are a Linux head, Xen, VMware or similar and 1-2 medium-powered machines will get you a nice setup. If you have time and some cash, the likes of Amazon AWS will allow you to spin up a BIG setup without a massive cost, but make sure you price it first.

I'd start with VMs, though. Time everything. Know how long it takes with a cache hit, and without. If you have low-power hardware, make the cache really small so it expires quickly and puts more realistic load on the boxes, etc.
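To make the "time it with a cache hit, and without" idea concrete, here is a rough sketch of a tiny expiring cache plus a timing helper. `TTLCache`, `timed`, and the loader function are names invented for this example; a real setup would more likely use memcached or similar.

```python
import time


class TTLCache:
    """A very small cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._store = {}        # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, or call loader(key) on a miss/expiry."""
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)
        self._store[key] = (self.clock() + self.ttl, value)
        return value


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(cache.get, key, slow_loader)` twice in a row gives you the miss time and then the hit time. Setting a short `ttl_seconds` forces more misses, which is one way to get closer to realistic load on small hardware.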

VMs FTW!

Nic Wise