First question on Stack Overflow. I have no previous experience of running a high-traffic website, and I would consider myself somewhere between a novice and an intermediate programmer... please be gentle :)
I am trying to build a social website that I ultimately hope will handle a lot of traffic and users. However, I don't know if the concept will fly, and programming for scalability is a lot of additional work compared to slapping together some sloppy code that works the same way functionally. In addition, since I'm relatively uninformed about programming for high scalability, I find myself doing a lot of research, which is slowing me down further (highscalability.com is amazing; I'm currently trying to figure out offline queues).

My question is, should I:
A)
1. put together some code that's suboptimal but functional (somewhat sloppy code, excessive database queries, no caches, etc.)
2. work on gathering traffic
3. rewrite and restructure code

or B)
1. fully research scalable designs and apply from the beginning so I don't have to restructure much
2. work on gathering traffic

Any advice is appreciated, thank you.

+3  A: 

I would go for option A. It's much harder to generate traffic to a website than it is to improve performance. If your idea is unique, then time to market should be your primary goal. http://highscalability.com/ contains tons of good articles on how others have solved scalability problems.

Kane
+1  A: 

You make it sound like A) would result in sloppy, poorly thought-out code that works but will not scale well, and that is almost certainly going to require a rewrite once you already have users and need to provide reasonable uptime. Fixing preventable problems once you already have traffic sounds like a nightmare.

I would definitely go with B). Thinking about, researching and planning the architecture of your application, not just for optimisation or performance but also just for sensible overall design, is an absolute must for any non-trivial software application.

There is a common myth that premature optimisation is the root of all evil. This is not really true; it would be more accurate to say that unnecessary optimisation is the root of all evil. Do not make the newbie mistake of optimising where it doesn't matter, which is just going to mess up your code, but do spend the time finding out which optimisations DO matter.
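One way to find out which optimisations matter is to measure before changing anything. Below is a minimal sketch using Python's built-in `cProfile` and `pstats` modules; `handle_request`, `fetch_friends` and `render_page` are hypothetical stand-ins for a page handler, a slow unindexed query and a cheap rendering step, not code from any real site.

```python
import cProfile
import io
import pstats


def fetch_friends(user_id: int) -> list:
    # Hypothetical slow step: simulates an unindexed query by scanning every row.
    all_rows = [(i % 1000, i) for i in range(200_000)]
    return [friend for uid, friend in all_rows if uid == user_id]


def render_page(friends: list) -> str:
    # Hypothetical cheap step: format the first ten results.
    return ", ".join(str(f) for f in friends[:10])


def handle_request(user_id: int) -> str:
    return render_page(fetch_friends(user_id))


profiler = cProfile.Profile()
profiler.enable()
page = handle_request(7)
profiler.disable()

# Sort by cumulative time so the most expensive call paths appear first.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(5)
print(out.getvalue())
```

In a report like this, the time spent in `fetch_friends` should dwarf `render_page`, which suggests the query (for example, adding an index) is worth optimising while the rendering code is not.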

Twitter nearly died when they realised they'd made some poor DB design choices once they already had traffic.

thomasrutter
Twitter went with A and it nearly killed them. Perhaps B did kill their competition.
Nosredna
Possibly. Kind of a catch-22 then, I guess: either get traffic quickly, not be able to deal with it and possibly die as a result, or take longer to enter the market and possibly die from burning through your budget before you get any traffic, or from being beaten by Twitter. Let's just say that Google thought long and hard about their architecture and scalability and still beat the rest of the market in the longer term.
thomasrutter
+4  A: 

Web development is a continual process. We may think we know what we want at the beginning, but it will inevitably change by the time we get there.

I suggest that you start by getting the book by the 37signals crew, Getting Real: http://gettingreal.37signals.com/

Mix A and B. Try to get a good hosting situation. Think about ways that you can cache (memcache -- it's easy). Write clear code, but don't spend too much time...
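A minimal sketch of the cache-aside pattern behind the memcache suggestion, in Python. The `TTLCache` class is a hypothetical in-process stand-in for memcached (a real deployment would talk to a memcached server through a client library with similar get/set operations), and `load_user_profile` is a hypothetical stand-in for an expensive database query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for memcached: key -> (value, expiry time)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def get_cached(cache: TTLCache, key: str, loader: Callable[[], Any],
               ttl_seconds: float = 60.0) -> Any:
    """Cache-aside: try the cache first, fall back to the loader on a miss."""
    value = cache.get(key)
    if value is None:
        # Note: a loader that returns None is never cached, since None means "miss".
        value = loader()
        cache.set(key, value, ttl_seconds)
    return value


# Hypothetical "expensive" query, counted so we can see how often it runs.
query_count = 0


def load_user_profile() -> dict:
    global query_count
    query_count += 1
    return {"id": 42, "name": "example"}


cache = TTLCache()
for _ in range(3):
    profile = get_cached(cache, "user:42", load_user_profile)

print(query_count)  # 1: only the first call reached the "database"
```

Because the application code only goes through `get_cached`, swapping the in-process dictionary for a shared memcached server later should not require touching the callers.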

"Release Early, Release Often".

--

Here is a tale of two projects.

  1. Developed on side time -- hacked together -- released early (and often). It grew to 15,000 users and 6,000,000 views per month (in 5 months).
  2. Developed in a corporate "do-it-all-just-right" mentality. Took 4 months, 10+ people, tens of thousands of dollars. It peaked out at ~ 100 users.

Let wisdom guide you...

gahooa
After having read half the book (which is surprisingly free), I'm going to have to go with this answer. Thanks!
Glitz
+2  A: 

Spend a couple weeks studying Cal Henderson's Building Scalable Web Sites, Theo Schlossnagle's Scalable Internet Architectures, and of course the site you've already found, Todd Hoff's excellent highscalability.com. At a minimum you'll understand the tradeoffs between (A) and (B) and be able to make a better decision.

Also spend time looking at Amazon Web Services, especially EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). A group at my company just deployed a web application on the Amazon infrastructure, and it was dramatically simpler than trying to run it on their own physical hardware.

If you're still at an early ideation stage and just want to work out your ideas and run small experiments, (A) would work well. But once you decide you want to deploy a small-scale trial leading into a full scale product, you absolutely need to follow (B).

When you start to shift into (B) mode, I'd suggest you use AWS to save nearly all the effort and capital expenditure in setting up your own infrastructure. Use some of the time you'll save using AWS to thoroughly learn (B) and apply the lessons. Then if you succeed, your scalable architecture will allow you to rent as many AWS machine-hours as you need. If you don't succeed, you'll have learned a lot of very useful things to apply to your next startup idea (or job).

Keep in mind this isn't an either-or choice, either. Once you understand the basic principles behind scaling, you'll be able to start out along path (B) with something simple, while having the comfort of knowing how you'll progress to the next step. Danga has some very interesting presentations along these lines. Take a look at this one, and you'll see how they started off with just one machine, shifted to an app server machine and a database machine, then to three app servers and a database machine, and so on.

Jim Ferrans
Thank you for the wealth of information that I would not have found otherwise (particularly the Danga presentations). While I probably won't read all of it beforehand, they are all excellent resources that I'll definitely get around to.
Glitz