views:

192

answers:

5

I am planning to build an application that will get a large amount of traffic. (Please don't say I won't get traffic; this is for an internal network, so the traffic will be there. Just trying to avoid the 'You won't get that much traffic, don't worry about it' responses.)

As for what type of traffic I'm expecting, users will browse various dynamically created pages (based on user account details). On those pages the user may submit text inputs. Both loading the pages and handling user input will hit the database. Loads will obviously be reads, but handling input will require both reads & writes. Inputs may also affect other users' views. If this happens, I will need to notify the other users to refresh the page.

What sorts of things do I need to do so that it doesn't simply crash under the load of a large number of users?

What become the limiting factors? Database stuff? I/O with the front end?

I've never really developed a serious web app before and am looking for some help.

EDIT: I was considering using Erlang for the backend since I've used it a little bit and really like all the concurrency stuff. Would this be a viable choice or should I try for something more traditional?

A: 

just don't do more than you need to. if you hold to that, you can handle most things short of metablog effects.

sreservoir
+13  A: 

This is a very big topic, and you'll probably want to do as much research as time allows. There are several big topics to consider.

  1. Session state storage. Obviously, session storage takes up memory or disk space. You need to have a strategy to store session information properly and in a way that can be used by a web farm.

  2. Caching. A robust caching strategy can reduce loads dramatically. Do lots of research as to when, what and where you should be caching.

  3. Scalability and load testing. Extra thought has to go into each resource-fetching operation to make sure that it's done no more often than necessary. Load testing and code profiling can help identify bottlenecks here if you use good tools.

  4. Database optimization. Make sure you understand how to properly optimize your database for thousands (millions?) of operations per minute. If your application is write-heavy, you may need to look at warehousing older data that doesn't need to be included in indexes anymore to speed up your write operations.

  5. Upgrade path. Is your traffic going to ramp up over time? Be sure to understand how you would plug in more servers and memory to your application if/when it's needed, and what would be required.

There are lots of books around that you could invest in that would probably pay big dividends. Do a search for "building scalable web applications" at Amazon or Chapters and you'll probably find lots of texts to go on, both technology-specific and agnostic.

womp
Thanks! That provides some stuff to look at.
samoz
If you're using ASP.NET or ASP.NET MVC as your technology stack, the MSDN site has lots of articles that can get you started with all these concepts for those technologies.
womp
A good tip on #2 is to use a caching proxy where possible, e.g. Varnish or Squid. That can dramatically reduce the load on your app servers since they won't have to regenerate pages for each visitor.
Martin
+1  A: 

In addition to everything else mentioned here, you should be looking at the timing of your traffic. Is it relatively constant over time? Or does it come in bursts, where you'll get a much higher amount of traffic in a short period of time?

By and large, you'll want to design a system that can handle the peak loads gracefully (though not necessarily at the ideal performance level). If your traffic is very bursty then you'll have to devote more effort to making it scale than you would if you got the same amount of traffic gradually.
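One way to read "handle the peak loads gracefully" is load shedding: cap the number of requests being processed at once and answer the overflow quickly with a "busy" response instead of letting everything slow down or fall over. Below is a rough Python sketch of that idea. `LoadShedder`, the limit of 2, and the status codes are assumptions for illustration only; a real deployment would more likely configure this in the web server or load balancer.

```python
import threading
from typing import Any, Callable, Tuple


class LoadShedder:
    """Caps concurrent work; rejects the overflow immediately instead of queueing it."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work: Callable[[], Any]) -> Tuple[int, Any]:
        # Non-blocking acquire: if every slot is taken, fail fast.
        if not self._slots.acquire(blocking=False):
            return (503, "busy, retry shortly")
        try:
            return (200, work())
        finally:
            # Always give the slot back, even if work() raised.
            self._slots.release()


shedder = LoadShedder(max_in_flight=2)
status, body = shedder.handle(lambda: "rendered page")
print(status, body)
```

With bursty traffic the limit is the knob to tune: set it from load testing so that the requests you do accept still finish in acceptable time during a burst.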

Craig Walker
+1  A: 

As far as Erlang goes: it sounds like an acceptably good language (based on the little I know about it), but it is certainly not a magic wand that gives you scalability. There are dozens of different factors and products to consider. Language choice is but one of them... and probably one of the least significant ones.

You may be better off going with what you already know & learning how to make it scale, rather than going to a new/unknown technology and hoping that it scales for you.

Craig Walker
Well I come from primarily an embedded or system level C background, so I can pretty much start wherever. I liked looking at Erlang because it's a functional language but also because of the high concurrency aspects that it has touted.
samoz
+1  A: 

Backend storage, database handling, front-end dynamic content, and caching are one thing. Consideration of your host service provider and available network bandwidth is the other.

Check with your hosting service on their bandwidth caps, max memory allocation per request, max file upload sizes, and max database queries. If your current host doesn't offer cheap services that match your scaling requirements, then move to another host before you're either shut down or caught off guard by a triple-digit monthly bill for going over your allotted bandwidth.

Edit: just re-read and caught your "internal network" reference. So, in this case, you probably won't be stuck with a several-hundred-dollar bill from your network admin, but they can still shut you down. Be sure to keep the lines of communication open with your network admins and the admins of any other services your own site interacts with, or you'll likely make enemies of them all pretty quickly. In other words: good network etiquette.

Furthermore, if you actually own and build the server, make sure the OS, software stack, and hardware are all up to date with stable software and firmware versions only, able to handle the load, and monitored to run smoothly at all times.

Edit #2: I know you asked specifically how your application can handle the load, and I may just be ranting off-topic here, but you also have to consider whether you and your teammates can handle the load. Manpower bandwidth is just as important, and getting discouraged by the workload is how projects like this fail. Beer is a programmer's best friend, especially when tackling complex and creative programming tasks, but it can lead to serious drinking problems if manpower isn't managed correctly or if manpower resources are lacking. Who's going to respond to that outage notification at 3 in the morning? Who's going to respond to hate mail from religious fundamentalists or trolls, or crawl through law and patents to verify whether that take-down notice is bogus? Unless it's a gig that can pay the bills, most folks likely can't devote a lot of time and energy. I don't mean to discourage you at all, and hopefully you've got this covered already.

bob-the-destroyer
Thanks for the advice. If we move this to the internet (may happen eventually if this pilot is successful), I will remember to look into that.
samoz