We have a new project for a web app that will display banner ads on websites (as a network), and we estimate it will need to handle 20 to 40 billion impressions a month.

Our current code is written in ASP, but we are moving to PHP. Does PHP 5 have limits when it comes to scaling a web application? Or should I have our team invest in picking up JSP?

Or, is it a matter of the app server and/or DB? We plan to use Oracle 10g as the database.

+1  A: 

I don't think it is a matter of language, but it can be a matter of database speed as well as CPU processing speed. Have you considered a web farm? That way you can have more than one machine serving your application. There are several ways to implement this. You can start with two servers and add more as the app demands more processing capacity.

On the other hand, Oracle 10g is a very good database server; in my humble opinion you only need a single standalone Oracle server to handle that volume of requests. Remember that a SQL server is faster when people request more or less the same things each time, which is what happens in a web application if you plan your database schema carefully.

You should also look at the existing ad server solutions, and there are some very good ones; just search Google for "Open Source AD servers".

backslash17
+6  A: 

You do realize that 40 billion per month is roughly 15,500 per second, right?

Scaling isn't going to be your problem - infrastructure period is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware - as others have said in the form of a farm or cloud.
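To make that arithmetic concrete, here is a rough sketch of the calculation. It assumes a 30-day month and perfectly even traffic (real traffic will likely peak above the average), and the 20 KB banner size is only an illustrative default, not a measured number.

```python
# Back-of-the-envelope load estimate. All inputs are assumptions.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def impressions_per_second(per_month: float) -> float:
    """Average request rate if traffic were spread evenly over the month."""
    return per_month / SECONDS_PER_MONTH


def bandwidth_mb_per_second(per_month: float, banner_kb: float = 20.0) -> float:
    """Average outbound bandwidth if every banner is served uncached.

    banner_kb is an assumed average creative size; 1 MB is taken as 1000 KB.
    """
    return impressions_per_second(per_month) * banner_kb / 1000.0


if __name__ == "__main__":
    for monthly in (20e9, 40e9):
        rate = impressions_per_second(monthly)
        mb = bandwidth_mb_per_second(monthly)
        print(f"{monthly:.0e}/month -> {rate:,.0f} req/s, {mb:,.0f} MB/s")
```

Whatever the exact inputs, the result is in the thousands of requests per second sustained, which is why the hardware matters more than the language.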

Peter Bailey
Let's say that each banner ad is 20 KB in size and never cached. That's over 300 MB a second you need to be able to serve.
Ólafur Waage
You'll actually only need enough to serve as the origin for a CDN.
Jason Watkins
+2  A: 

This question (and the entire subject) is a bit subjective. You can write a dog slow program in any language, and host it on anything.

I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.

That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.

Good luck!

rifferte
+1  A: 

PHP will be capable of serving your needs. However, as others have said, your first limit will be your network infrastructure.

Your second limit will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level: things like a fast data-object mapper, multiple data caching mechanisms, separate configuration files, and so on.

staticsan
+5  A: 

No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.

That said:

PHP or other languages used in the application tier really have little to do with scalability. Since the application tier delegates its state to the database or equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per-server efficiency and hence costs, but that's different from scalability.

It's scaling the state/data storage that gets more complicated.

For your app, you have three basic jobs:

  1. deciding which ad to show
  2. serving the ad
  3. logging the impression

Each of these will require thought and likely different tools.

The second, serving the ad, is the simplest: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.

Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give the ad placements for a given property for a given calendar period. Or it may be complex contextual advertising like Google's. Assuming it's more the former, and that the database of placements is small, then this is the simple task of scaling database reads. You can use replication trees or, alternatively, a caching layer like memcached.
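As a rough illustration of the caching idea, here is a minimal read-through cache sketch. A plain in-process dict stands in for memcached, and `PlacementCache`, `load_placements_from_db`, and the 60-second TTL are all hypothetical names and values made up for this example.

```python
import time
from typing import Callable, Dict, List, Tuple


class PlacementCache:
    """Read-through cache: hit the database only on a miss or after expiry."""

    def __init__(self, loader: Callable[[str], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # function that actually queries the database
        self._ttl = ttl_seconds    # how long a cached entry stays valid
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, property_id: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(property_id)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh: no database read
        ads = self._loader(property_id)          # miss or stale: go to the source
        self._entries[property_id] = (now + self._ttl, ads)
        return ads


# Hypothetical stand-in for a real placements query.
db_calls: List[str] = []


def load_placements_from_db(property_id: str) -> List[str]:
    db_calls.append(property_id)
    return [f"ad-for-{property_id}"]


cache = PlacementCache(load_placements_from_db, ttl_seconds=60.0)
cache.get("site-1")
cache.get("site-1")   # served from the cache, so the loader runs only once
```

The trade-off is staleness: a placement change can take up to one TTL to show up, which is usually acceptable for ad schedules that change by calendar period.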

The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but to adopt a sharding scaling strategy. More exotic options might be to use a key/value store supporting counter instructions, such as Redis, or a scalable OLAP database such as Vertica.
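To sketch what sharding the writes might look like: route each ad to a shard by a stable hash of its id, and count impressions in memory before writing them out. The batching step is an extra assumption of this sketch rather than something described above, and `NUM_SHARDS`, `shard_for`, and `ImpressionBuffer` are hypothetical names.

```python
import zlib
from collections import Counter
from typing import Dict, List

NUM_SHARDS = 4  # hypothetical number of database shards


def shard_for(ad_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an ad id to a shard with a hash that is stable across processes."""
    return zlib.crc32(ad_id.encode("utf-8")) % num_shards


class ImpressionBuffer:
    """Accumulates impression counts per shard so writes can go out in batches."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self._num_shards = num_shards
        self._pending: List[Counter] = [Counter() for _ in range(num_shards)]

    def record(self, ad_id: str) -> None:
        shard = shard_for(ad_id, self._num_shards)
        self._pending[shard][ad_id] += 1

    def flush(self) -> Dict[int, Dict[str, int]]:
        """Return {shard: {ad_id: count}} for non-empty shards and reset.

        A real system would turn each batch into one write per shard,
        for example a single UPDATE that adds the count to a stored total.
        """
        batches = {i: dict(c) for i, c in enumerate(self._pending) if c}
        for counter in self._pending:
            counter.clear()
        return batches


buffer = ImpressionBuffer()
for ad in ["ad-a", "ad-a", "ad-b", "ad-a"]:
    buffer.record(ad)
batches = buffer.flush()
```

Batching cuts the write rate dramatically, but anything still in the buffer is lost if the process dies, so the flush interval is a trade-off between database load and how many impressions you can afford to lose.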

All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking about.

Jason Watkins