views: 42
answers: 2

In order to show a best-matching ad each time, there are at least these things to do (a rough sketch of this naive flow follows the list):

  1. retrieve the main information of the current page
  2. get an ad that's related to the information retrieved above
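
A minimal sketch of that naive flow, with hypothetical helper names and an ad inventory whose shape is just assumed here, might look like this:

    # Naive per-request flow: fetch the page, guess its topic, pick an ad.
    # Only illustrative -- doing this on every ad request is exactly what
    # turns out to be impractical.
    import re
    import urllib.request
    from collections import Counter

    def extract_keywords(url, top_n=10):
        # Step 1: retrieve the page and keep its most frequent long words.
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
        words = re.findall(r"[a-z]{4,}", html.lower())
        return {w for w, _ in Counter(words).most_common(top_n)}

    def pick_ad(page_keywords, ad_inventory):
        # Step 2: pick the ad whose declared keyword set (assumed to be a
        # Python set here) overlaps the page's keywords the most.
        return max(ad_inventory, key=lambda ad: len(ad["keywords"] & page_keywords))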

But the above is almost impossible for a non-search-engine company.

So what's a practical way for a non-Google company to approach a best-matching ad system?

A: 

You would need the customer to tell you what their page is about when they sign up to have advertisements placed on their site. You're also going to need to be very good at JavaScript so you can keep track of how many times an advertisement is viewed. Try looking at the code used by existing ad companies. It's very complicated...

Peter
+3  A: 

You basically can't do point 1 in real time -- the time interval is just too short. So you need to analyze beforehand all the pages you're going to be serving ads on, and store that information in a way that it can be rapidly accessed at ad-serving time.
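
A minimal sketch of that split, with made-up URLs and weights: the expensive analysis runs as an offline batch job, so the per-request path is nothing but a fast lookup.

    # Filled ahead of time by a batch job (e.g. nightly, or whenever a
    # partner page changes) -- never on the ad-serving path itself.
    PAGE_INDEX = {
        "http://partner.example/recipes/pasta": {"pasta": 0.6, "cooking": 0.4},
        "http://partner.example/travel/rome": {"rome": 0.7, "travel": 0.3},
    }

    def page_profile(url):
        # Serve-time path: no fetching, no parsing, just a dictionary hit.
        return PAGE_INDEX.get(url, {})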

That doesn't necessarily imply "being a search engine company": presumably you're not going to serve ads on billions of different URLs, after all, but only on a far smaller number of URLs that belong to your company or its partners (so you can presumably also get collaboration from the URLs' owners: e.g., you don't need a general spider but can rely on the owners using the sitemaps protocol properly to let you know about new, updated or removed URLs, you can trust each page's keywords, title and headers to provide important info, and so forth).
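
One way to exploit that cooperation, sketched with the standard library only (the tag choices and any URLs are just examples): read the partners' sitemaps to learn which URLs exist, then trust each page's own title, meta keywords and headers rather than crawling blindly.

    import xml.etree.ElementTree as ET
    from html.parser import HTMLParser

    SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def urls_from_sitemap(sitemap_xml):
        # The sitemaps protocol lists every <loc> the partner wants indexed.
        return [loc.text for loc in ET.fromstring(sitemap_xml).iter(SITEMAP_NS + "loc")]

    class MetadataParser(HTMLParser):
        # Collects the owner-provided title, meta keywords and h1/h2 headers.
        def __init__(self):
            super().__init__()
            self.title, self.keywords, self.headers = "", [], []
            self._current = None
        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name") == "keywords":
                self.keywords = [k.strip() for k in attrs.get("content", "").split(",")]
            elif tag in ("title", "h1", "h2"):
                self._current = tag
        def handle_endtag(self, tag):
            if tag == self._current:
                self._current = None
        def handle_data(self, data):
            if self._current == "title":
                self.title += data
            elif self._current in ("h1", "h2"):
                self.headers.append(data.strip())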

So with a relatively small number of servers (say a few dozen, maybe in EC2 or another "cloud" service) you can keep an in-memory distributed hash table mapping URLs to (for example) sets of related keywords and weights for the keywords' relative importance, and a similar table for candidate ads -- indeed, if you don't have a "real-time auction" aspect to your system, you might even get away with precomputing a URL-to-ads correspondence (presumably you do want some dynamic adjustment, auction-wise or otherwise, but with a reasonable approximation that can be modeled as a simple incremental op on the precomputed correspondence).
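
A toy, single-machine version of those tables (all the data here is made up; in production each dict would live in a distributed in-memory store):

    # url -> {keyword: weight}, produced by the offline page analysis.
    PAGE_KEYWORDS = {
        "http://partner.example/travel/rome": {"rome": 0.7, "travel": 0.3},
    }

    # ad id -> {keyword: weight} for the candidate ads.
    AD_KEYWORDS = {
        "ad-101": {"travel": 0.8, "hotel": 0.2},
        "ad-102": {"pasta": 0.9, "cooking": 0.1},
    }

    def relevance(page_kw, ad_kw):
        # Weighted keyword overlap between a page and an ad.
        return sum(w * ad_kw[k] for k, w in page_kw.items() if k in ad_kw)

    # Precomputed url -> ads ranked by relevance, refreshed by the batch job.
    PRECOMPUTED = {
        url: sorted(AD_KEYWORDS, key=lambda ad: relevance(kw, AD_KEYWORDS[ad]), reverse=True)
        for url, kw in PAGE_KEYWORDS.items()
    }

    def serve(url, bid_boost=None):
        # bid_boost is the cheap "incremental op": e.g. {"ad-101": 1.2} from a
        # lightweight auction pass; ties keep the precomputed relevance order.
        ranked = PRECOMPUTED.get(url, [])
        if bid_boost:
            ranked = sorted(ranked, key=lambda ad: bid_boost.get(ad, 1.0), reverse=True)
        return ranked[0] if ranked else None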

If you do need to scale to serving ads on billions of URLs, then you do need a far more sophisticated approach than can be effectively summarized on a SO answer -- but then, if that's the scale of your ambition, you had better put together an engineering team that's not daunted by the task (and far more than a few dozen servers;-).

Alex Martelli
Unfortunately we are going to serve ads on a large scale of URLs :(
@user, then you had better budget for a team of at least half a dozen experienced engineers who have done such things before (with another dozen or so bright but inexperienced ones who'll learn by working with them) and a few hundred or thousand servers; in view of this kind of budget (several million dollars a year run-rate), you can definitely afford a good consultant, a specially skilled, world-class expert in ad serving, to help you select your team ;-).
Alex Martelli
@Alex Martelli, let's keep the discussion in the scope of technique :) We have other people for sales, BD, etc. But that's not me; I'm only a developer, so I'm here to ask what a developer should do.
@user, a developer should patch together a small-scale prototype based on the techniques I outlined and measure how its performance and resource requirements scale up as the number of URLs and ads grows, thus providing actual data supporting the need for an (unreasonably high) amount X of hardware resources to reach the target system size and latency, and therefore the need for an experienced team of dozens (not _one_!-) of developers to move up to enormously more complicated and powerful algorithms and architectures. "No royal road", etc.
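
One way a single developer could do that measurement (the brute-force index builder and the scales below are arbitrary choices, not part of the answer above): build a toy index at a few sizes and record build time, peak memory and lookup latency, then watch how the numbers grow.

    import random
    import time
    import tracemalloc

    def build_index(n_urls, n_ads, n_keywords=1000):
        # Brute-force matching: every URL is scored against every ad, which is
        # precisely the part that stops scaling as both numbers grow.
        kws = [f"kw{i}" for i in range(n_keywords)]
        pages = {f"http://example.test/{i}": set(random.sample(kws, 5)) for i in range(n_urls)}
        ads = {f"ad-{j}": set(random.sample(kws, 5)) for j in range(n_ads)}
        return {u: max(ads, key=lambda a: len(ads[a] & p)) for u, p in pages.items()}

    for n in (1000, 5000, 20000):
        tracemalloc.start()
        t0 = time.perf_counter()
        index = build_index(n, n_ads=200)
        build_s = time.perf_counter() - t0
        peak_mb = tracemalloc.get_traced_memory()[1] / 1e6
        tracemalloc.stop()
        t0 = time.perf_counter()
        for url in random.sample(list(index), 1000):
            index[url]
        lookup_us = (time.perf_counter() - t0) / 1000 * 1e6
        print(f"{n} urls: build {build_s:.2f}s, peak {peak_mb:.0f}MB, lookup {lookup_us:.2f}us/hit")
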
Alex Martelli