Hey guys,

I was wondering if any of you know how to implement a backend system that retrieves SEO information from Google (website ranking, number of occurrences in the first X results, etc.).
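
To make those two metrics concrete, here is a minimal sketch. It assumes the ordered list of result URLs has already been obtained by some permitted means, so it does no fetching at all. The names `host_of`, `seo_metrics`, and `top_n` are made up for illustration, and matching on host name (subdomains included) is just one possible way to decide that a result "belongs" to a site.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def seo_metrics(result_urls, site, top_n=10):
    """Best rank and occurrence count of `site` in the first `top_n` results.

    `result_urls` is assumed to be ordered as the search engine returned it.
    """
    # Accept either a bare domain ("example.com") or a full URL.
    target = host_of(site if "://" in site else "http://" + site)
    positions = [
        rank
        for rank, url in enumerate(result_urls[:top_n], start=1)
        # Count the site itself and any of its subdomains.
        if host_of(url) == target or host_of(url).endswith("." + target)
    ]
    return {
        "best_rank": positions[0] if positions else None,
        "occurrences": len(positions),
    }
```

For example, `seo_metrics(urls, "example.com", top_n=20)` would report where example.com first shows up in the first 20 entries of `urls` and how many of those entries point at it.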

I know the Google AJAX Search API (code.google.com/apis/ajaxsearch/) lets you retrieve the content without having to "wget" or "curl" it, but using the search information this way doesn't seem to be legal under its terms (code.google.com/apis/ajaxsearch/terms.html).

Any ideas on how to implement this?

Thanks!! Fer Martin

A: 

There is http://toolbarqueries.google.com, which is what the Google Toolbar uses (or used to use) to obtain the PageRank for a link. It can be queried easily by first hashing the URL you want to check in a specific format.

AFAIK it's an undocumented API, so the legal implications of using it are not clear.

regards, DrSlump

A: 

I have investigated how to go about doing this with Google, and AFAICT there's really no way to do it legally. Since those SERPs are their cash cow, they don't allow anyone to scrape them for any reason.

There are a slew of services out there which will scrape Google for you, but from what I can tell, they are all doing it against Google's TOS. If you figure out a way to do this legally, let me know. I'm guessing there are a few who scrape with granted permission, but I'm unsure who they are.

The only ideas I've had so far are:

  • Set up a "proxy server" which is used to automate customer Google queries. The proxy can then see the results and do the scraping, and it's not "automated." If the user enters 20 terms, then open 20 frames which do the search via the proxy server.
  • Piggyback on web traffic coming to a site. In short: I visit your site, and a background JavaScript call searches Google and posts the results to your site. This is unethical, as I may wonder why "your" searches appear in my Google history.

The issue is the word "automated." I have a feeling that the services which do this actually have farms of computers around the world, so that it doesn't appear (to Google) that it is being scraped. I'm guessing that unless you start generating some serious traffic from a single IP, you'll be fine for a while.

Perhaps you should just ask permission?

razzed