As a newbie programmer, I would like to know what the benefits are of using an API (for example the Google Search API or the newer Buzz API) to gather content instead of screen scraping, apart from the obvious legal aspects.

+1  A: 

APIs are less likely to change than a screen layout.

Gilbert Le Blanc
Thank you for your reply. When you say change, what do you mean? From what I have read, I would need to get an account and then write a script that accesses the API (which I am still unclear how to do, since the Google Buzz API documentation does not help much) and sends queries to it. Would the results I get back be what I would normally see on screen in a manual search? And would I be able to have a script run these queries periodically?
vbNewbie
He means that the page layout for Google Buzz (just an example) could be modified so that your scraper wouldn't work. However, APIs are usually left intact, as they are made to be used by programmers to interact with the service, and it would do the provider no good to break all preexisting applications.
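To make the difference concrete, here is a minimal Python sketch. The HTML snippets and the JSON payload are invented for illustration (they are not real Google Buzz markup or a real API response); the point is only that a scraper is tied to the markup it was written against, while an API client reads a named field.

```python
import json
import re

# Hypothetical page markup before and after a site redesign.
OLD_HTML = '<div class="result"><span class="title">Hello</span></div>'
NEW_HTML = '<li class="hit"><h3 class="headline">Hello</h3></li>'

# Hypothetical API payload; a documented field name is expected to stay stable.
API_JSON = '{"results": [{"title": "Hello"}]}'

# The scraper only knows the old markup.
TITLE_PATTERN = re.compile(r'<span class="title">(.*?)</span>')


def scrape_titles(html):
    """Extract titles by matching the markup the scraper was written against."""
    return TITLE_PATTERN.findall(html)


def api_titles(payload):
    """Read titles from a structured response by field name."""
    return [item["title"] for item in json.loads(payload)["results"]]


print(scrape_titles(OLD_HTML))  # ['Hello']
print(scrape_titles(NEW_HTML))  # [] : same content, new layout, scraper finds nothing
print(api_titles(API_JSON))     # ['Hello']
```

The scraper does not raise an error after the redesign, it just silently returns nothing, which is part of why this kind of breakage tends to be noticed late.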
Precision
Thank you Precision. Are the Google APIs only accessible from Java code, and how exactly would I access, for example, the Google Buzz API? If the site address is http://code.google.com/apis/buzz/, how would I access it from code, given that I do not have a website?
vbNewbie
What Precision said. :-)
Gilbert Le Blanc
If I understand what Google is saying, right now you can only access Buzz with an RSS reader. In the future, Google may add APIs that a programmer could use.
Gilbert Le Blanc
Thank you everyone. From what I have read, I will basically use the REST functionality and the authorization service requests, which are essentially URLs. So this will typically just involve using HttpWebRequest etc. to go through the process. Thanks again everyone.
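For readers following along, a REST request of this kind is just a URL with query parameters plus, usually, an authorization header. The sketch below uses Python's standard library rather than HttpWebRequest. The endpoint `https://api.example.com/v1/search`, the parameter names `q` and `max-results`, and the bearer-token header are all assumptions made for illustration, not the actual Buzz API.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, query, token, max_results=10):
    """Build (but do not send) a GET request for a hypothetical REST search API."""
    # Query parameters are URL-encoded and appended after '?'.
    params = urlencode({"q": query, "max-results": max_results})
    request = Request(f"{base_url}?{params}")
    # Assumed auth scheme: a bearer token in the Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    return request


req = build_search_request(
    "https://api.example.com/v1/search",  # placeholder endpoint
    "screen scraping",
    "TOKEN",  # placeholder credential
)
print(req.full_url)
# https://api.example.com/v1/search?q=screen+scraping&max-results=10
```

Sending it would be a call to `urllib.request.urlopen(req)` against a real endpoint; the service's own documentation decides the actual parameter names and authorization scheme.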
vbNewbie
+1  A: 

One big downside of screen scraping is that the screen can change and break your scraper. So you end up having to continually adjust your code to match theirs, and since you don't know about changes ahead of time, you suffer downtime/outages as a result.
Also, you may be violating their TOS, and they won't like it. If you have paying customers for your service, you can find yourself between a rock and a hard place pretty quickly. Also, if you're simulating many users, you'll produce an unanticipated drag on the servers. So using a published/permitted API would be much more efficient for you, and for the web site serving up the source material.
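On the server load point, a script that runs queries periodically can at least space its calls out. This is a rough Python sketch under stated assumptions: `fetch` is a hypothetical stand-in for whatever function actually calls the API, and the interval is something you would pick from the provider's published limits.

```python
import time


def poll(fetch, interval_seconds, rounds, sleep=time.sleep):
    """Call fetch() `rounds` times, pausing `interval_seconds` between calls."""
    results = []
    for i in range(rounds):
        results.append(fetch())
        # No need to wait after the final call.
        if i < rounds - 1:
            sleep(interval_seconds)
    return results


# Demo with a dummy fetch and no delay; a real script would use a real interval.
print(poll(lambda: "ok", 0, 2))  # ['ok', 'ok']
```

Passing `sleep` as a parameter is only there so the loop can be tested without actually waiting.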

Chris Thornton
So if I understand correctly, accessing Google's API only requires a Google account, which these days they verify by sending a code to a phone. Does the API return the same number of results as a normal search, i.e. 1000?
vbNewbie