I found a post, http://stackoverflow.com/questions/999056/ethics-of-robots-txt/999088#999088, discussing robots.txt on websites. Generally, I agree with the principles it lays out. However, there are commercial tools that check sites' Google positions, very likely by scraping Google's results, because there is no API for it (in case anyone doesn't know, there used to be one).

Google's robots.txt disallows user agents from /search, which means they don't want you to do this. So I am confused: as far as I know there is no other way, and many people (and companies) do it. Is this allowed or not?


Google's TOS says:

5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google.

However, I reckon RichieHindle must be right, because the number of SEO tools that use scraping techniques is endless.

A:

It's not allowed, but Google tolerates it up to a point. They reserve the right to ban your IP address if they detect outrageous scraping behaviour, but a reasonable level is overlooked.
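To make "outrageous" versus "reasonable" a little more concrete, here is a minimal sketch of one way a site could flag an IP address, assuming a simple sliding-window request count. Google does not publish how it actually detects scrapers, so the class name `ScrapeDetector` and the `max_requests`/`window` thresholds are hypothetical illustrations only.

```python
from collections import defaultdict, deque


class ScrapeDetector:
    """Flag client IPs that send more than max_requests within window seconds.

    Hypothetical sketch: the thresholds are illustrative, not any real
    site's limits.
    """

    def __init__(self, max_requests: int, window: float):
        self.max_requests = max_requests
        self.window = window
        # ip -> timestamps of that IP's requests still inside the window
        self._hits = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if the IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that are window seconds or more older than this request.
        while hits and hits[0] <= timestamp - self.window:
            hits.popleft()
        return len(hits) > self.max_requests


if __name__ == "__main__":
    detector = ScrapeDetector(max_requests=3, window=10.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        # 203.0.113.7 is a documentation-range address, used as a placeholder.
        flagged = detector.record("203.0.113.7", t)
        print(f"t={t:>4}: flagged={flagged}")
```

A real site would more likely throttle or show a CAPTCHA before banning outright; the sketch only shows the counting idea.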

RichieHindle
A:

robots.txt is just a file that tells robots which pages they should and shouldn't look at. Good robots read it and abide by it; bad robots ignore it. It's a bit like a "Don't walk on the grass" sign.
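For a robot that wants to be one of the "good" ones, Python's standard library includes `urllib.robotparser`, which reads robots.txt rules and answers whether a given URL may be fetched. The sketch below assumes a made-up robots.txt that only mimics the "/search is disallowed" rule mentioned in the question; it is not Google's real file, and `MyBot` and `example.com` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, loosely modelled on the "/search is disallowed"
# rule from the question. NOT a copy of any real site's file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /about
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robot honouring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/search?q=robots",
                "https://www.example.com/about"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "MyBot", url) else "disallowed"
        print(f"{url}: {verdict}")
```

Nothing in the parser enforces anything: a "bad" robot simply never calls `can_fetch`, which is exactly the "Don't walk on the grass" point.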

However, at the end of the day, the site's author can't stop a robot from doing whatever the hell it likes, except by blocking its IP. Normally this won't happen unless the robot uses enough of the website's bandwidth to annoy the site's owners into blocking it.

If it had a request pattern similar to a single human user's, I doubt most site owners (including Google) would care which pages it reads.
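One way a robot might keep a request pattern closer to a single human user's is to enforce a minimum gap between requests. This is a minimal sketch assuming a fixed delay; `PoliteThrottle` and the interval value are hypothetical, and neither robots.txt nor anyone in this thread specifies what gap counts as human-like. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval=0.5)
    for page in ("page-1", "page-2", "page-3"):  # placeholder page names
        slept = throttle.wait()
        print(f"fetching {page} after waiting {slept:.2f}s")
```

A fixed gap is the simplest choice; adding random jitter to the interval would be an equally reasonable variation.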

As your tag says "legal", I can tell you that a robot ignoring a robots.txt file is -not- illegal, but it might be against the website's ToS if there's a section on robots, which may result (as said above) in a ban from the service. But that is about as far as it goes.

Of course, this is my opinion on the situation; don't take it as fact. Do some reading around and see if there are actual factual answers somewhere.

Sekhat
Thanks for sharing your views. Reading around is what I'm doing now; that is how I came across the post mentioned above. There seems to be no factual answer to this question apart from the TOS.
G Berdal