Facebook's developer principles and policies and the general terms of use seem to forbid automated data collection, but graph.facebook.com/robots.txt seems to allow it:

User-agent: *
Disallow:

Does anybody know how to make sense of this?
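For what it's worth, the quoted rules really do mean "everyone may crawl everything": an empty `Disallow:` line disallows nothing. A quick sketch using Python's standard `urllib.robotparser`, with the rules pasted inline rather than fetched live:

```python
# Parse the robots.txt quoted above and ask whether an arbitrary
# crawler may fetch a Graph API URL. "MyBot/1.0" is a made-up agent.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow line disallows nothing, so every path is
# crawlable by every user agent.
print(parser.can_fetch("MyBot/1.0", "https://graph.facebook.com/19292868552"))
# → True
```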

+1  A: 

They don't want you to scrape their data, but they want Google to index the site.

mipadi
That is definitely the case for the root facebook.com/robots.txt, which lists the big players that are allowed to crawl facebook.com itself. But the robots.txt for graph.facebook.com allows anyone.
jpadvo
`graph.facebook.com` is just developer docs, isn't it? In which case, there's no valuable data to scrape anyway. Or perhaps some webmaster forgot to tweak the robots file. :)
mipadi
@mipadi - actually, no. The root page redirects to the developer docs, but the child pages point to data about various Facebook objects, for example: https://graph.facebook.com/19292868552.
Franci Penov
It's a lot more than that! Here's an example: http://graph.facebook.com/cocacola/posts
jpadvo
Here are the docs for it: http://developers.facebook.com/docs/api
jpadvo
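To make the comments above concrete: each Graph API object lives at https://graph.facebook.com/&lt;id&gt;, and connections such as /posts hang off that path. The helper and the sample JSON below are illustrative only (live requests may also require an access token):

```python
# Illustrative sketch of the Graph API URL shapes discussed above.
import json

BASE = "https://graph.facebook.com"

def graph_url(object_id, connection=None):
    # e.g. graph_url("cocacola", "posts") -> ".../cocacola/posts"
    url = f"{BASE}/{object_id}"
    return f"{url}/{connection}" if connection else url

# Sample (not live) response shape for a page object:
sample = '{"id": "19292868552", "name": "Facebook Platform"}'
page = json.loads(sample)

print(graph_url("cocacola", "posts"))  # https://graph.facebook.com/cocacola/posts
print(page["id"])                      # 19292868552
```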
A: 

Terms of Use trump robots.txt. Just because they have not taken measures to prevent you from doing something does not mean you are allowed to do it.

Yes, they could change robots.txt to prevent crawling of graph.facebook.com. However, that would mean that for any company they want to grant access, they would have to add an exception to that file, which could amount to public disclosure of their business deals.

On the other hand, they could generate that file on the fly and return a different robots.txt to agents that identify themselves as coming from a company they have a private deal with. Not sure it's worth it, though; sometimes establishing a policy is cheaper and more effective than building a technical mechanism to enforce it.

Franci Penov
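The "generate robots.txt on the fly" idea from the answer above could be sketched like this. The partner list and agent names are hypothetical, and real crawler verification would need more than a User-Agent check (agents are trivially spoofed):

```python
# Sketch: serve a permissive robots.txt to partner crawlers and a
# restrictive one to everyone else. PARTNER_AGENTS is a made-up
# allow-list for illustration.
PARTNER_AGENTS = {"Googlebot", "bingbot"}

def robots_for(user_agent: str) -> str:
    # Match on the product token before the "/" in the UA string.
    token = user_agent.split("/")[0]
    if token in PARTNER_AGENTS:
        return "User-agent: *\nDisallow:\n"    # disallow nothing
    return "User-agent: *\nDisallow: /\n"      # disallow everything

print(robots_for("Googlebot/2.1"))
print(robots_for("RandomScraper/0.1"))
```

In practice this is exactly the disclosure trade-off the answer describes: the per-agent responses keep the partner list out of any single public file, at the cost of extra server logic.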