How does the fair use doctrine apply to websites in terms of screen-scraping?

The particular example I am thinking of is extracting the useful data from a website and re-presenting that raw data aggregated with data from other similar websites. For example, suppose one were to extract data from a variety of websites to produce a database of structured data, in a similar manner to DBpedia. Could this be considered fair use? If not, where does the boundary lie?

Obviously, this differs by jurisdiction, but so do the locations of the websites being scraped and of the scraper. Similarly, what is ethical may well differ from what is legal.


Okay, so in many cases it may well be illegal, but as a content provider, how can it actually be prevented? I've posted a follow-up question: Protection from screen scraping.

+1  A: 

Ahnn... Google?

Otávio Décio
Indexing for search and providing links to drive traffic to sites is quite different from re-presenting content scraped from a site.
Bill the Lizard
Fair enough. However, we don't really know what "else" Google does with the data...
Otávio Décio
Doesn't matter, as long as it does it privately.
Vinko Vrsalovic
Interesting point. I wonder if that is the same as someone downloading software, MP3s, etc. for private use?
Otávio Décio
Although I don't believe the OP meant it, the Google point is interesting, because a couple of suicidally trollish companies are certainly taking the view that Google *is* stealing their content, notwithstanding the reality that their traffic would be decimated without Google.
annakata
@annakata: I wonder if those companies understand what they're doing?
Bill the Lizard
+3  A: 

It depends on the terms of service of the site being scraped. Some sites, like Google and Amazon, expressly forbid what you're describing. Instead, they provide an API that you have to sign up to use. Most sites frown on the type of behavior you're describing because you take their content without displaying their ads.

Bill the Lizard
+4  A: 

The problem with what you describe is that different sites have different rules regarding their copyright. If you scrape copyrighted information, you can find yourself in hot water very quickly.

As I understand it, the US fair use rules allow for limited use of copyrighted information without the copyright owner's permission. The key word here is "limited". Wholesale scraping and republishing of content is almost certainly against the law in the US, and in other countries too. Taking a quotation or two and including it within a much larger original work of your own is more acceptable.

It also matters whether you use the material commercially. Taking other people's content and putting it alongside advertising is probably not permitted.

The biggest problem I have with these types of "fair use" is that they often take away the value of the original work (I believe this is also taken into consideration in fair use law). I am not suggesting that this is your goal, but when people start redistributing other people's work, the original authors' already limited advertising revenue can disappear, and it may then no longer be viable to write the content in the first place.

  • I use the word "probably" a lot here because I am not a lawyer; don't take my opinion as legal advice.
BlackWasp
+2  A: 

As a rule of thumb, once you take someone else's content and commercially exploit it to the detriment of their page impressions, sales, ad revenue, or bottom line, people are going to get annoyed.

seanb