I have been thinking quite a bit lately about screen scraping and what a chore it can be, so I pose the following question.
Would you, as a site developer, expose simple APIs (for example, JSON results) so that users have no need to screen scrape?
These results could be cached, and they generate far less traffic than the large amounts of markup a scraper would otherwise download.
I am not looking to prevent scraping, only to deter it.
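To make the caching idea concrete, here is a minimal sketch of what such a JSON result might look like on the server side. Everything in it is an assumption for illustration: the function name `build_profile_response`, the profile fields, and the 60-second cache lifetime are hypothetical and not taken from any real site. It only shows that a compact JSON body plus an `ETag` lets repeat requests be answered with an empty `304 Not Modified`.

```python
import hashlib
import json


def build_profile_response(profile, if_none_match=None):
    """Return (status, headers, body) for a hypothetical JSON profile endpoint."""
    # Compact, deterministic serialization keeps the payload small and the ETag stable.
    body = json.dumps(profile, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Content-Type": "application/json",
        "ETag": etag,
        "Cache-Control": "public, max-age=60",  # assumed lifetime, tune per site
    }
    # If the client already holds this exact version, send no body at all.
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


# Hypothetical profile data, for illustration only.
profile = {"id": 1, "name": "example", "reputation": 1234}

status, headers, body = build_profile_response(profile)
status2, _, body2 = build_profile_response(profile, headers["ETag"])
print(status, len(body), status2, len(body2))
```

The second call simulates a client that sends back the `ETag` it received, so an unchanged profile costs only headers on the wire.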
Scraping Bandwidth Sample
((users * (% / 100)) * ((freq * 60) * 24)) * filesize
- users: 200,000
- % of users using utility: 5
- filesize: 1 KB
- freq: 1 request per minute
Formula:
((users * (% / 100)) * ((freq * 60) * 24)) * filesize
10,000 * 1440 * 1
14,400,000 KB, or about 13.73 GB a day
Assuming your JSON result is 200 bytes (about 0.2 KB), that's now (10,000 * 1440 * 0.2), or about 2.75 GB a day.
That's a reduction of about 11 GB of traffic a day.
For reference, my Stack Overflow profile page is 96 KB.
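The arithmetic above can be checked with a short script. This is only a sketch of the same formula: the function and parameter names are my own, `freq` is read as requests per minute per client, and 1 GB is taken as 1024 * 1024 KB, which is what the figures above imply.

```python
def daily_traffic_kb(users, percent_using, requests_per_minute, size_kb):
    """Daily traffic in KB: ((users * (% / 100)) * ((freq * 60) * 24)) * filesize."""
    clients = users * (percent_using / 100)
    requests_per_day = requests_per_minute * 60 * 24
    return clients * requests_per_day * size_kb


def kb_to_gb(kb):
    """Convert KB to GB using binary units (1 GB = 1024 * 1024 KB)."""
    return kb / (1024 * 1024)


# Numbers from the sample above.
html_kb = daily_traffic_kb(200_000, 5, 1, 1)    # 1 KB page
json_kb = daily_traffic_kb(200_000, 5, 1, 0.2)  # 200-byte JSON, approximated as 0.2 KB
saved_gb = kb_to_gb(html_kb - json_kb)

print(f"HTML:  {kb_to_gb(html_kb):.2f} GB/day")
print(f"JSON:  {kb_to_gb(json_kb):.2f} GB/day")
print(f"Saved: {saved_gb:.2f} GB/day")

# The same formula with a 96 KB page, the size of the profile mentioned above.
profile_kb = daily_traffic_kb(200_000, 5, 1, 96)
print(f"96 KB page: {kb_to_gb(profile_kb):.2f} GB/day")
```

Changing `size_kb` makes it easy to see how quickly the gap grows as the scraped page gets heavier.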
This question was prompted by a request for a JSON result from user profiles:
http://stackoverflow.uservoice.com/pages/general/suggestions/101342-add-json-for-user-information
I wanted to find out whether other developers would expose this type of API, and whether it is worth your time to provide such APIs in order to reduce bandwidth.