A client of mine, a social sciences researcher at a university, has asked whether I can write a spider to mine statistical data from a subscription-only academic database. He would like to use the statistics in his academic research.
(For those interested, this would involve downloading thousands of text documents and then doing linguistic analyses to look for the frequency of certain words and phrases to test how language is used. The documents themselves would not be republished or reproduced in any way.)
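To make the analysis side concrete, here is a minimal sketch of the kind of counting I have in mind. It assumes the documents are already in hand as plain-text strings; the function name `phrase_frequencies`, the sample texts, and the target phrases are hypothetical placeholders, not the client's actual data or method.

```python
import re
from collections import Counter


def phrase_frequencies(documents, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase across all documents."""
    counts = Counter()
    for text in documents:
        # Lowercase and collapse runs of whitespace so multi-word phrases match across line breaks.
        normalized = " ".join(text.lower().split())
        for phrase in phrases:
            # \b anchors keep "the" from matching inside words such as "other".
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            counts[phrase] += len(re.findall(pattern, normalized))
    return counts


# Hypothetical sample documents and phrases, for illustration only.
docs = [
    "Fair use is a doctrine. Fair use applies here.",
    "No republication, only data mining of the text.",
]
freq = phrase_frequencies(docs, ["fair use", "data mining", "the"])
```

Only the aggregate counts in `freq` would be kept for the research; the document text itself is discarded after counting.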
I am trying to determine whether this type of work is generally considered permissible (e.g. as fair use). The website's terms of service do not appear to specifically prohibit screen scraping. When I get a chance I will ask a friend who is a lawyer, but in the meantime, does anyone have pointers to information on when this kind of data mining is considered fair use?
(This question was relevant and answered part of my question; I am looking for more information specifically on data mining without republication.)