Hi all,

I'd like to write a small program that 'visits' a specific website regularly (every minute, for example) and extracts specific data from it. This data is stored in a database that is used by another program I'm planning to write.
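Roughly, the idea would look like this minimal Python sketch (the URL, database file and table name are just placeholders I made up for illustration):

    # Minimal sketch: fetch a page once a minute and store the raw HTML in
    # a local SQLite database. URL, database file and table are placeholders.
    import sqlite3
    import time
    import urllib.request

    URL = "http://www.example.com/page-to-watch"   # hypothetical target page
    DB_PATH = "scraped.db"                         # hypothetical database file

    def fetch(url):
        # Download the page body as text.
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def store(conn, html):
        # Keep every snapshot with a timestamp so the other program can read it later.
        conn.execute("CREATE TABLE IF NOT EXISTS snapshots (fetched_at REAL, body TEXT)")
        conn.execute("INSERT INTO snapshots VALUES (?, ?)", (time.time(), html))
        conn.commit()

    if __name__ == "__main__":
        conn = sqlite3.connect(DB_PATH)
        while True:
            store(conn, fetch(URL))
            time.sleep(60)  # once a minute, as described above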

Is this legal or not? Do I need permission from the website owner?

It's a completely open website; the data I collect is 100% visible to all other users visiting that website...

Thanks a lot for your answer,

Stefan

+2  A: 

I am not a lawyer, but it will depend on the site and its copyright/terms of use.

Whether it's legal or not, if you are hitting someone's site every minute, it would be polite to contact the site owner and let them know of your intentions. After all, if it were to start impacting their site, they could easily block requests from your software.

mopoke
+2  A: 

If you are going to do it that often, ask the site owner first. It's their server resources and bandwidth you are using up.

Usually, when running a crawler, you use an identifiable user-agent string for your bot and respect the robots.txt file.
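For example, a minimal Python sketch of both conventions (the bot name and URLs are made-up placeholders):

    # Sketch: identify the bot with a User-Agent string and check robots.txt
    # before fetching. Bot name and URLs are illustrative assumptions.
    import urllib.request
    import urllib.robotparser

    USER_AGENT = "StefanDataBot/0.1 (+mailto:stefan@example.com)"
    TARGET = "http://www.example.com/data-page"

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()  # a missing (404) robots.txt is treated as "everything allowed"

    if rp.can_fetch(USER_AGENT, TARGET):
        req = urllib.request.Request(TARGET, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    else:
        print("robots.txt disallows this URL for our user agent")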

Emil Vikström
Thanks a lot for your answers! I would respect their robots.txt, but I can't find anything at http://www.domain.com/robots.txt...
swalkner
If they haven't published a robots.txt, then of course there is no file to parse. But check back regularly (maybe schedule an automatic job once a day?).
Emil Vikström
+1  A: 

It's legal. It's also legal for them to block your script and put you out of business if they don't like it.

Azeem.Butt
+2  A: 

It's not illegal, but it may be against the website's terms of service, and they may choose to ban your bot. For example, automated retrieval of search result pages is forbidden by Google's TOS.

There are some conventions for friendly bots that you should be aware of. If you're hitting a site with a bot, you should first download /robots.txt and parse it to see which URLs are restricted. Also, don't hit the site in bursts: space out your requests by seconds or minutes. Use gzip compression so you don't waste their bandwidth.
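A rough Python sketch of these conventions (the base URL, paths, bot name and delay are all made-up placeholders):

    # Sketch: check robots.txt first, space requests out, and accept gzip so
    # transfers are smaller. All URLs and the delay are placeholders.
    import gzip
    import time
    import urllib.request
    import urllib.robotparser

    USER_AGENT = "FriendlyBot/0.1"
    BASE = "http://www.example.com"
    PATHS = ["/page1", "/page2"]      # hypothetical pages to fetch
    DELAY_SECONDS = 60                # don't hit the site in bursts

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(BASE + "/robots.txt")
    rp.read()

    for path in PATHS:
        url = BASE + path
        if not rp.can_fetch(USER_AGENT, url):
            continue  # restricted by robots.txt, skip it
        req = urllib.request.Request(
            url, headers={"User-Agent": USER_AGENT, "Accept-Encoding": "gzip"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)  # urllib does not decompress for us
        # ... parse `body` here ...
        time.sleep(DELAY_SECONDS)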

Asaph