I have a serious question. I'm not trying to start a flamewar or to incite any violence--but here goes.
Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I've got in mind:
1.) If someone puts a web site up, they're expecting some visits. Granted, web crawlers use bandwidth without clicking on the ads that may support the site, but the owner has chosen to put the site on the public web. How reasonable is it for them to expect that it will never be visited by a bot?
2.) Some sites apparently use a robots.txt precisely to keep their site from being crawled by Google or some other utility that might grab prices and so let people compare prices easily. They have private search engines on the site, so they obviously want people to be able to search it; apparently they just don't want their information to be easily compared with other vendors'.
As I said, I'm not trying to start a big argument; I would just like to know whether anyone has ever come up with a case where it's ethically permissible to ignore a robots.txt file. I cannot think of one myself, mainly because people (or businesses) are paying money to put up their web sites, so they should be able to tell Google, Yahoo, and the other search engines of the world that they don't want to be in their indices.
To put this discussion in context, I'd like to create a price comparison website and one of the major vendors has a robots.txt that basically prevents anyone from grabbing their prices. I'd like to be able to get their information but, as I said, I can't justify simply ignoring the wishes of the site owner.
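For what it's worth, honoring the file is easy enough to do in code, so this really is a question of ethics rather than effort. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `PriceBot` user agent, the `vendor.example` URLs, and the robots.txt contents are all made up for illustration and are not the actual vendor's rules:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks every crawler from the price pages.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /prices/
"""

if __name__ == "__main__":
    for path in ("/prices/widget", "/about"):
        url = "https://vendor.example" + path
        print(path, is_allowed(EXAMPLE_ROBOTS, "PriceBot", url))
```

A real crawler would download the file from the site (for example with `RobotFileParser.set_url` followed by `read`) and run this check before every request.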
I am making this a community wiki question exactly because I believe it might generate some spirited debate. Or maybe not.
I suspect this discussion may belong elsewhere; if it does, just let me know. I have seen some very sharp discussion here, and that's why I would like to hear the opinions of the developers who follow Stack Overflow.
By the way, there is some discussion of this topic in a Hacker News thread, but it seems to focus mainly on the legal aspects.