Are all these types of sites just illegally scraping Google or another search engine?
As far as I can tell, there is no 'legal' way to get this data for a commercial site. The Yahoo! API is only for non-commercial use, and Yahoo! Boss does not allow automated queries, etc.
Any ideas?

A:

For example, if you wanted to find all the links to Google's homepage, search for


So if you want to find all the inbound links to your site, you can traverse your website's page tree and build the URL of each page you find. Then query Google for:


You'll get back the inbound links from other websites to yours that Google has indexed.
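The "traverse your website's tree and build a URL for each page" step can be sketched in Python. This is a minimal sketch, assuming the site's pages are available as a nested dict of path segments (a hypothetical structure); `page_urls`, the base URL, and the page names are all placeholders. It stops at building the page URLs: forming and sending the search queries is left out.

```python
from typing import Dict, Iterator

# Hypothetical site tree: each key is a path segment, and each value
# holds the child pages found under that segment.
SiteTree = Dict[str, "SiteTree"]


def page_urls(base_url: str, tree: SiteTree) -> Iterator[str]:
    """Depth-first walk of the site tree, yielding one absolute URL per page."""
    root = base_url.rstrip("/")
    yield root + "/"  # the homepage itself

    def walk(subtree: SiteTree, prefix: str) -> Iterator[str]:
        for segment, children in subtree.items():
            path = f"{prefix}/{segment}"
            yield root + path
            yield from walk(children, path)

    yield from walk(tree, "")


if __name__ == "__main__":
    site = {"about": {}, "blog": {"post-1": {}, "post-2": {}}}
    for url in page_urls("https://example.com", site):
        print(url)
```

Each URL it yields would then be dropped into the search query described above, one query per page.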

As for the legality of such harvesting, I'm sure it's not-exactly-legal to make a profit from it, but that's never stopped anyone before, has it?

(So I wouldn't bother wondering whether they do it or not. Just assume they do.)