Using the regular search engine as a human can get you no more than 1,000 results, which is far more than a regular person needs.

But what if I do want to get 2,000? Is it possible? I read that it is possible using App Engine or something like that (over here...), but is it possible, somehow, to do it through Perl?

+3  A: 

I don't know a way around this limit, other than to use a series of refined searches versus one general search.

For example instead of just "Tim Medora", I might search for myself by:

Search #1: "Tim Medora Phoenix"

Search #2: "Tim Medora Boston"

Search #3: "Tim Medora Canada"
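The idea above can be sketched in a few lines. This is just an illustration of the query-splitting strategy, not a real search API call; `refined_queries` is a hypothetical helper name:

```python
def refined_queries(base, qualifiers):
    """Build one refined query per qualifier.

    Each refined query can return up to 1,000 results of its own,
    so several narrow searches cover more ground than one broad one.
    De-duplicate the combined results afterwards (e.g. with a set).
    """
    return [f"{base} {q}" for q in qualifiers]

queries = refined_queries('"Tim Medora"', ["Phoenix", "Boston", "Canada"])
print(queries)
# → ['"Tim Medora" Phoenix', '"Tim Medora" Boston', '"Tim Medora" Canada']
```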

However, if you are trying to use Google to search a particular site, you may be able to read that site's Google sitemaps.

For example, www.linkedin.com exposes all 80 million+ users/businesses via a series of nested sitemap XML files: http://www.linkedin.com/sitemap.xml.
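For reference, a sitemap index file of that kind is roughly shaped like this (heavily simplified; the real files are much larger, and each nested sitemap in turn lists the actual page URLs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.linkedin.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>http://www.linkedin.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>
```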

Using this method, you can crawl a specific site quite easily with your own search algorithm, provided it publishes good Google sitemaps.
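Reading a sitemap programmatically is straightforward. The question asked about Perl; here is an equivalent Python sketch, assuming the standard sitemaps.org namespace shown above:

```python
# Sketch: list every <loc> URL in a sitemap or sitemap-index document.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_locs(xml_text):
    """Return all <loc> URLs from a sitemap or sitemap-index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def fetch_locs(url):
    """Download a sitemap and list the URLs it points at."""
    with urllib.request.urlopen(url) as resp:
        return sitemap_locs(resp.read())

# Demo on an inline sitemap index (two nested sitemap files):
index = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.linkedin.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>http://www.linkedin.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""
print(sitemap_locs(index))
```

To crawl a whole site you would call `fetch_locs` on the top-level `sitemap.xml`, then recurse into each nested sitemap URL it returns.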

Of course, I am in no way suggesting that you exploit a sitemap for illegal/unfriendly purposes.

Tim
Thanks, that sitemap.xml thing is new to me, and very useful. I don't get why LinkedIn would have this file on their server?
soulSurfer2010
LinkedIn uses the sitemap.xml file to tell Google that each of those pages exists. A crawler can normally only find pages that are linked from somewhere else, and on a large site like LinkedIn, not every page is directly linked from anything. The sitemap tells the Google crawler explicitly where to find each page. Google uses a combination of sitemap.xml files and ad hoc crawls to build its index. Just look for "google sitemaps" on Google and you can learn what sitemaps are, how to create them, and how to read them.
Tim
Wow, that is fascinating. Thanks a lot for the help!
soulSurfer2010