tags:
views: 31
answers: 1

Hi,

I would like to find all the sites that have the keyword 'surfing waves' somewhere in their address. Very simple! But without using ANY search engine, which means writing a pure web crawler.

The problems I guess I will face are:

  1. It will, obviously, never stop running...
  2. It will come across lots of "garbage" sites before it even hits something that I want.
  3. It will probably run for ages until it finds the first 2000 sites...

Am I right? Or, in other words, should I even try to do it this way? I don't want to use search engines because they limit the number of results. A rough sketch of what I have in mind is below.
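To make the idea concrete, this is roughly the kind of crawl loop I mean (a minimal Python sketch; the seed URL, keyword, and page limit are placeholders I would still have to choose, and the keyword check is on the URL itself, not the page content):

    import re
    import urllib.request
    from collections import deque
    from urllib.parse import urljoin

    KEYWORD = "surfing"               # placeholder; URLs rarely contain a space, so 'surfing waves' would need adapting
    SEEDS = ["http://example.com/"]   # hypothetical starting points; real seeds would be directory/portal pages
    MAX_PAGES = 10000                 # hard stop so the crawl does not run forever

    def crawl():
        seen = set(SEEDS)
        queue = deque(SEEDS)
        matches = []
        while queue and len(seen) < MAX_PAGES:
            url = queue.popleft()
            try:
                html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
            except Exception:
                continue  # unreachable page or non-HTML content: skip it
            # crude link extraction; a real crawler would use an HTML parser
            for link in re.findall(r'href=["\'](http[^"\']+)["\']', html):
                link = urljoin(url, link)
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
                    if KEYWORD in link.lower():
                        matches.append(link)
        return matches

    if __name__ == "__main__":
        for hit in crawl():
            print(hit)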

+1  A: 

Search engines limit the results in what sense? They exist specifically for this purpose, to find things, and you should use them. Even if you end up writing your own crawler, that crawler will need some starting points (seed URLs) to begin crawling. Maybe you can use the search results from Google for that, but then again you won't end up with a better result: most of the time (and after a pretty long time) you will hit the same URLs/addresses that are already part of the search results.

Faisal Feroz
I mean that if you search Google for the following query: inurl:surfing , it will give you UP TO 1000 results, and that applies to any other query. Although they find billions of results, they let you see only the first, most relevant 1000, and that's not good enough for me.
soulSurfer2010