I forgot the name of the technique where a web spider
first visits all the links it sees on the first level, then all the links it sees on the second level, and so on...
There is a name for this technique, but it escapes me.
Anyway, this approach is exhaustive and obviously inefficient. Is there a better way? To make it concrete, here is a rough sketch of the level-by-level crawl I mean.
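This is just my own minimal sketch of that crawl, not any particular crawler's implementation; the link extraction is a bare-bones stand-in and the seed URL and depth limit are made up for illustration:

```python
# Minimal sketch of the level-by-level crawl described above.
# The link extraction here is deliberately crude; a real crawler would do much more
# (robots.txt, politeness delays, deduplication by canonical URL, etc.).
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_links(url):
    """Download a page and return the absolute URLs it links to."""
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    except Exception:
        return []
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(url, href) for href in parser.links]


def crawl_level_by_level(seed, max_depth=2):
    """Visit every link on level 1, then every link on level 2, and so on."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        print(f"level {depth}: {url}")
        if depth == max_depth:
            continue
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))


if __name__ == "__main__":
    crawl_level_by_level("https://example.com", max_depth=2)   # hypothetical seed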
I remember reading a paper last summer about efficiently crawling web pages (DSL or something like that, I don't know what it stands for). In summary, it discussed a method for determining which URLs are likely to hold relevant information and which URLs should be ignored, like "register" or "new account" links, etc.
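Purely as my own guess at what that filtering might look like (a toy heuristic I made up, not the paper's actual method, and the keyword lists are invented for the example):

```python
# Toy URL-filtering heuristic: penalize boilerplate links (register, login, ...)
# and reward URLs containing topic words, then keep only the promising ones.
IGNORE_PATTERNS = ("register", "signup", "login", "logout", "account", "privacy", "terms")
TOPIC_KEYWORDS = ("crawl", "spider", "search")   # hypothetical topic words


def url_score(url):
    """Crude relevance score: negative for boilerplate, +1 per topical word."""
    lowered = url.lower()
    if any(pat in lowered for pat in IGNORE_PATTERNS):
        return -1.0
    return sum(1.0 for word in TOPIC_KEYWORDS if word in lowered)


def filter_urls(urls, threshold=1.0):
    """Keep only URLs whose score meets the threshold, best first."""
    scored = [(url_score(u), u) for u in urls]
    return [u for score, u in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    candidates = [
        "https://example.com/register",
        "https://example.com/articles/web-crawler-design",
        "https://example.com/blog/focused-spider-tips",
    ]
    print(filter_urls(candidates))
```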
I didn't read it in much detail, so if any of this rings a bell, please post a link.