I forgot the name for the technique where a web spider

first visits all the links it sees at the first level, then visits all the links it sees at the second level, and so on...

there is a name for this technique... I forgot it...

Anyway, this is very exhaustive and obviously inefficient. Is there a better way?

I remember reading a paper this past summer about efficiently crawling web pages (DSL or something like that, I don't know what it stands for)... in summary, it discussed a method for determining which URLs are likely to hold relevant information and which URLs should be ignored, like register/new-account links, etc.

I didn't read it in much detail; if any of this rings a bell, please post a link.

+1  A: 

Sounds like 'breadth-first search', as opposed to 'depth-first search'. In the former you examine all your options at the current level before going deeper, so to speak, whereas in the latter you drill as deep as you can down each path first. That's AI terminology; I'm not sure it's in vogue with web tool designers. Anyway, BFS consumes a lot of memory but is usually employed when you want an 'optimal' result, something (in your terms) at the shallowest level possible, whereas DFS tends to use a lot less memory but may miss shallower solutions.

If you are just trying to catalog all the links, use DFS. If you are trying to find something at the shallowest link depth, use BFS.
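For reference, here is a minimal sketch of the breadth-first idea in Python. The `extract_links` helper is hypothetical; swap in whatever fetching/parsing code you actually use:

```python
from collections import deque

def bfs_crawl(seed_url, extract_links, max_depth=2):
    """Breadth-first crawl: visit every link at depth 1, then depth 2, and so on.

    extract_links(url) is a placeholder for your own function that fetches a
    page and returns the URLs it links to.
    """
    visited = {seed_url}
    queue = deque([(seed_url, 0)])  # FIFO queue gives the level-by-level order

    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # don't expand pages beyond the depth limit
        for link in extract_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))

    return visited
```

Swapping the deque for a stack (append/pop from the same end) turns the same loop into a depth-first crawl.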

JustJeff
Beat me to it....
Drew Hall