Is there a good web crawler library available for PHP or Ruby? I'm looking for one that can crawl depth first or breadth first and that resolves links correctly even when they are relative (e.g. href="../relative_path.html") or when a base URL is used.

+2  A: 

Check this page out for a Ruby library: Ruby Mechanize

Note that you would still be responsible for how your crawler traverses a site (depth first or breadth first, which links to follow, and when to stop).
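To give an idea of what that traversal logic might look like, here is a minimal sketch using only Ruby's standard library (not Mechanize). The names `extract_links`, `crawl`, and the `fetch` callable are hypothetical, made up for this illustration, and the regex-based link extraction is a simplification; a real crawler would use a proper HTML parser. Relative links such as `../relative_path.html` are resolved with `URI.join`, and a `<base href>` tag is honoured if one is found.

```ruby
require "uri"
require "set"

# Return absolute URLs for the <a href> links in +html+.
# Links are resolved against <base href> if present, else against +page_url+.
# NOTE: regex extraction is an assumption made for brevity, not robust parsing.
def extract_links(html, page_url)
  base = html[/<base\s[^>]*href=["']([^"']+)/i, 1]
  base_url = base ? URI.join(page_url, base).to_s : page_url
  hrefs = html.scan(/<a\s[^>]*href=["']([^"'#]+)/i).flatten
  hrefs.map do |href|
    begin
      URI.join(base_url, href).to_s
    rescue URI::Error
      nil # skip links that cannot be parsed
    end
  end.compact
end

# Visit pages starting at +start_url+ and return them in visit order.
# +fetch+ is any callable that takes a URL and returns HTML (or nil on failure).
# order: :breadth_first treats the frontier as a queue, anything else as a stack.
def crawl(start_url, fetch, order: :breadth_first, limit: 100)
  frontier = [start_url]
  seen = Set[start_url]
  visited = []
  until frontier.empty? || visited.size >= limit
    url = order == :breadth_first ? frontier.shift : frontier.pop
    html = fetch.call(url)
    next if html.nil?
    visited << url
    extract_links(html, url).each do |link|
      frontier << link if seen.add?(link) # add? returns nil for duplicates
    end
  end
  visited
end
```

In practice `fetch` could be something like `->(url) { Net::HTTP.get(URI(url)) }` (after `require "net/http"`), ideally with a check that links stay on the same host. The only difference between the two orders is which end of the frontier the next URL is taken from.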

AlbertoPL
+2  A: 

http://phpcrawl.cuab.de/

PSU_Kardi
A: 

You can go for Webrat or Watir in Ruby; both are much easier to use than Mechanize.

fenec