views: 215

answers: 5

Hi, can anyone recommend a website crawler that can show me all of the links in my site?

A: 

As long as you are the owner of the site (i.e. you have all the files), Adobe Dreamweaver can generate a report of all your internal and external hyperlinks, and report all broken links (and orphan files as well). But you have to set up your site in Dreamweaver first.

Shivan Raptor
+3  A: 

W3C has the best one I've found:

http://validator.w3.org/checklink

Chris Ballance
A: 

If you need to do any post-processing of the links, I'd recommend any of the many variants of Mechanize.

In Ruby:

require "rubygems"
require "mechanize"
require "addressable/uri"

processed_links = []
unprocessed_links = ["http://example.com/"] # bootstrap list
a = WWW::Mechanize.new
until unprocessed_links.empty?
  # This could take awhile, and depending on your site,
  # it may be an infinite loop.  Adjust accordingly.
  processed_links << unprocessed_links.shift
  a.get(processed_links.last) do |page|
    page.links.each do |link|
      link_uri = Addressable::URI.parse(link).normalize
      # Ignore external links
      unprocessed_links << link_uri.to_str if link_uri.host == "example.com"
    end
  end
end

Something to that effect.
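
If you then want to post-process the collected links, here's a rough, untested sketch that reuses the same agent to re-fetch each URL and print its HTTP status (it assumes the same old WWW::Mechanize namespace and that pages respond to #code):

# Simple post-processing pass: report the HTTP status of every crawled URL,
# flagging the ones that raise an error response.
processed_links.sort.each do |url|
  begin
    page = a.get(url)
    puts "#{page.code}  #{url}"          # e.g. "200  http://example.com/"
  rescue WWW::Mechanize::ResponseCodeError => e
    puts "#{e.response_code}  #{url}  (broken)"
  end
end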

Bob Aman
A: 

Larbin ... takes a little C++ coding, but it is a fast, solid web crawler foundation that can be used for basically everything from link walking to indexing to data acquisition.

Martin Hohenberg
+1  A: 

Xenu is the best link checker tool I have found. It will check all links and then give you the option to view them or export them. It is free; you can download it from their site at http://home.snafu.de/tilman/xenulink.html.

meme