Hi, can anyone recommend a website crawler that can show me all of the links in my site?
As long as you are the owner of the site (i.e. you have all the files), Adobe Dreamweaver can generate a report of all your internal and external hyperlinks, and it will also report broken links and orphan files. You do have to set up your site in Dreamweaver first, though.
If you need to do any post-processing of the links, I'd recommend any of the many variants of Mechanize.
In Ruby:
require "rubygems"
require "mechanize"
require "addressable/uri"
processed_links = []
unprocessed_links = ["http://example.com/"] # bootstrap list
a = WWW::Mechanize.new
until unprocessed_links.empty?
# This could take awhile, and depending on your site,
# it may be an infinite loop. Adjust accordingly.
processed_links << unprocessed_links.shift
a.get(processed_links.last) do |page|
page.links.each do |link|
link_uri = Addressable::URI.parse(link).normalize
# Ignore external links
unprocessed_links << link_uri.to_str if link_uri.host == "example.com"
end
end
end
Something to that effect.
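If you also want to flag broken links while you crawl (the kind of report the Dreamweaver and Xenu answers mention), you can rescue Mechanize's response-code error around the get call so a 404 is recorded instead of aborting the loop. A minimal sketch, assuming the current mechanize gem and the agent/current variables from the loop above:

broken_links = []

begin
  agent.get(current) do |page|
    # ... same link-collecting block as above ...
  end
rescue Mechanize::ResponseCodeError => e
  # Record the URL and the HTTP status (e.g. "404") and keep crawling
  broken_links << [current, e.response_code]
end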
Larbin takes a little C++ coding, but it is a fast, performant web crawler foundation and can be used for basically everything from link walking to indexing to data acquisition.
Xenu is the best link checker tool I have found. It will check all links and then give you the option to view or export them. It is free; you can download it from their site: http://home.snafu.de/tilman/xenulink.html