Hi,

I am transferring an old website to a new server and attempting some cleanup in the process.

What I am looking for is some script or free software that can:

a) show the paths through the website (following hyperlinks, etc), so I can see what links to what

and b) some software that can see which HTML files are orphans (not linked to) in the folder structure.

Any help with either or both of these would be greatly appreciated :)

A: 

http://haveamint.com/ says it all: beautiful GUI, simple integration, lightweight, database storage, JavaScript tracking.

Have a mint (y)

Or you can just use Google Analytics, which is used by pretty much every site these days.

RobertPitt
I should probably have added the caveat "for free"
simonalexander2005
Then just go with Google Analytics with custom link tracking.
RobertPitt
A: 

a) show the paths through the website (following hyperlinks, etc), so I can see what links to what

So basically a crawler? You could whip something together with an HTTP library, an HTML parser and any scripting language. I don't know of any off-the-shelf scripts though.
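
A rough sketch of such a crawler, using only the Python standard library. This is one possible shape under some assumptions, not a finished tool: the names `LinkParser`, `extract_links`, `fetch` and `crawl` are made up for illustration, only `<a href>` links are followed, and there is no error handling for broken links or non-HTML responses.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs for the links in one page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, h)).url for h in parser.hrefs]


def fetch(url):
    """Download a page as text (hypothetical helper, no error handling)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch):
    """Breadth-first crawl restricted to the start URL's host.

    Returns a dict mapping each visited page to the links found on it.
    """
    host = urlparse(start_url).netloc
    graph = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in graph:
            continue
        links = extract_links(fetch(url), url)
        graph[url] = links
        for link in links:
            # Only follow links that stay on the same host.
            if urlparse(link).netloc == host and link not in graph:
                queue.append(link)
    return graph
```

Calling `crawl("http://example.test/index.html")` (a placeholder URL) would return the "what links to what" mapping, which you can print or feed into a graphing tool.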

and b) some software that can see which HTML files are orphans (not linked to) in the folder structure.

Does your site consist of plain HTML files, or is there some sort of server-side technology, such as PHP? If it is the latter, there is no way to detect orphans automatically, since the pages are generated by the server-side application and aren't actual files, even though they may appear as such in a browser.
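
For the plain-HTML case, orphan detection can be sketched directly against a local copy of the folder: walk the links starting from an entry page and report every HTML file that was never reached. The sketch below assumes the entry page is called `index.html` and uses made-up names (`HrefParser`, `local_targets`, `find_orphans`). It ignores percent-encoded paths, links that point at a directory rather than a file, and any links built in JavaScript.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse


class HrefParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def local_targets(path, root):
    """Resolve the site-internal links in one HTML file to paths under root."""
    with open(path, encoding="utf-8", errors="replace") as f:
        parser = HrefParser()
        parser.feed(f.read())
    targets = set()
    for href in parser.hrefs:
        parsed = urlparse(urldefrag(href).url)
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue  # external link, mailto:, or same-page anchor
        if parsed.path.startswith("/"):
            target = os.path.join(root, parsed.path.lstrip("/"))
        else:
            target = os.path.join(os.path.dirname(path), parsed.path)
        targets.add(os.path.normpath(target))
    return targets


def find_orphans(root, entry="index.html"):
    """Return HTML files under root that are not reachable from entry."""
    root = os.path.normpath(root)
    all_pages = {
        os.path.normpath(os.path.join(dirpath, name))
        for dirpath, _, names in os.walk(root)
        for name in names
        if name.lower().endswith((".html", ".htm"))
    }
    seen, stack = set(), [os.path.join(root, entry)]
    while stack:
        page = stack.pop()
        if page in seen or page not in all_pages:
            continue
        seen.add(page)
        stack.extend(local_targets(page, root))
    return sorted(all_pages - seen)
```

Treat the result as a list of candidates to review by hand rather than files that are safe to delete.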

troelskn
No, they're just HTTP pages - it's only a small thing :)
simonalexander2005
Did you mean HTML pages?
troelskn
Sorry, yes - although I have realised recently that there are some links contained in JavaScript too...
simonalexander2005
A: 

a) Depending on the complexity of your site and how dynamic the content is, you can download any spider, restrict it to your website and check the results ("burp suite" contains a pretty good spider and is altogether a tool that everyone should know).

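b) After the spider has done its work, check the access time of all the files in your website's directory. Any file with an access time older than the spider's execution time is probably an orphan. (This assumes the filesystem actually records access times; mounts using options such as noatime or relatime may not update them on every read.)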
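
The access-time check can be sketched in a few lines of Python. The function name `not_accessed_since` is made up for illustration, and `cutoff` is assumed to be a POSIX timestamp recorded just before the spider was started (for example with `time.time()`). The result is only meaningful if the filesystem updates access times when the web server reads a file.

```python
import os


def not_accessed_since(root, cutoff):
    """Return files under root whose last access time is older than cutoff."""
    stale = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # st_atime is the last access time as a POSIX timestamp.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

This has to run on the server itself (or wherever the files are served from), since a downloaded copy will not carry the server's access times.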

(Both solutions will be less effective on a website that uses user input to refer to pages.)

dmig
A: 

Xenu's Link Sleuth (home.snafu.de/tilman/xenulink.html) provides link spidering and, with FTP access, orphan file checking.

simonalexander2005