I am working with a client to migrate a web site from the existing production hardware into a new hardware environment. Now seems like an excellent time to perform an audit and remove any old or obsolete content rather than just blindly copy it again.

Are there any good free tools or scripts that compare the web-accessible content on a server against the files actually on disk, to show which content is still being linked to and used?

Thanks in advance for any help!

A: 

I'm sure there is one, but I doubt it could do a better job than you could yourself, ya know? How big is this site, and did you code it yourself?

John
The site is very large, somewhere in the range of 2,000-3,000 pages, plus the referenced images and files. It's not practical to do it by hand. I could write a script that parses every page, extracts the links, and follows them until nothing new turns up, recording every page it finds, and then compare that list against the file system, but that would take a lot of time. I'm not the first person to have to do this, so I'm thinking there must be some free or open source tool that could help; I just don't know of one.
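
For reference, the script described above can be sketched in a few dozen lines of Python using only the standard library. This is a rough outline under some assumptions, not a finished tool: it assumes a mostly static site where URL paths map directly onto files under one document root, and that directory URLs are served from `index.html`. The names `crawl`, `fetch`, and `unreferenced_files` are made up for illustration. Files referenced only from CSS, JavaScript, or server-side code would be reported as unused, so the output is a list of candidates to review, not a delete list.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href and src attribute values from one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch(url):
    """Return the body of an HTML response, or "" for anything else."""
    try:
        with urlopen(url) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # broken link, timeout, refused connection
        return ""


def crawl(start_url, fetch=fetch):
    """Follow same-host links from start_url; return every URL seen."""
    host = urlparse(start_url).netloc
    seen, todo = {start_url}, [start_url]
    while todo:
        url = todo.pop()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for link in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target = urldefrag(urljoin(url, link)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def unreferenced_files(docroot, urls, index="index.html"):
    """Files under docroot that none of the crawled URLs map to."""
    linked = set()
    for url in urls:
        path = unquote(urlparse(url).path).lstrip("/")
        if path == "" or path.endswith("/"):
            path += index  # assumption: directories serve index.html
        linked.add(os.path.normpath(os.path.join(docroot, path)))
    on_disk = {
        os.path.normpath(os.path.join(folder, name))
        for folder, _dirs, names in os.walk(docroot)
        for name in names
    }
    return sorted(on_disk - linked)


# Example call (hypothetical host and path):
# unreferenced_files("/var/www/site", crawl("http://www.example.com/"))
```

Passing `fetch` as a parameter keeps the crawl logic testable without touching the network.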
Brian Teeter
+1  A: 

Well, for starters you can use a tool like Xenu's Link Sleuth to spider all of your pages to find broken links and the like. We used this tool on our intranet to find and fix our broken links. It's free and gets the job done.

Another tool that we have used for migrations between systems is a search engine. A good search engine will spider all of your pages and show the links in both directions: what each page links to and what links to it. This can help you find what content is being linked to the most and what is possibly orphaned. Unfortunately, these kinds of tools are not free.
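
To illustrate the two-way link idea without a commercial product, here is a small Python sketch that inverts a link map to count inbound links. It assumes you already have a dictionary mapping every known page to the pages it links to (for example, built from a file listing plus parsed links); that dictionary and the function names are hypothetical. A page list taken from a spider alone will never contain true orphans, since nothing links to them, so the list of pages has to come from somewhere else, such as the file system.

```python
from collections import Counter


def inbound_counts(outgoing):
    """Count, for each known page, how many other pages link to it."""
    counts = Counter({page: 0 for page in outgoing})
    for source, targets in outgoing.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


def orphans(outgoing, entry_points=()):
    """Pages with no inbound links, excluding known entry points."""
    counts = inbound_counts(outgoing)
    return sorted(p for p, n in counts.items()
                  if n == 0 and p not in entry_points)
```

`inbound_counts(...).most_common()` then lists the most heavily linked pages first, and `orphans(...)` gives the candidates for removal.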

Zack Mulgrew