views: 196
answers: 4

I've inherited an old Classic ASP website to modify. Although not requested up-front, I'd like to delete a bunch of the old "orphaned" pages.

For some reason, the old developer created multiple copies of a file instead of using source control (e.g. index-t.asp, index-feb09.asp, index-menutest.asp).

I'm wondering if anyone knows of a program or website that can crawl my own site for me. It probably needs to crawl the public site, since there are lots of include files. Also, some of the URLs are relative and some are absolute.
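
To make the requirement concrete, this is roughly the kind of crawl I'm after (a minimal sketch, assuming Python with the requests and BeautifulSoup libraries; the root URL is a placeholder): follow every link on the same host, resolve relative and absolute URLs with urljoin, and collect the set of reachable pages.

    # A rough sketch of the crawl I have in mind (Python 3 with the
    # requests and BeautifulSoup libraries; the root URL is a placeholder).
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    ROOT = "http://www.example.com/"  # hypothetical -- replace with the real site

    def crawl(root):
        seen, queue = set(), [root]
        while queue:
            url = queue.pop()
            if url in seen:
                continue
            seen.add(url)
            try:
                resp = requests.get(url, timeout=10)
            except requests.RequestException:
                continue
            if "text/html" not in resp.headers.get("Content-Type", ""):
                continue
            soup = BeautifulSoup(resp.text, "html.parser")
            for a in soup.find_all("a", href=True):
                # urljoin resolves relative and absolute hrefs the same way
                link = urljoin(url, a["href"]).split("#")[0]
                if urlparse(link).netloc == urlparse(root).netloc:
                    queue.append(link)
        return seen

    if __name__ == "__main__":
        for page in sorted(crawl(ROOT)):
            print(page)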

+1  A: 

My favorite tool is Xenu.

JonnyBoats
Do you know if this software has a recursive feature, or a limit?
bendewey
If it has a limit, I have not hit it; I have used it on sites with over 10,000 pages. Also note that, unlike the W3C tool (which is fine as far as it goes), Xenu can detect orphan pages if you allow it FTP access to your site (the same file-listing-versus-crawl comparison is sketched below). Finally, unlike some of the other techniques suggested, Xenu makes real requests to the site, so it works just fine with dynamically generated web pages. Here is the Wikipedia page: http://en.wikipedia.org/wiki/Xenu%27s_Link_Sleuth
JonnyBoats
Thanks, this will work great for me.
bendewey
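
For completeness, the file-listing-versus-crawl comparison mentioned above can also be done by hand. A rough sketch, assuming Python and placeholder paths; the crawled-URL list could come from Xenu's report or any other crawler:

    # Sketch of that comparison: list every .asp file on the server, then
    # subtract everything that a crawl (or an #include directive) reaches.
    # Paths and file names below are placeholders.
    import os
    import re
    from urllib.parse import urlparse

    SITE_ROOT = r"C:\inetpub\wwwroot\mysite"  # hypothetical local copy of the site

    # Crawled URLs, one per line (e.g. exported from Xenu or a home-grown crawler)
    with open("crawled-urls.txt") as f:
        reached = {urlparse(line.strip()).path.lower() for line in f if line.strip()}

    # Every .asp file that actually exists, as a site-relative path
    on_disk = set()
    for dirpath, _dirs, files in os.walk(SITE_ROOT):
        for name in files:
            if name.lower().endswith(".asp"):
                rel = os.path.relpath(os.path.join(dirpath, name), SITE_ROOT)
                on_disk.add("/" + rel.replace(os.sep, "/").lower())

    # Server-side includes never show up in an HTTP crawl, so treat any file
    # named in an <!--#include file/virtual="..."--> directive as referenced.
    # (Simplification: file= paths are really relative to the including page.)
    include_re = re.compile(r'#include\s+(?:file|virtual)\s*=\s*"([^"]+)"', re.I)
    for dirpath, _dirs, files in os.walk(SITE_ROOT):
        for name in files:
            if name.lower().endswith((".asp", ".inc", ".htm", ".html")):
                with open(os.path.join(dirpath, name), errors="ignore") as src:
                    for inc in include_re.findall(src.read()):
                        reached.add("/" + inc.lstrip("/").lower())

    # Whatever is on disk but never reached is an orphan candidate
    for path in sorted(on_disk - reached):
        print(path)

Anything this prints is only a candidate: pages reached exclusively through form posts, JavaScript, or Server.Execute calls won't appear in the crawl, so review the list before deleting.
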
+1  A: 

There's also the W3C link checker: http://validator.w3.org/checklink

David Weitz
This has a limit of 150 pages when I crawl recursively.
bendewey
A: 

You should never let a once-valid URL go stale. Bad web developer! No biscuit!!

Norman Ramsey
A: 

You should consider:

  1. Putting the entire existing site into source control, then
  2. Deleting the extra pages and seeing who complains.
John Saunders
It's already been added to source control; that was my first task. Now I'm trying to delete the extra pages, but I want to make sure I don't delete pages that are still needed.
bendewey