I have taken on an ASP.NET web site where the client is using the web server as a code repository, i.e. removing a page from the site just means no longer linking to it. There are a stupendous number of unused files, and I would like to archive these off and arrive at a lean git repository containing only the files used by the active site.

How can I get usage or coverage data that will tell me, over an agreed-upon period, e.g. a month, which pages are being hit? I know there are many ways of doing this in ASP.NET, and even in plain IIS, but I'd like some suggestions for a convenient and simple way of doing it.

+1  A: 

I would suggest the IIS logs, but they only show pages that users actually requested. A page that is still linked from the site but happened not to be visited during the period would look unused.

You could try running a spider on the site. Here's a free tool. http://www.trellian.com/sitespider/download.htm
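If you would rather not install a tool, the spidering idea can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not what any particular spider does: the names `LinkParser`, `fetch_html` and `crawl`, the `limit` parameter and the host `intranet.example` are all made up for illustration. It only follows static `href`/`src` links on the same host, so pages reached through JavaScript, postbacks or a login form will be missed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the values of href and src attributes found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch_html(url):
    """Default fetcher: return the response body as text, or "" on any error."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, fetch=fetch_html, limit=5000):
    """Breadth-first crawl of same-host links.

    Returns the set of lower-cased URL paths that were reached.
    `limit` (an illustrative safety cap) stops the crawl once more
    than that many URLs have been discovered.
    """
    host = urlsplit(start_url).netloc
    queue = deque([start_url])
    visited = {start_url}
    while queue and len(visited) <= limit:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(url))
        for link in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, link)).url
            if urlsplit(target).netloc == host and target not in visited:
                visited.add(target)
                queue.append(target)
    return {urlsplit(u).path.lower() for u in visited}
```

Calling `crawl("http://intranet.example/")` would return the set of paths reachable from the home page, which you can then compare against the files on disk. The `fetch` argument is there so you can swap in a fetcher that handles whatever authentication your intranet uses.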

You should be careful about which files you delete from the web server if there are cached links to the pages out there. A good strategy would be to use Google: run the search query "site:example.com", where example.com is the domain for your site, to see which pages are returned.

Babak Naffas
I'm not worried about cached files, as this is effectively an intranet application.
ProfK
Then this basic spider is your best bet.
Babak Naffas
A: 

Look at the access logs for the agreed period and compare the list of pages visited against the full list of all pages. This seems like more work than necessary, though.
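For what it's worth, the log comparison can be scripted. Here is a minimal Python sketch, assuming the IIS logs are in the W3C extended format (where a `#Fields:` line names the columns and `cs-uri-stem` holds the requested path) and that the site is served from the root of the URL space. The function names and the list of page extensions are my own choices for illustration, not anything IIS provides.

```python
from pathlib import Path


def requested_paths(log_lines):
    """Collect lower-cased cs-uri-stem values from W3C extended log lines.

    Only requests answered with a 2xx or 3xx status are counted, so
    404s for files that no longer exist are ignored.
    """
    fields = []
    seen = set()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split(" ")))
        stem = row.get("cs-uri-stem")
        status = row.get("sc-status", "200")
        if stem and status.startswith(("2", "3")):
            seen.add(stem.lower())
    return seen


def site_files(root, extensions=(".aspx", ".ashx", ".asmx", ".htm", ".html")):
    """Return URL-style, lower-cased paths of directly requestable files under root."""
    root = Path(root)
    return {
        "/" + p.relative_to(root).as_posix().lower()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    }


def unused_files(root, log_lines):
    """Files on disk that never appear as a successful request in the logs."""
    return sorted(site_files(root) - requested_paths(log_lines))
```

You would feed it the lines of every log file for the period, e.g. by reading each `.log` file in the site's log folder and chaining the lines together. Treat the result as a list of candidates rather than a delete list: user controls, code-behind files and includes are never requested directly, and a request for a folder served by a default document is logged under the folder path, not the file name.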

There is a program called Xenu link checker which already contains the functionality you require. It can spider your site and, if you tell it where the files are, it will identify unused files for you.

Matt Lacey