views:

218

answers:

5

I have a requirement to check that all the hyperlinks still work on a password protected, private website. What's the best way of doing this?

The site is a mix of HTML and ASP.NET Webforms.

EDIT: Sorry - I don't think this question was clear.

I need something like this:

http://validator.w3.org/checklink

But for a site hidden behind a user/pass form. I don't mind doing something programmatically or purchasing something if it's reasonable.

A: 

You just need to authenticate the WebRequests ...

Where are you stuck?

-- Edit

Well, it depends on what you mean by 'password protected'. How is the login scheme implemented?

Noon Silk
It's just form authentication. But most/all the software services I've seen can't seem to do private sites. If you know of a reputable one, please post it. There seems to be a huge amount of crapware out there.
IainMH
Check what mechanism the private site uses. If it's cookies, stuff the right cookie in your requests when you pull the data.
Wouter Lievens
Iain: Are you looking to write something or buy/download something?
Noon Silk
@silky - looking to buy/download - something like this, but for a private site behind a user/pass http://validator.w3.org/checklink
IainMH
Ah, then I think this may not be the correct website to help you. Either way, *I* certainly can't help you :)
Noon Silk
I could write something if nothing is available. But surely this has been done loads of times.
IainMH
+2  A: 

You should seriously look at the Unix command-line tools for this, especially wget.

Take a look at the --spider option in combination with the --user and --password options...

Also take a look at curl or libcurl+php

There are two things that are not terribly clear about your question.

First, what sort of username/password is required? The credentials can be POST values from a login form, or they can be the username and password from the HTTP protocol itself. Which do you want? There are several ways to provide a username and password to a website, and whatever solution you use has to work with your website. That means you need a very accurate understanding of which method your site uses. Just the fact that it has a username and password is not nearly enough information.
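To make the distinction concrete, here is a minimal Python sketch of the two cases, using only the standard library. The helper names (`basic_auth_opener`, `form_login_opener`, `encode_form`) and the field names in the usage note are made up for illustration; a real login form will have its own field names.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def basic_auth_opener(top_url, user, password):
    """Opener for sites protected by HTTP Basic authentication."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm under top_url".
    mgr.add_password(None, top_url, user, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))


def form_login_opener():
    """Opener that keeps cookies, for sites that log in through an HTML form."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))


def encode_form(fields):
    """Encode form fields as an application/x-www-form-urlencoded POST body."""
    return urllib.parse.urlencode(fields).encode("utf-8")
```

For a form login you would probably do something like `opener = form_login_opener()` followed by `opener.open(login_url, data=encode_form({"username": "...", "password": "..."}))`, then reuse the same `opener` for every later request so the session cookie is sent along.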

Second, it is unclear what you mean by "links still work". Do you mean internal links that will or will not work depending on whether your application is functioning properly, or do you mean links to public Internet sites that happen to be on a password-protected site?

I am assuming the latter with this answer. But if you meant the former, then you should look into one of the several web application test suites that have recently become available.

HTH, FT

ftrotter
+1  A: 

Rel Software's Web Link Validator works quite happily with Forms Auth based sites - we've been using it on client sites for some time now.

The main things to watch out for are:

  1. Send the link checker to your Login Page first.
  2. Ensure you tell it to ignore all Logout URLs (so it doesn't inadvertently log itself out).
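
If you end up scripting this yourself, the same two precautions apply. Below is a rough Python sketch, not anything taken from Web Link Validator. It assumes the login page is an ASP.NET WebForms page, which normally expects hidden fields such as `__VIEWSTATE` to be posted back together with the credentials, and it assumes logout URLs can be recognised by a substring. The marker list is a guess; adjust it to your site.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects <input type="hidden"> name/value pairs from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields of a login page as a dict."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def is_logout_url(url, markers=("logout", "signout", "logoff")):
    """True if the URL looks like a logout link the crawler should skip."""
    # The markers are assumptions; use whatever your site's logout URLs contain.
    lowered = url.lower()
    return any(marker in lowered for marker in markers)
```

The idea is to fetch the login page first, merge `hidden_fields(page)` with the username and password fields, POST that back, and then run every candidate link through `is_logout_url` before requesting it.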
Zhaph - Ben Duguid
Thanks Zhaph I've just been trying this. It's pretty good.
IainMH
+1  A: 

I enjoy using SimpleTest for testing my own websites, but there's no built-in link checker.

You could use it to navigate the login and fetch the page body. You'd then parse the content using regular expressions to find all links, and use SimpleTest again to verify those links (and even crawl them to verify recursively).
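
The same fetch, parse, and verify loop can be sketched in Python with just the standard library. Note that this swaps in `html.parser` for the regular expressions mentioned above and does not use SimpleTest at all; `extract_links` and `check_link` are hypothetical helper names, and `opener` is assumed to be a `urllib.request` opener that has already logged in and holds the session cookie.

```python
import urllib.error
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects absolute, de-duplicated link targets from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.startswith(("mailto:", "javascript:", "#")):
            # Resolve relative links and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute URLs linked from the given page body."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def check_link(opener, url, timeout=10):
    """Return the HTTP status for url, or None if the request failed outright."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 or 500
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...
```

To crawl recursively, you would keep a set of visited URLs and feed the body of each successfully fetched internal page back into `extract_links`.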

Of course, using cURL (or libcurl with your language of choice) gets you pretty close, too.

grossvogel
A: 

You can do this using Apache HttpClient, which has the features needed for this.

Madhu