I often have to work with fragile legacy websites that break in unexpected ways when logic or configuration is updated.

I don't have the time or knowledge of the system needed to create a Selenium script. Besides, I don't want to check a specific use case - I want to verify every link and page on the site.

I would like to create an automated system test that will spider through a site and check for broken links and crashes. Ideally, there would be a tool that I could use to achieve this. It should have as many as possible of the following features, in descending order of priority:

  • Triggered via script
  • Does not require human interaction
  • Follows all links including anchor tags and links to CSS and js files
  • Produces a log of all found 404s, 500s etc.
  • Can be deployed locally to check sites on intranets
  • Supports cookie/form-based authentication
  • Free/Open source

There are many partial solutions out there, like FitNesse, Firefox's LinkChecker and the W3C link checker, but none of them do everything I need.

I would like to use this test with projects using a range of technologies and platforms, so the more portable the solution the better.

I realise this is no substitute for proper system testing, but it would be very useful if I had a convenient and automatable way of verifying that no part of the site was obviously broken.

A: 

Have you looked at Watir? It's Web Application Testing in Ruby (Watij and others also exist). It's a nice quick-to-build scripting solution with very few limitations - the only frustration is that it can't access data inside Java applets - but as far as I'm aware only QTP does that successfully.

http://watir.com/

Mark Mayo
Thanks for your suggestion, but I was hoping to find a tool that would spider through the site checking each link it came across. I am after a lightweight test that will check for broken links and crashes, but that I don't have to program specific actions into.
ctford
Watir is equivalent to Selenium, so doesn't help the poster
orip
A: 

InSite is a commercial program that seems to do what you want (haven't used it).

If I was in your shoes, I'd probably write this sort of spider myself...
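
For a sense of how small such a spider can be, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not a finished tool: the names `LinkExtractor`, `extract_links`, `fetch` and `crawl` are made up for this example, it does no authentication, it ignores `robots.txt`, and it will not see URLs referenced from inside CSS or generated by JavaScript.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects URLs from href/src attributes (anchors, CSS, scripts, images)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)
                # Drop the #fragment so the same page is not fetched twice.
                url = urldefrag(absolute).url
                # Skip mailto:, javascript: and other non-HTTP schemes.
                if urlparse(url).scheme in ("http", "https"):
                    self.links.add(url)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch(url):
    """Return (status, body); status is an HTTP code or an error string."""
    try:
        with urlopen(url, timeout=10) as response:
            is_html = response.headers.get_content_type() == "text/html"
            body = response.read().decode("utf-8", "replace") if is_html else ""
            return response.status, body
    except HTTPError as err:
        return err.code, ""
    except URLError as err:
        return str(err.reason), ""


def crawl(start_url, fetch=fetch):
    """Check every reachable URL; follow links only from pages on the start host."""
    host = urlparse(start_url).netloc
    seen, queue, failures = {start_url}, [start_url], []
    while queue:
        url = queue.pop()
        status, body = fetch(url)
        if status != 200:
            failures.append((url, status))
            continue
        if urlparse(url).netloc != host:
            continue  # external links are checked but not spidered
        for link in extract_links(body, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return failures
```

Calling `crawl("http://your-site/")` returns a list of `(url, status)` pairs for everything that did not answer with 200, which a build script could print or turn into a failing exit code. The `fetch` parameter is there so the crawl logic can be exercised against canned pages without touching the network.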

orip
Writing it myself might be an option, but I'm surprised that there doesn't seem to be such a tool out there already. I would have thought it was a common need.
ctford
I agree, I was surprised when I looked for one after your question. I thought, "I could use something like this", but no cigar.
orip
A: 

What part of your list does the W3C link checker not meet? That would be the one I would use.

Alternatively, twill (python-based) is an interesting little language for this kind of thing. It has a link checker module but I don't think it works recursively, so that's not so good for spidering. But you could modify it if you're comfortable with that. And I could be wrong, there might be a recursive option. Worth checking out, anyway.

Zac Thompson
From a preliminary look, the W3C Link Checker does the checking itself as a hosted service, meaning it fails: "Can be deployed locally to check sites on intranets"
Adam
@Adam: Not at all - there is a download link right at the bottom of the page linked to in the question! http://search.cpan.org/dist/W3C-LinkChecker/
Zac Thompson
+3  A: 

I use Xenu's Link Sleuth for this sort of thing. It quickly checks for dead links etc. on any site. Just point it at any URI and it'll spider all links on that site.

Description from the site:

Xenu's Link Sleuth (TM) checks Web sites for broken links. Link verification is done on "normal" links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and java applets. It displays a continuously updated list of URLs which you can sort by different criteria. A report can be produced at any time.

It meets all your requirements apart from being scriptable, as it's a Windows app that has to be started manually.

Matt Lacey
I have used this program and it works really well!
meme
it's also not open source.
Zac Thompson
It's not open source, but it is free (it includes some advertising links in reports which I've always happily ignored.)
Matt Lacey
Xenu's Link Sleuth's website says that operating the program from the command line is available for "a $300 donation" to a cause Tilman supports.
ctford
+1  A: 

You might want to try using wget for this. It can spider a site including the "page requisites" (i.e. files) and can be configured to log errors. I don't know if it will have enough information for you but it's Free and available on Windows (cygwin) as well as unix.
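
As a rough sketch of how this could be triggered from a script, the wrapper below builds a wget command line and runs it. The function names and the default log file name are made up for this example; the options themselves are standard wget options, but how thoroughly page requisites are checked in spider mode may depend on your wget version, so treat this as a starting point.

```python
import subprocess


def build_wget_command(start_url, log_path="wget-spider.log"):
    """Build a wget invocation that crawls a site without saving pages."""
    return [
        "wget",
        "--spider",           # check that URLs exist instead of keeping downloads
        "--recursive",        # follow links found in each page
        "--level=inf",        # no depth limit (wget's default depth is 5)
        "--page-requisites",  # also request images, CSS and scripts
        "--no-verbose",       # concise log output
        "--output-file=" + log_path,
        start_url,
    ]


def run_spider(start_url, log_path="wget-spider.log"):
    """Run wget and return its exit status; non-zero means something went wrong."""
    result = subprocess.run(build_wget_command(start_url, log_path))
    return result.returncode
```

After `run_spider("http://your-site/")` finishes, the log file contains wget's record of each request, and a non-zero return value tells the calling script that at least one problem occurred (wget documents exit status 8 for server error responses).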

Mr. Shiny and New
+1  A: 

I'm not sure that it supports form authentication, but it will handle cookies if you can get it going on the site, and otherwise I think Checkbot will do everything on your list. I've used it as a step in a build process before to check that nothing is broken on a site. There's example output on the website.

Ian G
+4  A: 

We use and really like Linkchecker:

http://linkchecker.sourceforge.net/

It's open-source, Python, command-line, internally deployable, and outputs to a variety of formats. The developer has been very helpful when we've contacted him with issues.

We have a Ruby script that queries our database of internal websites, kicks off LinkChecker with appropriate parameters for each site, and parses the XML that LinkChecker gives us to create a custom error report for each site in our CMS.

Sean McMains
This looks promising.
artlung
+1  A: 

I have always liked linklint for checking links on a site. However, I don't think it meets all your criteria, particularly the aspects that may be JavaScript dependent. I also think it will miss the images called from inside CSS.

But for spidering all anchors, it works great.

artlung
A: 

Try SortSite. It's not free, but seems to do everything you need and more.

Alternatively, PowerMapper from the same company has a similar-but-different approach. The latter will give you less information about detailed optimisation of your pages, but will still identify any broken links, etc.

Disclaimer: I have a financial interest in the company that makes these products.

Gary McGill