views:

58

answers:

4

I have a directory with > 1000 .html files and would like to check all of them for bad links, preferably from the console. Can you recommend any tool for such a task?

+1  A: 

You can extract links from HTML files using the Lynx text browser. Bash scripting around this should not be difficult.
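
A minimal sketch of what that could look like (the directory path is a placeholder, and lynx's -listonly output format may differ slightly between versions):

#!/bin/bash
# Extract the links from every .html file with lynx, keep the external
# http(s) ones, and probe each with wget --spider (check, don't download).
# Relative links come out as file:// URLs and would need a separate check.
for f in /path/to/dir/*.html; do
    lynx -dump -listonly "$f" |
    awk '/^ *[0-9]+\. /{print $2}' |   # strip lynx's numbering
    grep -E '^https?://' |
    sort -u |
    while read -r url; do
        wget -q --spider "$url" || echo "BROKEN: $url (in $f)"
    done
done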

mouviciel
Lynx can extract the links, but it doesn't really support link checking itself. wget is much better suited for the purpose.
reinierpost
How do you get wget to output a list of links in a page?
David Dorward
It's a really cool idea. Why didn't I think of it earlier?
depesz
+3  A: 

I'd use checklink (a W3C project)
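
For a single page, a minimal invocation looks like this (the URL is a placeholder, and the local files would have to be served over HTTP; see checklink's help for recursion and output options):

checklink http://example.com/index.html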

David Dorward
As long as you are careful to set the user agent and accept headers (to avoid bogus error codes from bot detectors), this should work.
Tim Post
It looks OK, but it's definitely not intended for such large projects: there's no way to list just the broken links, and the output for my project is *really* big.
depesz
A: 

Try the webgrep command line tools or, if you're comfortable with Perl, the HTML::TagReader module by the same author.

gareth_bowles
+1  A: 

You can use wget, e.g.:

wget -r --spider  -o output.log http://somedomain.com

At the bottom of output.log, wget will indicate whether it has found broken links. You can parse that with awk/grep; a rough sketch follows below.
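
A rough sketch of the parsing step; the exact wording of wget's log messages varies between versions:

# lines where --spider flagged a broken link
grep -i 'broken link' output.log

# URLs that answered 404 (the URL appears on the "--...--" request line
# a couple of lines above the status line)
grep -B 2 ' 404 ' output.log | grep -o 'http[^ ]*' | sort -u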

ghostdog74