I'm trying to find any dead links on a website using wget. I'm running:

wget -r -l 20 -e robots=off --spider -S http://www.example.com

which recursively crawls the site, checking that each linked page exists and retrieving its headers. I then parse the output with a simple script.
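For what it's worth, the parsing is trivial. A minimal sketch of it (assuming English-locale wget output, and that with -S the raw status line is printed indented by two spaces; the exact log format may differ between wget versions):

wget -r -l 20 -e robots=off --spider -S http://www.example.com -o spider.log

awk '
  /^--/                  { url = $3 }    # request lines look like: --TIMESTAMP--  URL
  /^  HTTP\/[0-9.]+ 404/ { print url }   # with -S, the raw status line is indented
' spider.log

Newer versions of wget also print a "Found N broken links." summary at the end of a recursive spider run, but as far as I can tell that summary doesn't name the referring pages either.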

I would like to know which page wget followed to reach a given link. However, the only information wget outputs is the URL it's requesting, the response headers, and a timestamp (plus some other details I don't care about). That is enough to tell that a dead link exists, but not which page the dead link appears on.

Is there any way to make wget output that information (short of having it actually download the entire site)?
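The closest I've come is wget's --debug flag: during a recursive crawl wget sends a Referer header with each request, and the debug log dumps the outgoing requests, so in principle the referring page could be scraped out of it. A rough sketch of that idea (untested; the debug output format likely varies between versions and builds):

# --debug dumps each outgoing request, including the Referer header
# wget sets during recursive retrieval; -o sends the log to a file.
wget -r -l 20 -e robots=off --spider --debug -o debug.log http://www.example.com

# Pair each requested path with the Referer line from its request dump.
grep -E '^(GET |Referer: )' debug.log

The debug log is extremely verbose, though, so I'm hoping there's a cleaner option.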