I would like to get a list of all the URLs a site links to (within the same domain) without actually downloading all of the content, ideally with something like wget. Is there a way to tell wget to just list the links it WOULD download?
For a little background on what I'm using this for, in case someone can come up with a better solution: I'm trying to build a robots.txt file that excludes all files ending in p[4-9].html, but robots.txt doesn't support regular expressions. So my plan is to get all the links, run a regular expression against them, and then put the results into robots.txt. Any ideas?
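To make the plan concrete, here's a rough sketch of what I have in mind (example.com stands in for my site, and I'm not sure the wget flags or log-parsing step are right; that's essentially what I'm asking):

```sh
# Crawl the site without saving pages, and scrape the visited URLs out of
# wget's log. The grep/awk step assumes the usual "--<timestamp>--  <URL>"
# log lines; the field number may differ between wget versions.
wget --spider --recursive --no-parent http://example.com/ 2>&1 \
  | grep '^--' \
  | awk '{ print $3 }' \
  | sort -u > all-links.txt

# Keep only the URLs I want to block and turn them into robots.txt rules,
# e.g. "Disallow: /some/path/p4.html".
grep -E 'p[4-9]\.html$' all-links.txt \
  | sed 's|^http://example\.com|Disallow: |' >> robots.txt
```

But if wget can simply list the links without fetching everything, or there's a cleaner tool for this, I'd rather do that.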