ansaurus

Question

wget WIKI, don't get diff pages (exclude by regex?)

Answer 1

+2 A:

-R '*action=diff*,*action=edit*'

chaos 2009-06-01 17:55:10
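
For context, here is a minimal sketch of how that option might sit in a full recursive crawl. The wiki URL, the `mirror_wiki` helper, and the `RUN` switch are made up for illustration. The other wget flags are my assumption about what the asker is running, not something stated in the thread. By default it only prints the command so you can check the quoting first.

```shell
#!/bin/sh
# Hypothetical wiki URL (not from the thread); substitute your own.
WIKI_URL="${WIKI_URL:-http://wiki.example.org/}"

# The reject list from the answer, single-quoted so the shell
# does not try to expand the globs itself.
REJECT='*action=diff*,*action=edit*'

# Hypothetical helper: prints the wget command, or runs it when RUN=1.
mirror_wiki() {
    # Build the argument list once so the dry run and the real run agree.
    set -- --recursive --no-parent -R "$REJECT" "$1"
    if [ "${RUN:-0}" = 1 ]; then
        wget "$@"
    else
        echo "wget $*"
    fi
}

mirror_wiki "$WIKI_URL"
```

Run it with `RUN=1` in the environment to start the actual crawl.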

It looks like doing that will download the page, reject it, and then delete it (instead of skipping the download altogether).

stonea 2009-06-01 18:24:30

Although it will prevent recursing on the rejected page.

stonea 2009-06-01 18:26:04

I see no evidence of that. "The ‘--reject’ option works the same way as ‘--accept’, only its logic is the reverse; Wget will download all files except the ones matching the suffixes (or patterns) in the list". (-R is the short form of --reject; rejlist is the manual's name for its argument.) That seems to state clearly that it will not download matching patterns.

chaos 2009-06-01 18:44:58

Seems like a bug in wget. Other people have had this issue before: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=217243

stonea 2009-06-01 19:57:59

Hunh. Well, that's friggin' goofy. Sorry, guess you can't quite do all of it with wget then. :(

chaos 2009-06-01 20:49:26
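
A note for anyone with a newer wget: releases from 1.14 on have a `--reject-regex` option that is matched against the complete URL, query string included, so matching pages should be skipped before they are fetched. The sketch below assumes such a version. The URL and the `would_reject` helper are hypothetical, and `grep -E` is only an approximation of wget's own matcher. It checks the regex against sample URLs without touching the network.

```shell
#!/bin/sh
# Assumes wget >= 1.14 (for --reject-regex); not tested against the
# wiki from the question.
REJECT_RE='action=(diff|edit)'

# Hypothetical helper: succeeds if the URL matches the reject regex.
would_reject() {
    printf '%s\n' "$1" | grep -Eq "$REJECT_RE"
}

# Try the regex on a few made-up URLs first.
for url in \
    'http://wiki.example.org/index.php?title=Foo&action=diff' \
    'http://wiki.example.org/index.php?title=Foo&action=edit' \
    'http://wiki.example.org/index.php?title=Foo'
do
    if would_reject "$url"; then
        echo "skip $url"
    else
        echo "keep $url"
    fi
done

# The crawl itself, commented out so nothing is fetched by accident:
# wget --recursive --no-parent --reject-regex "$REJECT_RE" 'http://wiki.example.org/'
```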

If the wiki runs MediaWiki, you could try using its API instead: http://www.mediawiki.org/wiki/API

Adrian Archer 2009-06-16 13:32:01
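
A rough sketch of what that could look like, assuming the wiki exposes the standard `api.php` endpoint. The base URL, the `page_url` helper, and the page title are made up, and titles are assumed to be URL-safe already. It only builds and prints the query URL; the fetch is left commented out.

```shell
#!/bin/sh
# Hypothetical API endpoint; the real path depends on the wiki's setup.
API_BASE="${API_BASE:-http://wiki.example.org/api.php}"

# Hypothetical helper: print the query URL that asks for the current
# wikitext of one page (title must already be URL-encoded).
page_url() {
    printf '%s?action=query&prop=revisions&rvprop=content&format=json&titles=%s\n' \
        "$API_BASE" "$1"
}

page_url 'Main_Page'

# To actually fetch it (needs curl and network access):
# curl -s "$(page_url 'Main_Page')" > Main_Page.json
```

Since you ask for pages by title, diff and edit views never enter the picture.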
