I am mirroring a website, starting my crawl from a particular subdomain (e.g. a.foo.com).
How can I make wget also download content from other linked subdomains (e.g. b.foo.com), but not from external domains (e.g. google.com)?

I assumed this would work:

wget --mirror --domains="foo.com" a.foo.com

However, links to b.foo.com were not followed.

+2  A: 

You'd need to add -H as well (it enables spanning of hosts when doing recursive downloads): -D "foo.com,b.foo.com" -H
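As a rough sketch, the full command might look like the one below. It uses the placeholder hosts from the question (a.foo.com, foo.com). It also assumes that wget's --domains matches on the host name suffix, so that listing foo.com alone would cover b.foo.com as well. The helper name mirror_cmd is made up for illustration, and it only prints the command so you can check it before running anything.

```shell
#!/bin/sh
# Placeholder values taken from the question; replace with the real site.
START_URL="http://a.foo.com/"
DOMAIN="foo.com"

# --mirror   : recursive download with timestamping and unlimited depth
# -H         : allow recursion to span to other hosts (-D does not turn this on)
# --domains  : limit spanning to hosts within this domain
#              (assumed to match by suffix, so subdomains are included)
mirror_cmd() {
    printf 'wget --mirror -H --domains=%s %s\n' "$DOMAIN" "$START_URL"
}

# Print the command for review; run the printed line to start the mirror.
mirror_cmd
```

If links to external sites such as google.com are still followed, check that -H and --domains are both present; without --domains, -H lets wget wander to any host.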

jspcal
Ah, now I get what "Note that it does not turn on -H." means.
Plumo
A: 

Why not try HTTrack? It has options to define how to deal with external links.

Dyno Fu