I'm looking to crawl ~100 webpages that share the same structure, but the image I need has a different filename on each page.

The image tag is located at:

#content div.artwork img.artwork

and I need to download the file at that element's src URL.

Any ideas? I have the URLs in a .txt file, and I'm on a Mac OS X box.

A:

I'm not sure how to run a 'selector'-style query from the shell, but a Perl regex might do the job just as well:

for url in `cat urls.txt`; do wget -q -O- "$url"; done | \
  perl -nle 'print $1 if /<img[^>]+class="artwork"[^>]+src="([^"]+)"/'
Maxwell Troy Milton King
What's the best way to feed that wget a .txt file of URLs?
Peter Clark
The above should work if you are using bash; I'm not sure about other shells.
Maxwell Troy Milton King
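On feeding wget the .txt file: wget itself can read a URL list from a file via its standard `-i`/`--input-file` option, which avoids the shell loop (and its word-splitting pitfalls) entirely. A sketch:

```shell
# Fetch every page listed in urls.txt and concatenate them to stdout:
#   -q          silences progress output
#   -O -        writes all fetched pages to stdout
#   -i urls.txt reads the URL list, one URL per line
wget -q -O - -i urls.txt \
  | perl -nle 'print $1 if /<img[^>]+class="artwork"[^>]+src="([^"]+)"/'
```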