When I load a page with Firebug I can see a list of all the images required by the site. How can I automate finding a list of the image URLs used by a webpage, including those referenced in external CSS?

A: 

With PHP Simple HTML DOM Parser it is as easy as:

$html = file_get_html('http://www.google.com/');
$ret = $html->find('img');

Simple HTML DOM Parser also exposes each element's attributes as properties. Note that find() returns an array of elements, so loop over it to read each URL:

foreach ($ret as $img)
    echo $img->src . "\n";

(This looks through the DOM, so I assume it will find images inserted by CSS, but I have not had a chance to test it.)

Oren
Extracting img tags from HTML is easy; even a regex will do. The hard part is finding images referenced from external CSS or loaded dynamically with JavaScript, which that tool will not do.
Plumo
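
To make the point in the comment above concrete, here is a rough Python sketch (standard library only) that collects img src values and also scans inline CSS and linked stylesheets for url(...) references. The names AssetParser, css_image_urls, page_image_urls and the fetch_css callback are hypothetical, made up for this illustration. It assumes images can be recognised by file extension, and it does not follow @import rules or see anything loaded by JavaScript.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches url(...) references in CSS, with or without quotes.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# Assumption: a reference counts as an image if its path ends in one of these.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico")


class AssetParser(HTMLParser):
    """Collects <img src>, stylesheet links, and inline CSS from one HTML page."""

    def __init__(self):
        super().__init__()
        self.images = []        # raw src values from <img> tags
        self.stylesheets = []   # raw href values from <link rel="stylesheet">
        self.inline_css = []    # text of <style> blocks and style="" attributes
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            if attrs.get("href"):
                self.stylesheets.append(attrs["href"])
        elif tag == "style":
            self._in_style = True
        if attrs.get("style"):
            self.inline_css.append(attrs["style"])

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.inline_css.append(data)


def css_image_urls(css_text, base_url):
    """Return absolute URLs of image-like url(...) references in CSS text."""
    urls = []
    for _quote, ref in CSS_URL_RE.findall(css_text):
        if ref.lower().split("?")[0].endswith(IMAGE_EXTENSIONS):
            urls.append(urljoin(base_url, ref))
    return urls


def page_image_urls(html_text, page_url, fetch_css):
    """List image URLs from <img> tags, inline CSS, and linked stylesheets.

    fetch_css is a caller-supplied function: absolute stylesheet URL -> CSS text.
    """
    parser = AssetParser()
    parser.feed(html_text)
    found = [urljoin(page_url, src) for src in parser.images]
    for css in parser.inline_css:
        found += css_image_urls(css, page_url)
    for href in parser.stylesheets:
        sheet_url = urljoin(page_url, href)
        # Relative url(...) paths resolve against the stylesheet, not the page.
        found += css_image_urls(fetch_css(sheet_url), sheet_url)
    return list(dict.fromkeys(found))  # de-duplicate, keep first-seen order
```

In real use, fetch_css would download the stylesheet (for example with urllib.request.urlopen); it is passed in here so the parsing logic can be tried on local strings first. Images added by scripts at runtime would still need a real browser engine to discover.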
A: 

There are a few Firefox extensions that deal with downloading images from a web page. How about trying the "Image Download" add-on?

URLParser.com
I need to automate this for 100K+ websites ...
Plumo