ansaurus

Question

How can I extract the images from HTML code and then check whether they are stored on my web server?

Answer 1

+3 A:

Use an HTML parser. With PHP Simple HTML DOM Parser, you can do something along the lines of this:

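// str_get_html() comes from PHP Simple HTML DOM Parser (simple_html_dom.php).
$html = str_get_html($htmlcode);
foreach ($html->find('img') as $element) {
    // verify_image() is a placeholder for your own check.
    verify_image($element->src);
}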

nicholaides 2009-11-22 21:37:04
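The snippet above leaves verify_image() undefined. A minimal sketch of one way to write it, assuming the images are served straight from the document root with no rewrite rules or aliases, and that relative URLs can be resolved against that root. The $docRoot parameter and the function body are illustrative, not part of the original answer:

```php
<?php
// Hypothetical helper: true if $src maps to a regular file under $docRoot.
// Assumption: URL paths correspond one-to-one to files below the document root.
function verify_image(string $src, ?string $docRoot = null): bool
{
    $docRoot = $docRoot ?? ($_SERVER['DOCUMENT_ROOT'] ?? '');
    if ($docRoot === '') {
        return false;
    }

    // Keep only the path, so "http://example.com/img/a.png?v=2" becomes "/img/a.png".
    $path = parse_url($src, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false;
    }

    // realpath() resolves ".." and symlinks, and returns false if the target is missing.
    $root = realpath($docRoot);
    $full = realpath($docRoot . '/' . ltrim(rawurldecode($path), '/'));
    if ($root === false || $full === false) {
        return false;
    }

    // Reject paths that escape the document root and anything that is not a file.
    $prefix = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($full, $prefix, strlen($prefix)) === 0 && is_file($full);
}
```

Called with one argument, as in the loop above, it falls back to $_SERVER['DOCUMENT_ROOT']. Images hosted on another domain would need an HTTP request instead of a filesystem check.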

You could also use a regex, or SimpleXML with XPath.

prodigitalson 2009-11-22 21:38:34

Regex is not a good way to parse HTML.

Justin Johnson 2009-11-22 21:40:52

And SimpleXML won't tolerate invalid HTML. You would have to run it through HTML Tidy or similar first.

Byron Whitlock 2009-11-22 22:01:39
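If the markup may be invalid, one alternative to SimpleXML is PHP's bundled DOM extension, whose loadHTML() parser recovers from most broken markup, so XPath can be used without a Tidy pass. A rough sketch under that assumption. The function name extract_image_sources() is made up for illustration:

```php
<?php
// Returns the src attribute of every <img> in $htmlcode, tolerating invalid markup.
function extract_image_sources(string $htmlcode): array
{
    if (trim($htmlcode) === '') {
        return [];   // loadHTML() rejects empty input
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($htmlcode);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath: every <img> element that actually has a src attribute.
    $sources = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//img[@src]') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Usage: the <b> tag is never closed, yet both images are still found.
$html = '<p>Logo <img id="x" src="x.gif"><b>bold <img src=\'/img/a.png\'></p>';
print_r(extract_image_sources($html));
```

Each returned value can then be passed to whatever verify_image() check you use.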

General rules aren't always true; a regex is fine for this.

rplevy 2009-11-22 22:27:03

Answer 2

A:

Something like this would probably work:

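#!/usr/bin/perl
use strict;
use warnings;

# Scan tmp.txt line by line and check the src URL of every <img> tag.
open(my $fh, '<', 'tmp.txt') or die "Cannot open tmp.txt: $!";
while (my $line = <$fh>) {
    # [^>]* skips any attributes that come before src inside the same tag.
    while ($line =~ m/<img\b[^>]*\ssrc="([^"]+)"/gi) {
        verify_image($1);    # verify_image() is left for you to define
    }
}
close($fh);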

rplevy 2009-11-22 22:59:43

While this should work for many cases, it would not verify any image that doesn't have the src immediately after <img - so if there was something like <img id="x" src="x.gif"> it wouldn't be checked.

InsDel 2009-11-22 23:21:18

I just made a minor edit to address that (see the code now).

rplevy 2009-11-22 23:46:12
