Possible Duplicate:
How to extract img src, title and alt from html using php?

Hi,
I have found a solution to get the first image from a string:

preg_match('~<img[^>]*src\s?=\s?[\'"]([^\'"]*)~i',$string, $matches);

But I can't manage to get all images from the string.
One more thing: if an image has alternative text (an alt attribute), how can I get that too and save it to another variable?
Thanks in advance,
Ilija

A: 

You are assuming that HTML can be parsed with regular expressions. That may work for some pages, but not for all of them. Since you are limiting yourself to a subset of all web pages, it would be interesting to know what that subset is; maybe the HTML can be parsed quite easily from PHP.

Lars D
A: 

Look at preg_match_all to get all matches.
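
As a minimal sketch of that suggestion, the pattern from the question can be passed to preg_match_all unchanged. The sample markup and the $sources variable are made up for illustration, and the pattern stays as fragile as any regex run against HTML:

```php
<?php
// Sample input; this markup is invented for the example.
$string = '<p><img src="a.png" alt="First"> text <img alt="Second" src=\'b.jpg\'></p>';

// Same pattern as in the question, applied to every match instead of only the first.
preg_match_all('~<img[^>]*src\s?=\s?[\'"]([^\'"]*)~i', $string, $matches);

// $matches[1] holds the contents of the first capture group, one entry per match.
$sources = $matches[1];

foreach ($sources as $src) {
    echo $src, "\n";
}
```

With the default flags, $matches[0] contains the full matched text and $matches[1] the captured src values.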

Per Östlund
+9  A: 

Don't do this with regular expressions. Instead, parse the HTML. Take a look at Parse HTML With PHP And DOM. This is a standard feature in PHP 5.2.x (and probably earlier). Basically the logic for getting images is roughly:

$dom = new DOMDocument;
// Parser options only matter if they are set before the document is loaded.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
  echo $image->getAttribute('src');
}

Reading other attributes, such as alt, should be a trivial adaptation of the same loop.

cletus
Yes, this seems like the best solution.
Per Östlund
Hi cletus, this looks like a very simple and much better solution than regular expressions. Still, I didn't manage to make it work with images. Is there another link with a better explanation? Thanks a lot!
ile
+2  A: 

Note that Regular Expressions are a bad approach to parsing anything that involves matching braces.

You'd be better off using the DOMDocument class.

therefromhere
Heh, Cletus beat me to it of course :)
therefromhere
A: 

This is what I tried, but I can't get it to print the value of src:

 $dom = new domDocument;

    /*** load the html into the object ***/
    $dom->loadHTML($html);

    /*** discard white space ***/
    $dom->preserveWhiteSpace = false;

    /*** the table by its tag name ***/
    $images = $dom->getElementsByTagName('img');

    /*** loop over the table rows ***/
    foreach ($images as $img)
    {
        /*** get each column by tag name ***/
        $url = $img->getElementsByTagName('src');
        /*** echo the values ***/
        echo $url->nodeValue;
     echo '<hr />';
    }
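
A likely reason the attempt above prints nothing: src is an attribute of the img element, not a child element, so getElementsByTagName('src') returns an empty node list, and a node list has no nodeValue. A small sketch of the difference, using made-up sample markup:

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<html><body><img src="a.png" alt="First"></body></html>');

// Take the first <img> element in the document.
$img = $dom->getElementsByTagName('img')->item(0);

// There is no <src> element inside <img>, so this node list is empty.
$children = $img->getElementsByTagName('src');
echo $children->length, "\n";

// The URL lives in the src attribute of the <img> element itself.
$url = $img->getAttribute('src');
echo $url, "\n";
```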

EDIT: I solved this problem

$dom = new DOMDocument;

/*** discard white space (parser options must be set before loading) ***/
$dom->preserveWhiteSpace = false;

/*** load the html into the object ***/
$dom->loadHTML($string);

$images = $dom->getElementsByTagName('img');

foreach ($images as $img) {
    $url = $img->getAttribute('src');
    $alt = $img->getAttribute('alt');
    echo "Title: $alt<br>$url<br>";
}
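
To keep src and alt in variables instead of echoing them, the same loop could be wrapped in a function. This is only a sketch: extractImages is a hypothetical helper name, the type declarations assume PHP 7 or later, and libxml_use_internal_errors is used here on the assumption that real-world markup will otherwise trigger parser warnings.

```php
<?php
// Hypothetical helper, not from the thread: returns one
// ['src' => ..., 'alt' => ...] entry per <img> element found in $html.
function extractImages(string $html): array
{
    // Collect libxml parse errors internally instead of emitting PHP warnings
    // for sloppy markup, and remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $result[] = [
            'src' => $img->getAttribute('src'),
            // getAttribute() returns an empty string when the attribute is missing.
            'alt' => $img->getAttribute('alt'),
        ];
    }
    return $result;
}

// Sample input invented for the example; the second image has no alt attribute.
$images = extractImages('<p><img src="a.png" alt="First"><img src="b.jpg"></p>');
foreach ($images as $image) {
    echo $image['src'], ' | ', $image['alt'], "\n";
}
```

Returning an array keeps the parsing separate from the output, so the caller decides how to escape and print the values.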
ile