I'm trying to extract the first src attribute of an image in a block of HTML text like this:

Lorem ipsum <img src="http://site.com/img.jpg" />consequat.

I have no problem creating the regex to match the src attribute, but how do I return the first matched src attribute, instead of replacing it?

From poring over the PHP manual, it seems like preg_filter() would do the trick, but I can't rely on end users having PHP 5.3 or later.

All the other PHP regex functions seem to be variations of preg_match(), which returns a boolean value, or preg_replace(), which replaces the match with something. Is there a straightforward way to return a regex match in PHP?

+2  A: 

You can use the third parameter of preg_match() to find out what was matched (it's an array, passed by reference):

int preg_match  ( string $pattern  , 
    string $subject  [, array &$matches  [, 
    int $flags  [, int $offset  ]]] )

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.


For instance, with this portion of code:

$str = 'Lorem ipsum dolor sit amet, adipisicing <img src="http://site.com/img.jpg" />consequat.';

$matches = array();
if (preg_match('#<img src="(.*?)" />#', $str, $matches)) {
    var_dump($matches);
}

You'll get this output:

array
  0 => string '<img src="http://site.com/img.jpg" />' (length=37)
  1 => string 'http://site.com/img.jpg' (length=23)

(Note that my regex is overly simplistic, and that regexes are generally not "the right tool" when it comes to extracting data from an HTML string...)
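
To return only the first src rather than dumping the whole array, the preg_match() call can be wrapped in a small function. This is just a sketch: the name first_img_src() and the slightly more tolerant pattern (other attributes before src, single or double quotes) are my own assumptions, not something from the manual, and it sticks to features available before PHP 5.3.

```php
<?php
// Hypothetical helper: returns the first src value found in $html,
// or null when no <img> tag with a quoted src attribute is present.
function first_img_src($html)
{
    // Allow other attributes before src and accept either quote style.
    // Group 1 captures the quote character, group 2 the attribute value.
    $pattern = '#<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1#is';

    $matches = array();
    if (preg_match($pattern, $html, $matches)) {
        return $matches[2];
    }
    return null;
}

$str = 'Lorem ipsum <img src="http://site.com/img.jpg" />consequat.';
echo first_img_src($str), "\n"; // prints http://site.com/img.jpg
```

preg_match() stops at the first match, so later images in the same string are ignored. Unquoted attribute values are still not handled, which is one more reason to treat the regex as a shortcut only.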

Pascal MARTIN
excellent, thanks. btw, what is "the right tool" to extract data from an HTML string?
Jared Henderson
you're welcome :-) ;; that's a tricky question ^^ If you have a full HTML document, I kinda like the idea of using DOMDocument::loadHTML (see http://stackoverflow.com/questions/1274020/extract-form-fields-using-regex/1274074#1274074 for some thoughts I posted some time ago) -- but there are also other solutions
Pascal MARTIN
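
For reference, the DOMDocument::loadHTML idea from the comment above could look roughly like this for the question's input. It is a minimal sketch, not the code from the linked answer: the function name first_img_src_dom() is made up for illustration, and libxml warnings are silenced on the assumption that the input is a fragment rather than a full document.

```php
<?php
// Hypothetical helper: parses $html with DOMDocument and returns the src of
// the first <img> element that has one, or null if there is none.
function first_img_src_dom($html)
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Fragments and sloppy markup make libxml emit warnings;
    // collect them internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns elements in document order.
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            return $img->getAttribute('src');
        }
    }
    return null;
}

$str = 'Lorem ipsum <img src="http://site.com/img.jpg" />consequat.';
echo first_img_src_dom($str), "\n"; // prints http://site.com/img.jpg
```

Unlike the regex, this does not care about attribute order, quoting style, or whether the tag is self-closed, because the parser normalises all of that before the attribute is read.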