I'm attempting to run preg_match to extract the SRC attribute from the first IMG tag in an article (in this case, stored in $row->introtext).

preg_match('/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i', $row->introtext, $matches);

Instead of getting something like

images/stories/otakuzoku1.jpg

from

<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku's store" />

I get just

0

The regex should be right, but I can't tell why it appears to be matching the border attribute and not the src attribute.

Alternatively, if you've had the patience to read this far without skipping straight to the reply field and typing 'use an HTML/XML parser', can you recommend a good tutorial for one? I'm having trouble finding any that apply to PHP 4.

PHP 4.4.7

+1  A: 

Your expression is incorrect. Try:

preg_match('/< *img[^>]*src *= *["\']?([^"\']*)/i', $row->introtext, $matches);

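Note the removal of the brackets around img and src, plus some other cleanups. Square brackets define a character class, so [src] matches a single s, r, or c rather than the literal text src. With the greedy [^>]* in front of it, your pattern ended up matching the r in border= and capturing 0.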
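To make the difference concrete, here is a minimal sketch that runs both patterns against the tag from the question. The helper name first_capture and the variable $introtext (standing in for $row->introtext) are assumptions made for illustration only.

```php
<?php
// Returns the first capture group of $pattern in $html, or null if there is no match.
// (Hypothetical helper, not part of the original question or answer.)
function first_capture($pattern, $html) {
    if (preg_match($pattern, $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Sample markup from the question; $introtext stands in for $row->introtext.
$introtext = '<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" />';

// Original pattern: [img] and [src] are character classes, each matching ONE character.
$broken = '/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i';

// Corrected pattern: img and src are matched as literal text.
$fixed = '/< *img[^>]*src *= *["\']?([^"\']*)/i';

echo first_capture($broken, $introtext), "\n"; // 0
echo first_capture($fixed, $introtext), "\n";  // images/stories/otakuzoku1.jpg
```

Checking the return value of preg_match before reading $matches[1] avoids an undefined index when the text contains no img tag at all.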

CalebD
This did the trick. Not the 'ideal' solution of actually parsing the HTML, but the one solution that works and gives the necessary result. Thanks!
KyokoHunter
+1  A: 

Try:

include ("htmlparser.inc"); // from: http://php-html.sourceforge.net/

$html = 'bla <img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" /> noise <img src="das" /> foo';

$parser = new HtmlParser($html);

while($parser->parse()) {
    if($parser->iNodeName == 'img') {
        echo $parser->iNodeAttributes['src'];
        break;
    }
}

which will produce:

images/stories/otakuzoku1.jpg

It should work with PHP 4.x.

Bart Kiers
+1, nice one, I was just wording something to that effect using that old DOM parser :)
karim79
Looks useful - will give it a try and report back here.
KyokoHunter
Some problems getting htmlparser.inc to work. Error message says the class is already initiated, but it isn't. I'll hold out for a provider upgrade to PHP 5...
KyokoHunter
Have you tried `include_once('htmlparser.inc');` instead of `include('htmlparser.inc');`?
Bart Kiers
A: 

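Here's a way to do it with built-in functions (PHP >= 4). Note that this is an XML parser, so $html has to be well-formed XML (a single root element, every tag closed):

$first_src = null;  // stays null if no IMG tag with a SRC is found
$parser = xml_parser_create();
xml_parse_into_struct($parser, $html, $values);
foreach ($values as $key => $val) {
    // Tag and attribute names are upper-cased because case folding is on by default.
    if ($val['tag'] == 'IMG' && isset($val['attributes']['SRC'])) {
        $first_src = $val['attributes']['SRC'];
        break;
    }
}

echo $first_src;  // images/stories/otakuzoku1.jpg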
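
As a self-contained sketch of the same idea, the snippet below defines its own input and checks the parser's return value. The function name first_img_src_xml and the sample strings are assumptions for illustration; the fragment is wrapped in a div so that it has a single root element, which an XML parser requires.

```php
<?php
// Returns the src of the first <img> in a well-formed XML fragment, or null.
// (Hypothetical helper name; requires PHP's xml extension.)
function first_img_src_xml($xml) {
    $parser = xml_parser_create();
    // xml_parse_into_struct() returns 0 when the input is not well-formed.
    $ok = xml_parse_into_struct($parser, $xml, $values);
    if (!$ok) {
        return null;
    }
    foreach ($values as $val) {
        // Names are upper-cased because case folding is enabled by default.
        if ($val['tag'] == 'IMG' && isset($val['attributes']['SRC'])) {
            return $val['attributes']['SRC'];
        }
    }
    return null;
}

// Well-formed: single root element, every img self-closed.
$html = '<div>bla <img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" /> noise <img src="das" /> foo</div>';
echo first_img_src_xml($html), "\n"; // images/stories/otakuzoku1.jpg

// Not well-formed: the img tag is never closed, so parsing fails.
$unclosed = '<div><img src="a.jpg"></div>';
var_dump(first_img_src_xml($unclosed)); // NULL
```

The second call shows the main limitation: typical HTML with unclosed tags is rejected, so this approach only suits markup that is already XHTML-style.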
GZipp
A: 

The regex I used was much simpler. My code assumes that the string being passed to it contains exactly one img tag with no other markup:

$pattern = '/src="([^"]*)"/';
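
A quick sketch of how that pattern behaves, using made-up sample strings (the variable names here are assumptions, not from the linked answer). It only recognises a lowercase src attribute in double quotes, so it is best kept to input you control.

```php
<?php
$pattern = '/src="([^"]*)"/';

// Hypothetical inputs: the same tag with double-quoted and single-quoted attributes.
$double = '<img src="images/stories/otakuzoku1.jpg" alt="store" />';
$single = "<img src='images/stories/otakuzoku1.jpg' alt='store' />";

// preg_match() returns 1 on a match and 0 when nothing matches.
$found_double = preg_match($pattern, $double, $m1);
$found_single = preg_match($pattern, $single, $m2);

echo $found_double ? $m1[1] : 'no match', "\n"; // images/stories/otakuzoku1.jpg
echo $found_single ? $m2[1] : 'no match', "\n"; // no match
```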

See my answer here for more info: http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php/3815188#3815188

Jazzerus