I am trying to extract all img tags from an HTML string. Here is the code:

$d1 = file_get_contents("http://itcapsule.blogspot.com/feeds/posts/default?alt=rss");
preg_match_all('/<img[^>]+>/i', $d1, $result);
print_r($result);

And the result is:

Array ( [0] => Array ( ) )

But the same regex gives the correct result in an online regex testing tool, http://regex.larsolavtorvik.com/.

What could be the problem?

+2  A: 

Do not use regular expressions to process HTML; use a parser instead.
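
As a rough illustration of the parser approach, here is a minimal sketch using PHP's built-in DOM extension. The sample markup in $html and the variable names are made up for the example; with the feed from the question, the content would have to be decoded into ordinary HTML first.

```php
<?php
// Hypothetical sample markup standing in for real page or feed content.
$html = '<p>Intro <img src="a.png" alt="A"> text <IMG src="b.jpg"></p>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the src attribute of every img element, regardless of tag case.
$sources = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Unlike the regex, this works on parsed elements, so attributes such as src can be read directly rather than being picked out of a matched string.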

soulmerge
Thanks, let me try that.
Orion
+1  A: 

The content you are parsing is encoded with HTML entities: for example, < is replaced with &lt;, so the regex never sees a literal <img. Use html_entity_decode first to convert the data into normal HTML.
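
A small sketch of that decoding step, assuming the feed body looks roughly like the hypothetical $encoded string below (the real feed content will differ):

```php
<?php
// Hypothetical stand-in for the feed body: the markup arrives entity-encoded.
$encoded = '&lt;p&gt;Hello &lt;img src=&quot;a.png&quot; /&gt; world &lt;img src=&quot;b.png&quot;&gt;&lt;/p&gt;';

// The raw string contains no literal "<img", so the pattern matches nothing.
preg_match_all('/<img[^>]+>/i', $encoded, $before);

// Decode the entities (including quotes) back into ordinary HTML, then match again.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
preg_match_all('/<img[^>]+>/i', $decoded, $after);

print_r($before[0]); // no matches on the encoded text
print_r($after[0]);  // both img tags are found after decoding
```

The same pattern is used in both calls; only the input changes, which is why the online tester (fed plain HTML) and the script (fed the encoded feed) disagreed.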

PS: Use an HTML parser instead of regex.

Otto Allmendinger
Thanks!! Will try that.
Orion
A: 

Solved the problem by using the SimplePie XML parser:

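include_once 'simplepie.inc';

// Placeholder: replace with the actual feed URL.
$feed = "feedurl";

// Load and parse the feed.
$data = new SimplePie($feed);
$data->init();
$data->handle_content_type();

// Run the img regex on each item's description instead of the raw feed.
foreach ($data->get_items() as $item) {
    $desc = $item->get_description();
    preg_match_all('/<img[^>]+>/i', $desc, $result);
    print_r($result);
}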

This is exactly what I was looking for :) Thanks guys!!

Orion