I'm writing an RSS-to-JSON parser and, as part of that, I need to run htmlentities() on any markup found inside the description tag. I'm currently trying to use preg_replace(), but I'm struggling with it. My current (non-working) code looks like this:

$pattern[0] = "/\<description\>(.*?)\<\/description\>/is";
$replace[0] = '<description>'.htmlentities("$1").'</description>';
$rawFeed = preg_replace($pattern, $replace, $rawFeed);

If you have a more elegant solution to this as well, please share. Thanks.

+3  A: 

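Simple. Your htmlentities("$1") call runs once, on the literal string "$1", before preg_replace() sees any match, so nothing gets encoded. Use preg_replace_callback instead: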

function _handle_match($match)
{
    return '<description>' . htmlentities($match[1]) . '</description>';
}

$pattern = "/\<description\>(.*?)\<\/description\>/is";
$rawFeed = preg_replace_callback($pattern, '_handle_match', $rawFeed);

It accepts any PHP callable, so class methods work as well.
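As a rough sketch of the other callable forms, the same replacement can be driven by an object method or an anonymous function. The class name DescriptionEncoder and the sample feed string are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper class, used only to show the [object, method] callable form.
class DescriptionEncoder
{
    public function encode(array $match): string
    {
        // $match[1] holds the text captured between the description tags.
        return '<description>' . htmlentities($match[1]) . '</description>';
    }
}

$pattern = '/<description>(.*?)<\/description>/is';

// Made-up sample input standing in for the raw feed.
$rawFeed = '<item><description><b>Hi</b> & bye</description></item>';

// Form 1: array callable pointing at an instance method.
$encoder = new DescriptionEncoder();
$viaMethod = preg_replace_callback($pattern, [$encoder, 'encode'], $rawFeed);

// Form 2: anonymous function passed inline.
$viaClosure = preg_replace_callback($pattern, function (array $match): string {
    return '<description>' . htmlentities($match[1]) . '</description>';
}, $rawFeed);

echo $viaMethod, "\n";
```

Both forms should give the same result here; which one reads better mostly depends on whether the encoding logic already lives in a class.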

Armin Ronacher
That did the trick, thank you.
VirtuosiMedia
A: 

A more elegant solution would be to use SimpleXML, or a third-party library such as XML_Feed_Parser or Zend_Feed, to parse the feed.

Here is a SimpleXML example:

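<?php
// Fetch the raw feed and parse it into a SimpleXMLElement.
$rss = file_get_contents('http://rss.slashdot.org/Slashdot/slashdot');
$xml = simplexml_load_string($rss);

// Assumes <item> elements sit directly under the root (as in RSS 1.0/RDF);
// for RSS 2.0 they live under <channel>, so use $xml->channel->item instead.
foreach ($xml->item as $item) {
    echo "{$item->description}\n\n";
}
?>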

Keep in mind that RSS, RDF, and Atom feeds are structured differently, which is why it can make sense to use one of the libraries mentioned above.
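To make the structural difference concrete, here is a minimal sketch that branches on the root element name. The function feedTitles() and both sample feeds are hypothetical, and a real feed library handles far more cases than this.

```php
<?php
// Hypothetical helper, not from the original answer: collects item titles
// from a feed string, branching on the root element name.
function feedTitles(string $feedXml): array
{
    $xml = simplexml_load_string($feedXml);
    if ($xml === false) {
        return []; // input was not well-formed XML
    }

    switch ($xml->getName()) {
        case 'rss':   // RSS 2.0: <rss><channel><item>
            $nodes = $xml->channel->item;
            break;
        case 'feed':  // Atom: <feed><entry>
            $nodes = $xml->entry;
            break;
        default:      // RSS 1.0 / RDF: <rdf:RDF><item>
            $nodes = $xml->item;
    }

    $titles = [];
    foreach ($nodes as $node) {
        $titles[] = (string) $node->title;
    }
    return $titles;
}

// Made-up sample feeds for illustration.
$rss2 = '<rss version="2.0"><channel><item><title>RSS post</title></item></channel></rss>';
$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Atom post</title></entry></feed>';

print_r(feedTitles($rss2));
print_r(feedTitles($atom));
```

Even this small case needs three code paths, which is the argument for letting a feed library normalize the formats.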

Till
I am actually using SimpleXML, but the problem is that any HTML embedded inside the description tag also becomes an object, which is why I am entity-encoding it first.
VirtuosiMedia
Your feed is broken then. Good feeds wrap HTML and similar in CDATA.
Till
When I said "good", I meant "valid". :)
Till
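For reference, a minimal sketch of the CDATA case discussed in the comments above: when the description is wrapped in CDATA, SimpleXML treats the embedded HTML as plain text, so casting the node to a string returns the markup unchanged and no entity encoding is needed beforehand. The sample feed below is made up for illustration.

```php
<?php
// Made-up RSS 2.0 fragment whose description wraps its HTML in CDATA.
$rss = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <description><![CDATA[<p>Hello <b>world</b></p>]]></description>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);

// The string cast returns the CDATA content as text; the <p> and <b> tags
// are not turned into child elements.
$description = (string) $xml->channel->item->description;

// Ready to go into the JSON output.
$json = json_encode(['description' => $description]);

echo $json, "\n";
```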