I'm currently using Magpie RSS but it sometimes falls over when the RSS or Atom feed isn't well formed. Are there any other options for parsing RSS and Atom feeds with PHP?

+9  A: 

Your other options include:

Philip Morton
wrong wrong wrong wrong! http://coreylib.com. Now.
Kenneth Reitz
Zend Feed http://framework.zend.com/manual/en/zend.feed.html
artur
+1  A: 

I use SimplePie to parse a Google Reader feed and it works pretty well and has a decent feature set.

Of course, I haven't tested it with RSS / Atom feeds that aren't well formed, so I don't know how it copes with those. I'm assuming Google's are fairly standards compliant! :)

Phill Sacre
+2  A: 

The HTML Tidy library is able to fix some malformed XML files. Running your feeds through that before passing them on to the parser may help.
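As a rough sketch of that idea, assuming PHP's optional tidy extension is installed: the helper name `repair_feed_xml` and the sample feed string are made up for illustration, while `tidy_repair_string()` and the `input-xml` / `output-xml` / `wrap` options come from the tidy extension itself.

```php
<?php

// Hypothetical helper: returns the repaired XML, or null when the tidy
// extension is not available or the repair call fails.
function repair_feed_xml(string $source): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }

    $config = array(
        'input-xml'  => true, // treat the input as XML rather than HTML
        'output-xml' => true, // ask Tidy to emit XML
        'wrap'       => 0,    // do not re-wrap long lines
    );

    $repaired = tidy_repair_string($source, $config, 'utf8');

    return is_string($repaired) ? $repaired : null;
}

// Made-up sample: the bare ampersand makes this feed not well formed.
$raw = '<rss version="2.0"><channel><title>Tom & Jerry</title></channel></rss>';

$clean = repair_feed_xml($raw);
if ($clean !== null) {
    libxml_use_internal_errors(true); // keep parser messages out of the page output
    $feed = simplexml_load_string($clean);
    // $feed is false if the repaired source still does not parse
}
```

Tidy can only repair some kinds of damage, so it's still worth checking the result of the parse afterwards rather than assuming the cleaned-up source is valid.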

ceejayoz
+5  A: 

I've always used the SimpleXML functions built into PHP to parse XML documents. It's one of the few generic parsers out there that has an intuitive structure to it, which makes it extremely easy to build a meaningful class for something specific like an RSS feed. Additionally, it will report XML warnings and errors, and upon finding any you could simply run the source through something like HTML Tidy (as ceejayoz mentioned) to clean it up and try parsing it again.

Consider this very rough, simple class using SimpleXML:

<?php

class BlogPost
{
    public $date;
    public $ts;
    public $link;

    public $title;
    public $text;
    public $summary;
}

class BlogFeed
{
    public $posts = array();

    function __construct($file_or_url)
    {
        // Anything that is not an http(s) URL is treated as a file name under /shared/xml/
        if(!preg_match('/^https?:/i', $file_or_url))
            $feed_uri = $_SERVER['DOCUMENT_ROOT'] .'/shared/xml/'. $file_or_url;
        else
            $feed_uri = $file_or_url;

        $xml_source = file_get_contents($feed_uri);
        if($xml_source === false)
            return;

        // simplexml_load_string() returns false when the source is not well-formed XML
        $x = simplexml_load_string($xml_source);

        if($x === false || count($x) == 0)
            return;

        foreach($x->channel->item as $item)
        {
            $post = new BlogPost();
            $post->date = (string) $item->pubDate;
            $post->ts = strtotime((string) $item->pubDate);
            $post->link = (string) $item->link;
            $post->title = (string) $item->title;
            $post->text = (string) $item->description;

            // Create summary as a shortened body and remove images, extraneous line breaks, etc.
            $summary = $post->text;
            $summary = preg_replace('#<img[^>]*>#i', '', $summary);
            $summary = preg_replace('#^(<br[ ]?/>)*#i', '', $summary);
            $summary = preg_replace('#(<br[ ]?/>)*$#i', '', $summary);

            // Truncate summary line to 100 characters
            $max_len = 100;
            if(strlen($summary) > $max_len)
                $summary = substr($summary, 0, $max_len) . '...';

            $post->summary = $summary;

            $this->posts[] = $post;
        }
    }
}

?>
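For the error-detection part, here is a minimal sketch of one way to collect the parser's complaints instead of letting them surface as PHP warnings. The helper name `load_feed_xml` and the sample feed string are hypothetical; `libxml_use_internal_errors()`, `libxml_get_errors()` and `libxml_clear_errors()` are the standard libxml functions that SimpleXML sits on top of.

```php
<?php

// Hypothetical helper: returns array($xml_or_null, $messages), where
// $messages lists whatever libxml reported while parsing.
function load_feed_xml(string $source): array
{
    // Collect libxml errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($source);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode for the rest of the script
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($xml === false ? null : $xml, $messages);
}

// Made-up sample with a mismatched closing tag
list($xml, $errors) = load_feed_xml('<rss><channel><title>Broken</channel></rss>');

if ($xml === null) {
    // Not well formed: reject it, or clean the source up and try again
    echo "Feed could not be parsed:\n" . implode("\n", $errors) . "\n";
}
```

Whether you then retry with a cleaned-up copy or just give up on the feed is a policy decision; the point is only that the failure is detectable before you start looping over items.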
Brian Cline
you have an end-tag with no start tag. ;)
Talvi Watia
Well, I had one, but it was being eaten by SO's code formatter since it had no empty line above it. On a related note, you did not start your sentence with a capital letter. ;)
Brian Cline
+4  A: 

If a feed isn't well-formed XML, you're supposed to reject it, no exceptions. You're entitled to call the feed's creator a bozo.

Otherwise you're paving the way to the mess that HTML ended up in.

porneL
+1, you should not try to work around any XML that is not well-formed. We've had bad experiences with that, trust me, it was a big pain :(
Helen Neely
A: 

Personally I use BNC Advanced Feed Parser. I like its template system, which is very easy to use.

Adam