I'm working on an application that allows users to add their own RSS feeds to a simple reader of sorts. Currently, I'm using xml_domit_rss as the parser, but I'm unsure whether it actually validates the URL before parsing. From what I can gather online, validation appears to be separate from parsing, done either through a service (feedvalidator.org) or by some other method (parse_url()).

Does anyone have insight into either how xml_domit_rss validates, or a method by which I can validate before sending the URL to the parser?

Thanks in advance...

+1  A: 

You could validate the RSS with a RelaxNG schema. Schemas for all the different feed formats should be available online...
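As a rough sketch of what that could look like in PHP, the DOM extension's `DOMDocument::relaxNGValidate()` checks a document against a RelaxNG schema in XML syntax. The helper name `feed_matches_schema()` and the schema path are assumptions for illustration; you would have to obtain a suitable `.rng` file for the feed format yourself.

```php
<?php
// Hypothetical helper (not part of any library): returns true when $xml is
// well-formed and matches the RelaxNG schema stored at $schemaPath.
// $schemaPath is an assumption: a .rng file (XML syntax) that you supply.
function feed_matches_schema(string $xml, string $schemaPath): bool
{
    if ($xml === '') {
        return false; // loadXML() rejects empty input
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->relaxNGValidate($schemaPath);

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}
```

Something like `feed_matches_schema($body, '/path/to/rss.rng')` (path hypothetical) would then tell you whether to hand the feed to the parser.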

Johannes Weiß
A: 

In the context of XML files (and hence RSS/Atom feeds, which are encoded as XML), validating means checking the file against a document schema that describes its expected structure (which elements can have which child elements, which attributes can be present, etc.).

Some XML parsers require a schema and bork (this is a technical term :-) meaning they refuse to parse) on XML files that do not conform to it. Since you are parsing arbitrary RSS, it is probably best to skip validation and make a best effort at parsing the feed. You could also show the parse results to the user (similar to how Google Reader does it when you add a new feed) and let her judge whether the result looks OK.
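A minimal sketch of that best-effort approach in PHP, assuming an RSS 2.0 layout (`<rss><channel><item>`); the helper name `preview_feed()` and the shape of the returned array are made up for illustration. It parses leniently with SimpleXML and returns a few titles that you could show to the user for confirmation.

```php
<?php
// Hypothetical helper: parse without a schema and return a small preview,
// or null when the input is not well-formed XML or has no <channel>.
// Assumes an RSS 2.0 layout; Atom feeds would need different element names.
function preview_feed(string $xml, int $maxItems = 5): ?array
{
    if ($xml === '') {
        return null;
    }

    // Keep parse errors out of the output; we only care about success/failure.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel)) {
        return null;
    }

    // Collect up to $maxItems item titles for the preview.
    $titles = [];
    foreach ($feed->channel->item as $item) {
        if (count($titles) >= $maxItems) {
            break;
        }
        $titles[] = trim((string) $item->title);
    }

    return [
        'title' => trim((string) $feed->channel->title),
        'items' => $titles,
    ];
}
```

If the function returns null you can reject the URL outright; otherwise the preview lets the user decide whether the feed looks right.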

Unfortunately, the XML parser used by this code seems to be dead, and I can't find any details on how strict or lax its parsing is...

Cd-MaN
A: 

It's simple: in .NET you can do that using SyndicationFeed, which supports Atom 1.0 and RSS 2.0.

try
{
    // Load() throws if the document is not a feed it can read.
    SyndicationFeed fetchedItems = SyndicationFeed.Load(XmlReader.Create(feedUrl));
    // Validation successful.
}
catch
{
    // Validation failed.
}
Lukas Šalkauskas
Sounds nice; is that ported to PHP? (having a similar problem, no .net possible on machine)
Piskvor
I don't think so :) You can try to validate it through a web service like feedkiller.com or similar, by making a web request to it.
Lukas Šalkauskas
A: 

This is my quick and dirty solution that worked for me under similar circumstances:

foreach($sources as $source) {
    if(!$source["url"]) {
        continue;
    }

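    // curl_request() is my own helper (not a PHP built-in) that fetches the URL.
    $rss = curl_request($source["url"]);
    // Escape every ampersand so that HTML-only entities do not break the parser.
    $rss = str_replace('&', '&amp;', $rss);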

    $parser = xml_parser_create();
    if(xml_parse($parser, $rss)) {
        $xmle = new SimpleXMLElement($rss);
    }
    else {
        $xmle = null;
        continue;
    }

    //other stuff here
}

I make sure to replace the ampersands with &amp;, because not doing that can cause issues with the SimpleXMLElement parser and entities such as &bull; or &mdash;.

xml_parse() returns 1 on success and 0 on failure, so you can check it with a plain if statement. Traversing the feed with SimpleXMLElement afterwards makes things nice and easy.
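One caveat with a blanket replace is that it also rewrites ampersands that were already part of valid XML references, which can lead to double escaping. A possible variation, sketched here as an assumption rather than what the loop above does, escapes only the ampersands that do not start one of XML's predefined entities or a numeric reference, then checks well-formedness through libxml. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: escape "&" unless it begins a predefined XML entity
// (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference.
// HTML-only entities such as &bull; are still escaped, as in the blanket version.
function escape_stray_ampersands(string $xml): string
{
    $escaped = preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );

    // preg_replace() returns null on a regex engine error; fall back to the input.
    return $escaped ?? $xml;
}

// Hypothetical helper: returns a SimpleXMLElement, or null if the cleaned-up
// input is still not well-formed XML.
function load_feed(string $xml): ?SimpleXMLElement
{
    if ($xml === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string(escape_stray_ampersands($xml));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $feed === false ? null : $feed;
}
```

Inside the loop, `load_feed($rss)` could stand in for the replace, xml_parse() check, and SimpleXMLElement construction, with a null result meaning "skip this source".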

davethegr8