I have been trying to write some routines to read RSS and Atom feeds using the new classes in System.ServiceModel.Syndication, but unfortunately the Rss20FeedFormatter bombs out on about half the feeds I try, with the following exception:

An error was encountered when parsing a DateTime value in the XML.

This seems to occur whenever the RSS feed expresses the publish date in the following format:

Thu, 16 Oct 08 14:23:26 -0700

If the feed expresses the publish date as GMT, things go fine:

Thu, 16 Oct 08 21:23:26 GMT

If there's some way to work around this with XmlReaderSettings, I have not found it. Can anyone assist?

+2  A: 

Interesting. It looks like the date format is not one of those the DateTime parser expects by default. After looking at the feed classes, it does not look like you can inject your own formatting convention into the parser, and it likely uses a specific scheme for validating the feed.

You may be able to change how the datetime parser behaves by modifying the culture. I have never done it before so I can't say for sure it would work.

Another solution might be to first transform the feed you are trying to read. Likely not the greatest, but it could get you around the issue.
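
To illustrate the "transform first" idea, here is a minimal sketch, not a tested production approach. It loads the RSS text into an XDocument and rewrites every pubDate and lastBuildDate into the GMT form that the question reports as working. The class name, method name, and the tryParse delegate are all hypothetical; the delegate stands in for whatever date parser you trust and is assumed to return null for text it cannot read.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class FeedDateRewriteSketch
{
    // RSS 2.0 date elements to rewrite (an assumption; extend as needed).
    static readonly string[] DateElements = { "pubDate", "lastBuildDate" };

    // tryParse is a caller-supplied (hypothetical) parser that returns
    // null when it cannot read the text.
    public static XDocument NormalizeRssDates(string xml, Func<string, DateTimeOffset?> tryParse)
    {
        XDocument doc = XDocument.Parse(xml);

        // Snapshot the matching elements before modifying the tree.
        var dateNodes = doc.Descendants()
            .Where(e => e.Name.Namespace == XNamespace.None
                        && DateElements.Contains(e.Name.LocalName))
            .ToList();

        foreach (XElement e in dateNodes)
        {
            DateTimeOffset? parsed = tryParse(e.Value.Trim());
            if (parsed.HasValue)
            {
                // "r" writes the RFC 1123 form ending in "GMT", so convert to UTC first.
                e.Value = parsed.Value.UtcDateTime.ToString("r", CultureInfo.InvariantCulture);
            }
            // Unparseable values are left untouched.
        }
        return doc;
    }
}
```

The returned document could then be handed to the formatter with SyndicationFeed.Load(doc.CreateReader()), at the cost of holding the whole feed in memory.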

Good luck.

smaclell
+6  A: 

RSS 2.0 formatted syndication feeds utilize the RFC 822 date-time specification when serializing elements like pubDate and lastBuildDate. The RFC 822 date-time specification is unfortunately a very 'flexible' syntax for expressing the time-zone component of a DateTime.

Time zone may be indicated in several ways. "UT" is Universal Time (formerly called "Greenwich Mean Time"); "GMT" is permitted as a reference to Universal Time. The military standard uses a single character for each zone. "Z" is Universal Time. "A" indicates one hour earlier, and "M" indicates 12 hours earlier; "N" is one hour later, and "Y" is 12 hours later. The letter "J" is not used. The other remaining two forms are taken from ANSI standard X3.51-1975. One allows explicit indication of the amount of offset from UT; the other uses common 3-character strings for indicating time zones in North America.

I believe the issue involves how the zone component of the RFC 822 date-time value is being processed. The feed formatter appears to not be handling date-times that utilize a local differential to indicate the time zone.

As RFC 1123 extends the RFC 822 specification, you could try using the DateTimeFormatInfo.RFC1123Pattern ("r") to handle converting problematic date-times, or write your own parsing code for RFC 822 formatted dates. Another option would be to use a third-party framework instead of the System.ServiceModel.Syndication namespace classes.
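
If you go the "write your own parsing code" route, a minimal sketch might look like the following. It is not the framework's parser and covers only part of RFC 822: an optional day name, a two- or four-digit year, seconds present, and a zone given as a numeric offset or as GMT, UT, or Z. The North American zone names and the other military letters are deliberately left out. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Rfc822DateSketch
{
    // Accepted shapes; the zone is normalized to "+hh:mm" before these are tried.
    static readonly string[] Formats =
    {
        "ddd, d MMM yy HH:mm:ss zzz",
        "ddd, d MMM yyyy HH:mm:ss zzz",
        "d MMM yy HH:mm:ss zzz",
        "d MMM yyyy HH:mm:ss zzz",
    };

    public static bool TryParse(string text, out DateTimeOffset result)
    {
        string s = (text ?? string.Empty).Trim();

        // Map the Universal Time names to an explicit zero offset.
        s = Regex.Replace(s, @"\s(GMT|UT|Z)$", " +00:00");

        // Rewrite a local differential such as "-0700" as "-07:00".
        s = Regex.Replace(s, @"([+-]\d{2})(\d{2})$", "$1:$2");

        return DateTimeOffset.TryParseExact(
            s, Formats, CultureInfo.InvariantCulture, DateTimeStyles.None, out result);
    }
}
```

With this sketch, the two sample dates from the question ("Thu, 16 Oct 08 14:23:26 -0700" and "Thu, 16 Oct 08 21:23:26 GMT") should parse to the same instant, differing only in the stored offset.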

It appears there are some known issues with date-time parsing and the Rss20FeedFormatter that are in the process of being addressed by Microsoft.

Oppositional
Thanks - it appears this was brought to Microsoft's attention back in February, but it's not fixed yet. :(
dan90266
+3  A: 

As a workaround, I have been using Yahoo Pipes (pipes.yahoo.com) to convert the feed. I then consume the newly created feed.

rjester
+3  A: 

Based on the workaround posted in the bug report to Microsoft about this I made an XmlReader specifically for reading SyndicationFeeds that have non-standard dates.

The code below is slightly different than the code in the workaround at Microsoft's site. It also takes Oppositional's advice on using the RFC 1123 pattern.

Instead of simply calling XmlReader.Create() you need to create the XmlReader from a Stream. I use the WebClient class to get that stream:

// Dispose the WebClient as well as the reader when done.
using (WebClient client = new WebClient())
using (XmlReader reader = new SyndicationFeedXmlReader(client.OpenRead(feedUrl)))
{
    SyndicationFeed feed = SyndicationFeed.Load(reader);
    // ....
    // do things with the feed
    // ....
}

Below is the code for the SyndicationFeedXmlReader:

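using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Reflection;
using System.ServiceModel.Syndication;
using System.Xml;

public class SyndicationFeedXmlReader : XmlTextReader
{
    // Element names whose text is treated as a date. lastBuildDate is an
    // RSS 2.0 element, so it belongs with the RSS hints.
    readonly string[] Rss20DateTimeHints = { "pubDate", "lastBuildDate" };
    readonly string[] Atom10DateTimeHints = { "updated", "published" };
    private bool isRss2DateTime = false;
    private bool isAtomDateTime = false;

    public SyndicationFeedXmlReader(Stream stream) : base(stream) { }

    // Remember whether the element being tested is a known date element.
    public override bool IsStartElement(string localname, string ns)
    {
        isRss2DateTime = Rss20DateTimeHints.Contains(localname);
        isAtomDateTime = Atom10DateTimeHints.Contains(localname);

        return base.IsStartElement(localname, ns);
    }

    public override string ReadString()
    {
        string dateVal = base.ReadString();

        try
        {
            // Dry-run the formatters' private DateFromString methods via reflection.
            // This relies on framework internals and may break between versions.
            if (isRss2DateTime)
            {
                MethodInfo objMethod = typeof(Rss20FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Static);
                Debug.Assert(objMethod != null);
                objMethod.Invoke(null, new object[] { dateVal, this });
            }
            if (isAtomDateTime)
            {
                MethodInfo objMethod = typeof(Atom10FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Instance);
                Debug.Assert(objMethod != null);
                objMethod.Invoke(new Atom10FeedFormatter(), new object[] { dateVal, this });
            }
        }
        catch (TargetInvocationException)
        {
            // The formatter rejected the value. Substitute the current UTC time in a
            // form the formatter accepts; note that the original date is lost.
            if (isAtomDateTime)
            {
                // Atom dates use the RFC 3339 form rather than RFC 1123.
                return DateTimeOffset.UtcNow.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
            }
            // The invariant culture keeps day and month names in English.
            return DateTimeOffset.UtcNow.ToString("r", CultureInfo.InvariantCulture);
        }

        return dateVal;
    }
}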

Again, this is copied almost exactly from the workaround posted on the Microsoft site in the link above. ...except that this one works for me, and the one posted at Microsoft did not.

NOTE: One bit of customization you may need to do is in the two arrays at the start of the class. Depending on any extraneous fields your non-standard feed might add, you may need to add more items to those arrays.

Clever Human
It appears that you give up being able to use XmlReaderSettings with this method, namely the DtdProcessing option. A problem for those feeds still referencing the rss-0.91.dtd.
Ant
A: 

A similar problem still persists in .NET 4.0, and I decided to work with XDocument instead of directly invoking SyndicationFeed. I described the applied method (specific to my project) here. I can't say it is the best solution, but it can certainly be considered a "backup plan" in case SyndicationFeed fails.
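
As a rough illustration of the XDocument route (a generic sketch, not the project-specific method referred to above), the items of an RSS 2.0 feed can be read directly with LINQ to XML. The class and member names are hypothetical, and the publish date is kept as raw text on purpose so that an unusual date format cannot abort the whole load; parsing it is left to the caller.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XDocumentRssSketch
{
    // Plain holder for the few fields this sketch reads.
    public sealed class RssItem
    {
        public string Title;
        public string Link;
        public string RawPubDate; // unparsed text of <pubDate>, or null if absent
    }

    public static List<RssItem> ReadItems(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // RSS 2.0 core elements carry no XML namespace, so plain names match.
        return doc.Descendants("item")
            .Select(item => new RssItem
            {
                // The explicit string conversion yields null for a missing element.
                Title = (string)item.Element("title"),
                Link = (string)item.Element("link"),
                RawPubDate = (string)item.Element("pubDate"),
            })
            .ToList();
    }
}
```

The trade-off is that you give up the SyndicationFeed object model and have to map each element you care about by hand.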

Dennis Delimarsky