I'm trying to parse RSS 2.0 and Atom feeds using SyndicationFeedFormatter and SyndicationFeed objects, but I'm getting XmlExceptions while parsing DateTime fields such as pubDate and/or lastBuildDate.

Wed, 24 Feb 2010 18:56:04 GMT+00:00 does not work

Wed, 24 Feb 2010 18:56:04 GMT works

So the exception is thrown because of the "+00:00" timezone offset after "GMT".

As a workaround, for familiar feeds I would manually fix those DateTime nodes: catch the XmlException, load the RSS into an XmlDocument, fix those nodes' values, create a new XmlReader, and then return the formatter from this new XmlReader object (code not shown). But for this approach to work, I need to know beforehand which nodes cause the exception.

        SyndicationFeedFormatter syndicationFeedFormatter = null;
        XmlReaderSettings settings = new XmlReaderSettings();
        using (XmlReader reader = XmlReader.Create(url, settings))
        {
            try
            {
                syndicationFeedFormatter = SyndicationFormatterFactory.CreateFeedFormatter(reader);
                syndicationFeedFormatter.ReadFrom(reader);
            }
            catch (XmlException xexp)
            {
                // fix those datetime nodes with exceptions and read again.
            }
        }
        return syndicationFeedFormatter;
    }

rss feed: http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&q=test&cf=all&output=rss

Exception details:

XmlException Error in line 1 position 376. An error was encountered when parsing a DateTime value in the XML.
at System.ServiceModel.Syndication.Rss20FeedFormatter.DateFromString(String dateTimeString, XmlReader reader)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml(XmlReader reader, SyndicationFeed result)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom(XmlReader reader)
at ... cs:line 171

<rss version="2.0">
  <channel>
    ...
    <pubDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</pubDate>
    <lastBuildDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</lastBuildDate> <-----exception
    ...
    <item>
      ...
      <pubDate>Wed, 24 Feb 2010 16:17:50 GMT+00:00</pubDate>
      <lastBuildDate>Wed, 24 Feb 2010 18:56:04 GMT+00:00</lastBuildDate>
    </item>
    ...
  </channel>
</rss>

Is there a better way to achieve this? Please help. Thanks.

A: 

I have the same problem and I can't find any workaround... I know that this next line parses fine, but I can't find where to apply it. I've tried overriding XmlTextReader.ReadString() without success...

DateTimeOffset.ParseExact(dateVal, "ddd, dd MMM yyyy HH':'mm':'ss 'GMT'zzz", new CultureInfo("en-US"))
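
For what it's worth, here is a minimal sketch of how that parse could be turned into a string the formatter accepts (the plain "GMT" form shown to work in the question). The class and method names (FeedDates, NormalizeGmtOffset) are made up for illustration, and I'm using the invariant culture instead of en-US, on the assumption that both parse the English day and month abbreviations the same way. It does not solve where to hook this into the reader; it only converts one date string.

```csharp
using System;
using System.Globalization;

static class FeedDates
{
    // Hypothetical helper: rewrites "Wed, 24 Feb 2010 18:56:04 GMT+00:00"
    // as an RFC 1123 date ending in plain "GMT".
    // Strings that do not match the format are returned unchanged.
    public static string NormalizeGmtOffset(string dateVal)
    {
        DateTimeOffset parsed;
        bool ok = DateTimeOffset.TryParseExact(
            dateVal.Trim(),
            "ddd, dd MMM yyyy HH':'mm':'ss 'GMT'zzz",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);

        // "r" is the standard RFC 1123 format specifier; it always emits "GMT",
        // so the value is converted to UTC first.
        return ok
            ? parsed.UtcDateTime.ToString("r", CultureInfo.InvariantCulture)
            : dateVal;
    }
}
```

Calling FeedDates.NormalizeGmtOffset("Wed, 24 Feb 2010 18:56:04 GMT+00:00") should give back "Wed, 24 Feb 2010 18:56:04 GMT".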

HELP!

David
A: 

Google is still using Atom 0.3 formats - see the article "How to upgrade Atom 0.3 feeds on the fly with a custom XmlReader for use with WCF Syndication APIs" for a solution.

viperguynaz
+1  A: 

Here is my hacky workaround for reading Google News RSS feeds.

string xml;
using (WebClient webClient = new WebClient())
{
    xml = Encoding.UTF8.GetString(webClient.DownloadData(url));
}
xml = xml.Replace("+00:00", "");
// Encode as UTF-8 (not ASCII) so non-ASCII characters in the feed survive.
byte[] bytes = Encoding.UTF8.GetBytes(xml);
XmlReader reader = XmlReader.Create(new MemoryStream(bytes));
SyndicationFeed feed = SyndicationFeed.Load(reader);
James Lawruk
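
A possible variation on the workaround above, as a rough sketch only: instead of replacing "+00:00" everywhere in the document (which would also touch article text that happens to contain it), load the XML into an XDocument and trim the offset only from pubDate and lastBuildDate elements. The class and method names (GoogleNewsFeed, Normalize, Load) are hypothetical, and it assumes the only bad values are the "GMT+00:00" ones shown in the question.

```csharp
using System;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;
using System.Xml.Linq;

static class GoogleNewsFeed
{
    // Hypothetical helper: strips the "+00:00" suffix from date elements only.
    public static XDocument Normalize(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // ToList() so the tree is not modified while it is being enumerated.
        var dateElements = doc.Descendants()
            .Where(e => e.Name.LocalName == "pubDate" ||
                        e.Name.LocalName == "lastBuildDate")
            .ToList();

        foreach (XElement e in dateElements)
        {
            string value = e.Value.Trim();
            if (value.EndsWith("GMT+00:00", StringComparison.Ordinal))
            {
                // Keep everything up to and including "GMT".
                e.Value = value.Substring(0, value.Length - "+00:00".Length);
            }
        }
        return doc;
    }

    // Parses the cleaned document with the standard syndication loader.
    public static SyndicationFeed Load(string xml)
    {
        using (XmlReader reader = Normalize(xml).CreateReader())
        {
            return SyndicationFeed.Load(reader);
        }
    }
}
```

The xml argument would be the string downloaded with WebClient as in the snippet above. Feeds with other non-standard date formats would still throw.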