I'm writing a Stack Overflow API wrapper, currently at http://soapidotnet.googlecode.com/. I have a few questions about parsing SO RSS feeds.

I've chosen to use RSS.NET to parse the RSS, but I have a few questions about my code (which I have provided further down in this post).


My Questions:

First of all, am I parsing those attributes correctly? I have a class named Question, which has those properties.

Next, how can I parse the <re:rank> RSS property (used for # of votes)? I'm not sure how RSS.NET lets us do that. As far as I understand, it's an element with a custom namespace.

Finally, do I have to add all the properties manually, as I currently do in my code? Is there some sort of deserialization that I can use?


Code:

Below is my current code for parsing recent question feeds:

    /// <summary>
    /// Utilises recent question feeds to obtain recently updated questions on a certain site.
    /// </summary>
    /// <param name="site">Trilogy site in question.</param>
    /// <returns>A list of objects of type Question, which represents the recent questions on a trilogy site.</returns>
    public static List<Question> GetRecentQuestions(TrilogySite site)
    {
        List<Question> RecentQuestions = new List<Question>();
        RssFeed feed = RssFeed.Load(string.Format("http://{0}.com/feeds",GetSiteUrl(site)));
        RssChannel channel = (RssChannel)feed.Channels[0];
        foreach (RssItem item in channel.Items)
        {
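            Question toadd = new Question();
            // Categories is an auto-property and starts out null, so create the list before adding to it.
            toadd.Categories = new List<string>();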
            foreach(RssCategory cat in item.Categories)
            {
                toadd.Categories.Add(cat.Name);
            }
            toadd.Author = item.Author;
            toadd.CreatedDate = ConvertToUnixTimestamp(item.PubDate).ToString();
            toadd.Id = item.Link.Url.ToString();
            toadd.Link = item.Link.Url.ToString();
            toadd.Summary = item.Description;

            //TODO: OTHER PROPERTIES
            RecentQuestions.Add(toadd);
        }
        return RecentQuestions;
    }

Here is the code of that SO RSS feed:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:re="http://purl.org/atompub/rank/1.0"> 
    <title type="text">Top Questions - Stack Overflow</title> 
    <link rel="self" href="http://stackoverflow.com/feeds" type="application/atom+xml" /> 
    <link rel="alternate" href="http://stackoverflow.com/questions" type="text/html" /> 
    <subtitle>most recent 30 from stackoverflow.com</subtitle> 
    <updated>2009-11-28T19:26:49Z</updated> 
    <id>http://stackoverflow.com/feeds</id> 
    <creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license> 

    <entry> 
        <id>http://stackoverflow.com/questions/1813483/averaging-angles-again</id> 
        <re:rank scheme="http://stackoverflow.com">0</re:rank> 
        <title type="text">Averaging angles... Again</title> 
        <category scheme="http://stackoverflow.com/feeds/tags" term="algorithm"/><category scheme="http://stackoverflow.com/feeds/tags" term="math"/><category scheme="http://stackoverflow.com/feeds/tags" term="geometry"/><category scheme="http://stackoverflow.com/feeds/tags" term="calculation"/> 
        <author><name>Lior Kogan</name></author> 
        <link rel="alternate" href="http://stackoverflow.com/questions/1813483/averaging-angles-again" /> 
        <published>2009-11-28T19:19:13Z</published> 
        <updated>2009-11-28T19:26:39Z</updated> 
        <summary type="html"> 
            &lt;p&gt;I want to calculate the average of a set of angles.&lt;/p&gt;

&lt;p&gt;I know it has been discussed before (several times). The accepted answer was &lt;strong&gt;Compute unit vectors from the angles and take the angle of their average&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However this answer defines the average in a non intuitive way. The average of 0, 0 and 90 will be &lt;strong&gt;atan( (sin(0)+sin(0)+sin(90)) / (cos(0)+cos(0)+cos(90)) ) = atan(1/2)= 26.56 deg&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;I would expect the average of 0, 0 and 90 to be 30 degrees.&lt;/p&gt;

&lt;p&gt;So I think it is fair to ask the question again: How would you calculate the average, so such examples will give the intuitive expected answer.&lt;/p&gt;

        </summary> 
    </entry>

etc.

Here is my Question class, if it will help:

    /// <summary>
    /// Represents a question.
    /// </summary>
    public class Question : Post //TODO: Have Question and Answer derive from Post
    {

        /// <summary>
        /// # of favorites.
        /// </summary>
        public double FavCount { get; set; }

        /// <summary>
        /// # of answers.
        /// </summary>
        public double AnswerCount { get; set; }

        /// <summary>
        /// Tags.
        /// </summary>
        public string Tags { get; set; }

    }


    /// <summary>
    /// Represents a post on Stack Overflow (question, answer, or comment).
    /// </summary>
    public class Post
    {
        /// <summary>
        /// Id (link)
        /// </summary>
        public string Id { get; set; }
        /// <summary>
        /// Number of votes.
        /// </summary>
        public double VoteCount { get; set; }
        /// <summary>
        /// Number of views.
        /// </summary>
        public double ViewCount { get; set; }
        /// <summary>
        /// Title.
        /// </summary>
        public string Title { get; set; }
        /// <summary>
        /// Created date of the post (expressed as a Unix timestamp)
        /// </summary>
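        public string CreatedDate
        {
            get
            {
                return createdDate;
            }
            set
            {
                // Use a backing field: reading or assigning the property itself here would recurse endlessly.
                createdDate = value;
                dtCreatedDate = StackOverflow.ConvertFromUnixTimestamp(StackOverflow.ExtractTimestampFromJsonTime(value));
            }
        }
        private string createdDate;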
        /// <summary>
        /// Created date of the post (expressed as a DateTime)
        /// </summary>
        public DateTime dtCreatedDate { get; set; }
        /// <summary>
        /// Last edit date of the post (expressed as a Unix timestamp)
        /// </summary>
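        public string LastEditDate
        {
            get
            {
                return lastEditDate;
            }
            set
            {
                // Use a backing field: reading or assigning the property itself here would recurse endlessly.
                lastEditDate = value;
                dtLastEditDate = StackOverflow.ConvertFromUnixTimestamp(StackOverflow.ExtractTimestampFromJsonTime(value));
            }
        }
        private string lastEditDate;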
        /// <summary>
        /// Last edit date of the post (expressed as a DateTime)
        /// </summary>
        public DateTime dtLastEditDate { get; set; }
        /// <summary>
        /// Author of the post.
        /// </summary>
        public string Author { get; set; }
        /// <summary>
        /// HTML of the post.
        /// </summary>
        public string Summary { get; set; }
        /// <summary>
        /// URL of the post.
        /// </summary>
        public string Link { get; set; }
        /// <summary>
        /// RSS Categories (or tags) of the post.
        /// </summary>
        public List<string> Categories { get; set; }

    }

Thanks in advance! Btw, please contribute to the library project! :)

+10  A: 

Firstly, I've never used RSS.NET, but I wonder whether you realise that the .NET Framework has its own RSS API in the System.ServiceModel.Syndication namespace. The SyndicationFeed class is the starting point for this.

To address your question, I've written a little sample that takes the feed for this question and writes the title, author, id and rank (the extension element you're interested in) to the console. This should help show you how simple this API is and how to access the rank.

// namespaces needed: System, System.Linq, System.ServiceModel.Syndication, System.Xml
// load the raw feed
using (var xmlr = XmlReader.Create("http://stackoverflow.com/feeds/question/1813559"))
{
    // get the items within a feed
    var feedItems = SyndicationFeed
                        .Load(xmlr)
                        .GetRss20Formatter()
                        .Feed
                        .Items;

    // print out details about each item in the feed
    foreach (var item in feedItems)
    {
        Console.WriteLine("Title: {0}", item.Title.Text); 
        Console.WriteLine("Author: {0}", item.Authors.First().Name);
        Console.WriteLine("Id: {0}", item.Id);

        // the extensions assume that there can be more than one value, so get
        // the first or default value (default == 0)
        int rank = item.ElementExtensions
                        .ReadElementExtensions<int>("rank", "http://purl.org/atompub/rank/1.0")
                        .FirstOrDefault();

        Console.WriteLine("Rank: {0}", rank);  
    }
}

The above code results in the following being written to the console...

Title: .NET/C#: Using RSS.NET with Stack Overflow Feeds: How To Handle Special Properties of RSS Items

Author: Maxim Z.

Id: http://stackoverflow.com/questions/1813559/net-c-using-rss-net-with-stack-overflow-feeds-how-to-handle-special-propertie

Rank: 0

For more information about the SyndicationFeed class go here...

http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.syndicationfeed.aspx

For some examples of reading and writing extended values from RSS feeds go here...

http://msdn.microsoft.com/en-us/library/bb943475.aspx

With regard to creating your Question instances I'm not sure there's a quick win with serialization. I would have probably written your code something more like this...

var questions = from item in feedItems
                select
                    new Question
                        {
                            Title = item.Title.Text,
                            Author = item.Authors.First().Name,
                            Id = item.Id,
                            // the Question class above has no Rank property; VoteCount (from Post) holds the vote total
                            VoteCount = item.ElementExtensions.ReadElementExtensions<int>(
                                "rank", "http://purl.org/atompub/rank/1.0").FirstOrDefault()
                        };

... but it's pretty much doing the same thing.
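To make the mapping a little more concrete, here is a minimal sketch that fills in the remaining fields from the feed shown in the question. The FeedQuestion and FeedMapper names are hypothetical stand-ins (not part of the asker's library or of the framework), and the choice of which Atom element feeds which property is an assumption based on the sample entry above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

// Hypothetical stand-in for the asker's Question class.
public class FeedQuestion
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Author { get; set; }
    public string Link { get; set; }
    public string Summary { get; set; }
    public DateTime Published { get; set; }
    public int VoteCount { get; set; }
    public List<string> Categories { get; set; }
}

public static class FeedMapper
{
    const string RankNs = "http://purl.org/atompub/rank/1.0";

    // Copies the fields of one feed entry into a FeedQuestion.
    public static FeedQuestion ToQuestion(SyndicationItem item)
    {
        var author = item.Authors.FirstOrDefault();
        var link = item.Links.FirstOrDefault(l => l.RelationshipType == "alternate");

        return new FeedQuestion
        {
            Id = item.Id,
            Title = item.Title == null ? null : item.Title.Text,
            Author = author == null ? null : author.Name,
            Link = link == null ? null : link.Uri.ToString(),
            Summary = item.Summary == null ? null : item.Summary.Text,
            Published = item.PublishDate.UtcDateTime,
            // <re:rank> is an extension element, read the same way as in the sample above
            VoteCount = item.ElementExtensions
                            .ReadElementExtensions<int>("rank", RankNs)
                            .FirstOrDefault(),
            // each <category term="..."/> surfaces as SyndicationCategory.Name
            Categories = item.Categories.Select(c => c.Name).ToList()
        };
    }

    // Parses an Atom document held in a string and maps every entry.
    public static List<FeedQuestion> Parse(string atomXml)
    {
        using (var reader = XmlReader.Create(new StringReader(atomXml)))
        {
            return SyndicationFeed.Load(reader).Items.Select(ToQuestion).ToList();
        }
    }
}
```

The null checks are there because a feed entry is not guaranteed to carry an author, a summary or an alternate link.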

The stuff above requires the .NET 3.5 libraries to be installed. The following doesn't, but it does require the C# 3.0 compiler (which can still create assemblies that target .NET 2.0).

One thing I would suggest you consider: don't create custom types but, instead, write extension methods for the SyndicationItem type. If you let your users deal with SyndicationItem (a type that is supported, understood, documented etc.) but add extension methods to make accessing SO-specific properties easier, then you make the user's life easier, and they can always fall back to the SyndicationItem API when your SO extensions don't do what they want. So, for example, if you wrote this extension method...

public static class SOExtensions
{
    public static int Rank(this SyndicationItem item)
    {
        return item.ElementExtensions
                   .ReadElementExtensions<int>("rank", "http://purl.org/atompub/rank/1.0")
                   .FirstOrDefault();
    }
}

... you could access the Rank of a SyndicationItem like this...

Console.WriteLine("Rank: {0}", item.Rank());

... and when SO adds some other extension property to the feed that you've not catered for, the user of your API can fall back to looking at the ElementExtensions collection.
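As a rough illustration of that fallback, a user could inspect the collection directly. The ExtensionFallback class and its method names below are hypothetical; only ElementExtensions, OuterName, OuterNamespace and ReadElementExtensions come from the framework.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Syndication;

public static class ExtensionFallback
{
    // Lists every extension element on the item as "{namespace}name",
    // which is handy for discovering what a feed actually carries.
    public static List<string> DescribeExtensions(SyndicationItem item)
    {
        return item.ElementExtensions
                   .Select(e => "{" + e.OuterNamespace + "}" + e.OuterName)
                   .ToList();
    }

    // Reads the first extension with the given name and namespace,
    // or default(T) when the item has no such element.
    public static T ReadExtension<T>(SyndicationItem item, string name, string ns)
    {
        return item.ElementExtensions
                   .ReadElementExtensions<T>(name, ns)
                   .FirstOrDefault();
    }
}
```

With something like this, a caller who meets an unfamiliar element can list what is present first and then read it by name, without waiting for the wrapper library to add a dedicated method.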

One final update...

I've not used the RSS.NET library, but I've read through the online docs. From an initial reading of these documents, I would suggest that there isn't a way of getting to the extension element that you're trying to access (the Rank of the item). If the RSS.NET API allows access to the XML for a given RssItem (and I'm not sure that it does), then you could employ the extension method mechanism to augment the RssItem class.

I find the SyndicationFeed API very powerful and very easy to get to grips with, so if using .NET 3.5 is an option for you then I'd go in that direction.
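If .NET 3.5 turns out not to be an option, the rank can still be read with the plain System.Xml classes that ship with .NET 2.0. The following is only a sketch under that assumption: AtomRankReader and ReadRanks are hypothetical names, it bypasses RSS.NET entirely, and it expects the Atom document as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AtomRankReader
{
    const string AtomNs = "http://www.w3.org/2005/Atom";
    const string RankNs = "http://purl.org/atompub/rank/1.0";

    // Returns a map of entry id -> rank for every <entry> in the Atom document.
    // Entries without a <re:rank> element get a rank of 0.
    public static Dictionary<string, int> ReadRanks(string atomXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(atomXml);

        // XPath needs explicit prefixes for both the default Atom namespace
        // and the custom rank namespace.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("a", AtomNs);
        ns.AddNamespace("re", RankNs);

        Dictionary<string, int> ranks = new Dictionary<string, int>();
        foreach (XmlNode entry in doc.SelectNodes("/a:feed/a:entry", ns))
        {
            XmlNode id = entry.SelectSingleNode("a:id", ns);
            if (id == null)
            {
                continue; // nothing to key the entry on
            }

            XmlNode rank = entry.SelectSingleNode("re:rank", ns);
            int value = 0;
            if (rank != null)
            {
                int.TryParse(rank.InnerText.Trim(), out value);
            }
            ranks[id.InnerText.Trim()] = value;
        }
        return ranks;
    }
}
```

The trade-off is that every other field (title, author, categories) would need the same hand-written XPath, which is exactly the work SyndicationFeed does for you.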

Martin Peck
+1 But I have a feeling he needs a 2.0 solution
Josh Stodola
Indeed, the project file contains `<TargetFrameworkVersion>v2.0</TargetFrameworkVersion>`
Thomas Levesque
Then he has several good reasons to upgrade :-)
Martin Peck
The second part of my answer, where I suggest an API based upon extension methods instead of custom types, does not require .NET 3.5, so hopefully that's of some use even if there is a hard requirement for .NET 2.0.
Martin Peck
+1 for providing a thorough answer to a question. It's refreshing to see someone actually work through a problem rather than just provide a quick response.
Ben McCormack
So, do you think I should upgrade this library to .NET 3.5 for this and other stuff, while risking having some people not be able to use it because it's 3.5?
Maxim Zaslavsky
Given that .NET 3.5 has shipped and is downloadable right now, I wonder how big that risk is. Presumably the users of your API (which is still under development) would be using it in new development projects (i.e. ones starting now) and are likely to be in the group of devs who are also using ASP.NET MVC and other .NET 3.5 technologies. I'm possibly not the best person to ask (I'm an MSFT dev and have been using .NET 3.5 for a while now), but you should poll your current users and see if this is that big of a limitation.
Martin Peck
Thanks so much! I'm going to test this right now, change the project to .NET 3.5, and award you the bounty for such a thorough answer!
Maxim Zaslavsky
Glad to help. Thanks and good luck with the project!
Martin Peck