
I am thinking of writing a daemon that loops through a set of feeds and adds their entries to the database as ActiveRecord objects.

First, I cannot reliably retrieve the author/user of a story using the feed-normalizer gem. Sometimes it does not seem to recognize the author tag (I don't know whether anyone else has run into this).

Second, I haven't seen anyone convert RSS feeds back into database entries. I need to do this because each entry will have associations with other ActiveRecord objects. I can't find a gem that does this specifically, but could I somehow hack something like acts_as_feed to do it?

A: 

SimpleRSS exposes a very simple API and works pretty well on most feeds. I recommend not looking at the implementation, as its "parser" is a bunch of regexes (which is wrong on so many levels), but it works well.

Daemons is a good gem for running it in the background.

If you are using ActiveRecord, follow the instructions for using AR outside of Rails and define the model classes inline. This cuts down on bloat a bit.

RSS feeds are pretty inconsistent; this is the fallthrough chain we use:

  date = i[:pubDate] || i[:published] || i[:updated]
  body = i[:description] || i[:content] || i[:summary] || ""
  url = i[:guid] || i[:link]
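One caveat with a plain || chain: in Ruby only nil and false are falsy, so if the parser hands back an empty string for, say, an empty description element, the chain stops there instead of falling through to content or summary. A minimal sketch of a variant that also skips blank values, assuming items are Hashes with symbol keys as in the snippet above; first_present and normalize_item are hypothetical helper names, not part of any gem.

```ruby
# Hypothetical helper: return the first value for `keys` that is neither nil
# nor a blank string, or nil if every key is missing or blank.
def first_present(item, *keys)
  keys.each do |key|
    value = item[key]
    next if value.nil?
    next if value.respond_to?(:strip) && value.strip.empty?
    return value
  end
  nil
end

# Hypothetical normalizer applying the same fallback order as the snippet above.
def normalize_item(i)
  {
    date: first_present(i, :pubDate, :published, :updated),
    body: first_present(i, :description, :content, :summary) || "",
    url:  first_present(i, :guid, :link)
  }
end

# Usage: the empty description is skipped, so content is used as the body.
item = { published: "2009-06-01", description: "", content: "Hello", link: "http://example.com/1" }
p normalize_item(item)
```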

Also, from experience, make sure you rescue everything (and remember that timeouts are not caught by a bare rescue, at least on Ruby 1.8, where Timeout::Error does not inherit from StandardError). It sucks to have to constantly restart RSS daemons that choke on bad data.
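A minimal sketch of the "rescue everything" advice: each feed is processed in isolation under a time limit, and failures are logged and collected instead of killing the loop. The feed jobs here are stand-in lambdas (a real daemon would fetch, parse and save); process_feeds and its timeout_seconds parameter are hypothetical names. Timeout::Error is listed explicitly so the rescue also works on Rubies where it is not a StandardError.

```ruby
require 'timeout'

# Run each feed job under a timeout; one bad or slow feed must not take down
# the whole daemon. Returns [feed_name, exception_class] pairs for failures.
def process_feeds(feeds, timeout_seconds: 5)
  failures = []
  feeds.each do |name, job|
    begin
      Timeout.timeout(timeout_seconds) { job.call }
    rescue Timeout::Error, StandardError => e
      # Timeout::Error is named explicitly: on Ruby 1.8 it descended from
      # Interrupt, so a bare `rescue` (StandardError only) missed it.
      warn "feed #{name} failed: #{e.class}: #{e.message}"
      failures << [name, e.class]
    end
  end
  failures
end

# Usage with stand-in jobs: one succeeds, one raises, one hangs past the limit.
feeds = {
  "ok"   => -> { :stored },
  "bad"  => -> { raise ArgumentError, "malformed item" },
  "slow" => -> { sleep 2 }
}
p process_feeds(feeds, timeout_seconds: 0.1)
```

In a long-running daemon this method would sit inside the polling loop, so the next pass simply retries any feed that failed.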

Ben Hughes
A: 

Don't use SimpleRSS. It won't decode HTML entities for you, and it occasionally ignores the structure of the feed.

I've found it easiest to parse the feed as XML with XMLSimple, but you can use any XML parser.
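A minimal sketch of the parse-it-as-XML approach, using REXML (which ships with Ruby) in place of XmlSimple. It assumes RSS 2.0 items and tries a couple of common element names per field; for example, many feeds put the author in dc:creator rather than author, which is one plausible reason an author can go missing. The element fallbacks, the hash keys and the method names are illustrative assumptions. REXML decodes XML entities such as &amp; for you; escaped HTML inside a description comes back as markup that you still have to handle yourself.

```ruby
require 'rexml/document'

# Parse the <item> elements of an RSS 2.0 document into plain Hashes.
def parse_rss_items(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').map do |item|
    {
      title:  child_text(item, 'title'),
      url:    child_text(item, 'guid', 'link'),
      # <author> is often absent; Dublin Core <dc:creator> is a common substitute.
      author: child_text(item, 'author', 'dc:creator'),
      date:   child_text(item, 'pubDate', 'dc:date'),
      body:   child_text(item, 'description', 'content:encoded') || ""
    }
  end
end

# Text of the first direct child whose qualified name (prefix:name) matches one
# of `names`, skipping empty elements. Returns nil if none match.
def child_text(item, *names)
  names.each do |name|
    el = item.children.grep(REXML::Element).find { |e| e.expanded_name == name }
    next if el.nil? || el.text.nil?
    value = el.text.strip
    return value unless value.empty?
  end
  nil
end
```

Each resulting Hash maps directly onto attributes of a model, which keeps feed parsing separate from the ActiveRecord associations.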

Horace Loeb