I'm having great luck parsing single feeds with Universal Feed Parser, but now I need to run multiple feeds through it and generate chronologically interleaved output (not RSS). It seems I'll need to iterate through the URLs, stuff every entry into a list of dictionaries, sort that list by entry timestamp, and take a slice off the top. That seems doable, but pretty expensive resource-wise (I'll cache it aggressively for that reason).

Just wondering if there's an easier way - an existing library that works with feedparser to do simple aggregation, for example. Sample code? Gotchas or warnings? Thanks.
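The sort-and-slice approach described in the question might look roughly like the sketch below. It assumes feedparser's usual entry keys (`published_parsed` / `updated_parsed`, which hold UTC time tuples and may be missing); the helper names `entry_timestamp`, `newest` and `fetch_newest` are made up for illustration.

```python
import calendar


def entry_timestamp(entry):
    """Return the entry's time as seconds since the epoch, or 0 if undated."""
    # Assumption: entries behave like dicts and carry feedparser-style
    # "published_parsed" or "updated_parsed" UTC time tuples.
    parsed = entry.get("published_parsed") or entry.get("updated_parsed")
    return calendar.timegm(parsed) if parsed else 0


def newest(entries, limit=20):
    """Sort all entries newest first and slice off the top `limit`."""
    return sorted(entries, key=entry_timestamp, reverse=True)[:limit]


def fetch_newest(feed_urls, limit=20):
    """Parse every feed URL and return the newest entries across all of them."""
    import feedparser  # third-party; imported here so the helpers above stay standalone

    entries = []
    for url in feed_urls:
        entries.extend(feedparser.parse(url).entries)
    return newest(entries, limit)
```

Sorting a few hundred entries is cheap; the expensive part is fetching the feeds, which is where the caching would pay off.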

+1  A: 

You could throw the feeds into a database and then generate a new feed from this database.
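As a rough illustration of the database idea, here is a minimal sketch using the standard-library `sqlite3` module. The table layout and the function names (`open_store`, `save_entries`, `latest`) are assumptions for the example, not anything taken from Planet or FeedJack; entries are passed in as plain `(link, title, published)` tuples, with `published` as a Unix timestamp.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the entry store; the schema here is only an example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " link TEXT PRIMARY KEY, title TEXT, feed_url TEXT, published INTEGER)"
    )
    return conn


def save_entries(conn, feed_url, entries):
    """Store (link, title, published) tuples, skipping links already seen."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO entries (link, title, feed_url, published)"
            " VALUES (?, ?, ?, ?)",
            [(link, title, feed_url, published) for link, title, published in entries],
        )


def latest(conn, limit=20):
    """Return the newest entries across all feeds, newest first."""
    return conn.execute(
        "SELECT title, link, published FROM entries"
        " ORDER BY published DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

With this layout the interleaving is done by the `ORDER BY`, and the feeds can be refreshed on a schedule independently of page requests.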

Consider looking into two feedparser-based RSS aggregators: Planet Feed Aggregator and FeedJack (Django based), or at least how they solve this problem.

John Paulett
I had actually already looked at Planet and FeedJack, but the problem was that I already had models with rss_url fields on them that I needed to work with, whereas those two assume they're the basis for a whole site (i.e. they're not very pluggable). Instead I used the aggregator used by the djangoproject.com site itself: http://code.djangoproject.com/browser/djangoproject.com/django_website/apps/aggregator , which comes with a feed_updater.py wrapper around feedparser. That solved the problem neatly, and also let me do ORM queries against sites in certain categories, etc.
shacker
I wonder how hard it would be to make FeedJack more pluggable?
John Paulett
It is said that FeedJack allows you to view/download historical feeds. I downloaded FeedJack and went through the source (I didn't get it working after I plugged it into my project), but I couldn't find where old feeds can be viewed. Can you advise me on what I am missing?
Maddy
+1  A: 

Storing the data in a database has already been suggested, e.g. bsddb.btopen() or any RDBMS.

Take a look at heapq.merge() and bisect.insort(), or use one of the B-tree implementations, if you'd like to merge the data in memory.
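A minimal sketch of the in-memory merge with `heapq.merge()`, assuming each feed has been reduced to a list of `(timestamp, title)` tuples (a made-up shape for the example). `heapq.merge()` expects every input to be sorted already, and feeds do not guarantee any order, so each list is sorted first. The `key` and `reverse` arguments need Python 3.5 or later.

```python
import heapq
from itertools import islice


def interleave(feeds, limit=20):
    """Merge per-feed entry lists and return the newest `limit` entries."""
    # Sort each feed newest first so the inputs satisfy heapq.merge's precondition.
    ordered = [sorted(feed, key=lambda e: e[0], reverse=True) for feed in feeds]
    # The merge is lazy, so islice stops after `limit` entries without
    # walking the rest of every feed.
    merged = heapq.merge(*ordered, key=lambda e: e[0], reverse=True)
    return list(islice(merged, limit))
```

For example, `interleave([[(100, "a1"), (400, "a2")], [(300, "b1"), (200, "b2")]], limit=3)` yields the entries with timestamps 400, 300 and 200, in that order.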

Denis Otkidach
This is a good suggestion. Here are some links: http://www.doughellmann.com/PyMOTW/heapq/index.html#module-heapq and http://www.doughellmann.com/PyMOTW/bisect/index.html#module-bisect
Jason Christa