+2  Q: 

RSS screen scraper

Can anyone point me towards a ready-made RSS screen scraper, preferably in Python, that turns partial feeds into full-text RSS feeds?

+1  A: 

There's a good list of them here; it mentions Feed Parser, which you use like this:

import feedparser

python_wiki_rss_url = "http://www.python.org/cgi-bin/moinmoin/" \
                      "RecentChanges?action=rss_rc"

feed = feedparser.parse( python_wiki_rss_url )

You can then do things like:

# Print the title of every entry in the parsed feed
for item in feed["items"]:
    print(item["title"])
Dominic Rodger
+1 for feedparser
S.Mark
He was asking for a partial-to-full feed converter in Python, not a parser.
Recursion
+1  A: 

feedparser.org is great

S.Mark
+1 - and to you sir, think you got there a bit before me (my revision history doesn't show me posting the first link, reading it, seeing Feed Parser introduced there, and incorporating that into my post).
Dominic Rodger
He was asking for a partial-to-full feed converter in Python, not a parser.
Recursion
You're probably right, but that would be HTML scraping rather than RSS, and it's completely **site dependent**. It could even break a site's policy, so let's use the available RSS feeds :-)
S.Mark
A: 

Sorry, but it doesn't exist in Python, though such tools do exist in PHP. You are more than welcome to use and improve the one I made, named scraped. It does not handle all sites: it is a recipe-based system that currently only handles the NYT, WSJ and the Economist. I am working on an all-inclusive algorithm, but it's a major undertaking that involves a ton of analysis of the different types of HTML and XML. Even the three sites mentioned above need vastly different scraping algorithms, with WSJ being the most complex by far. They screw their HTML up with so much useless crap, mainly just to stop you.
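As a rough illustration of what "recipe based" means here, the sketch below keeps one small rule set per site and picks the right one from the article URL. It is not the code of the program mentioned above: the `RECIPES` table, the `recipe_for` helper and every `container_*` value are hypothetical placeholders, and real recipes would have to be worked out by inspecting each site's markup.

```python
from urllib.parse import urlparse

# Hypothetical per-site recipes. The container values are placeholders,
# not the real markup used by these sites.
RECIPES = {
    "nytimes.com": {"name": "nyt", "container_tag": "div", "container_class": "PLACEHOLDER-nyt"},
    "wsj.com": {"name": "wsj", "container_tag": "div", "container_class": "PLACEHOLDER-wsj"},
    "economist.com": {"name": "economist", "container_tag": "div", "container_class": "PLACEHOLDER-econ"},
}


def recipe_for(url):
    """Return the recipe whose domain matches the URL's host, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain, recipe in RECIPES.items():
        # Match the bare domain and any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return recipe
    return None
```

A URL with no matching recipe returns `None`, which is the point where a general-purpose extraction algorithm would have to take over.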

Here is the program I was talking about. It requires lxml, and the readme explains everything. It reads the config files, parses the partial RSS feeds, follows the links and scrapes the linked pages, and in the end produces an RSS 2.0 XML file, which I mainly convert into an ebook for my Kindle. I use lxml, BeautifulSoup and feedparser.

http://tinyurl.com/yh3s9pa

You can also look at the calibre project, which takes a similar recipe-based approach.
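The flow described above (parse the partial feed, follow each item's link, scrape the page, emit RSS 2.0) can be sketched roughly as follows. This is a minimal sketch and not the linked program: to stay self-contained it uses only the standard library instead of lxml, BeautifulSoup and feedparser, and it naively treats every `<p>` element as article text, which a real per-site recipe would replace. The names `ParagraphExtractor`, `extract_text`, `fetch` and `full_text_feed` are made up for this example.

```python
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags; a naive stand-in for a site recipe."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_text(html):
    """Return the non-empty paragraphs of a page, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())


def fetch(url):
    """Download a page and decode it (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def full_text_feed(partial_rss, fetch_page=fetch):
    """Replace each item's description with text scraped from its link."""
    root = ET.fromstring(partial_rss)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        text = extract_text(fetch_page(link.strip()))
        if not text:
            continue  # keep the original teaser if nothing was extracted
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = text
    return ET.tostring(root, encoding="unicode")
```

Passing `fetch_page` as a parameter makes it easy to swap in a cached or rate-limited downloader, which matters when scraping a site repeatedly.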

Recursion