tags:

views:

312

answers:

4

According to the feedparser documentation, I can turn an RSS feed into a parsed object like this:

import feedparser
d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')

but I can't find anything showing how to go the other way; I'd like to be able do manipulate 'd' and then output the result as XML:

print d.toXML()

but there doesn't seem to be anything in feedparser for going in that direction. Am I going to have to loop through d's various elements, or is there a quicker way?

A: 
from xml.dom import minidom

doc= minidom.parse('./your/file.xml')
print doc.toxml()

The only problem is that it do not download feeds from the internet.

Andrea Ambu
It doesn't parse them in an RSS-aware way, either; so manipulating the feed will be much harder than with feedparser
Simon
A: 

As a method of making a feed, how about PyRSS2Gen? :)

I've not played with FeedParser, but have you tried just doing str(yourFeedParserObject)? I've often been suprised by various modules that have str methods to just output the object as text.

[Edit] Just tried the str() method and it doesn't work on this one. Worth a shot though ;-)

Jon Cage
How would I feed the output of feedparser to PyRSS2Gen? I tried r = PyRSS2Gen.RSS2(d)but that doesn't work.
Simon
+1  A: 

If you're looking to read in an XML feed, modify it and then output it again, there's a page on the main python wiki indicating that the RSS.py library might support what you're after (it reads most RSS and is able to output RSS 1.0). I've not looked at it in much detail though..

Jon Cage
+2  A: 

Appended is a not hugely-elegant, but working solution - it uses feedparser to parse the feed, you can then modify the entries, and it passes the data to PyRSS2Gen. It preserves most of the feed info (the important bits anyway, there are somethings that will need extra conversion, the parsed_feed['feed']['image'] element for example).

I put this together as part of a little feed-processing framework I'm fiddling about with.. It may be of some use (it's pretty short - should be less than 100 lines of code in total when done..)

#!/usr/bin/env python
import datetime

# http://www.feedparser.org/
import feedparser
# http://www.dalkescientific.com/Python/PyRSS2Gen.html
import PyRSS2Gen

# Get the data
parsed_feed = feedparser.parse('http://reddit.com/.rss')

# Modify the parsed_feed data here

items = [
    PyRSS2Gen.RSSItem(
        title = x.title,
        link = x.link,
        description = x.summary,
        guid = x.link,
        pubDate = datetime.datetime(
            x.modified_parsed[0],
            x.modified_parsed[1],
            x.modified_parsed[2],
            x.modified_parsed[3],
            x.modified_parsed[4],
            x.modified_parsed[5])
        )

    for x in parsed_feed.entries
]

# make the RSS2 object
# Try to grab the title, link, language etc from the orig feed

rss = PyRSS2Gen.RSS2(
    title = parsed_feed['feed'].get("title"),
    link = parsed_feed['feed'].get("link"),
    description = parsed_feed['feed'].get("description"),

    language = parsed_feed['feed'].get("language"),
    copyright = parsed_feed['feed'].get("copyright"),
    managingEditor = parsed_feed['feed'].get("managingEditor"),
    webMaster = parsed_feed['feed'].get("webMaster"),
    pubDate = parsed_feed['feed'].get("pubDate"),
    lastBuildDate = parsed_feed['feed'].get("lastBuildDate"),

    categories = parsed_feed['feed'].get("categories"),
    generator = parsed_feed['feed'].get("generator"),
    docs = parsed_feed['feed'].get("docs"),

    items = items
)


print rss.to_xml()
dbr