It looks like http://portland.beerandblog.com/feed/atom/ is messed up (as are the 0.92 and 2.0 RSS feeds).

Universal Feed Parser (latest version from http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py?spec=svn295&r=295 ) doesn't see any dates.

    <title>Beer and Blog Portland</title>
    <atom:link href="http://portland.beerandblog.com/feed/" rel="self" type="application/rss+xml" />
    <link>http://portland.beerandblog.com</link>
    <description>Bloggers helping bloggers over beers in Portland, Oregon</description>
    <pubDate>Fri, 19 Jun 2009 22:54:57 +0000</pubDate>
    <generator>http://wordpress.org/?v=2.7.1</generator>
    <language>en</language>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
                    <item>
            <title>Widmer is sponsoring our beer for the After Party!!</title>
            <link>http://portland.beerandblog.com/2009/06/19/widmer-is-sponsoring-our-beer-for-the-after-party/</link>
            <comments>http://portland.beerandblog.com/2009/06/19/widmer-is-sponsoring-our-beer-for-the-after-party/#comments</comments>
            <pubDate>Fri, 19 Jun 2009 22:30:35 +0000</pubDate>
            <dc:creator>Justin Kistner</dc:creator>

            <category><![CDATA[beer]]></category>

I'm trying

        try:
            published = e.published_parsed
        except:
            try:
                published = e.updated_parsed
            except:
                published = e.created_parsed

and it's failing because I can't get a date.

Any thoughts on how to extract the date in a reasonable manner?

Thanks!

+1  A: 

Using a naked except may be masking a problem in your code. Assuming (I don't use feed parsers) that AttributeError is the specific exception that you should be checking for, try (accidental pun) this:

    try:
        published = e.published_parsed
    except AttributeError:
        try:
            published = e.updated_parsed
        except AttributeError:
            published = e.created_parsed

In any case, instead of "it's failing", please show the error message and traceback.
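The nested try/except above can also be flattened into a loop over the attribute names. This is only a sketch: `first_parsed_date` and `DATE_FIELDS` are hypothetical names, and it assumes (as the code above does) that a missing date shows up as a missing attribute, so `getattr` with a default is enough.

```python
# Attribute names taken from the question, tried in the same order.
DATE_FIELDS = ("published_parsed", "updated_parsed", "created_parsed")


def first_parsed_date(entry, fields=DATE_FIELDS):
    """Return the first parsed date found on `entry`, or None if there is none."""
    for name in fields:
        # getattr with a default avoids one try/except per attribute.
        value = getattr(entry, name, None)
        if value is not None:
            return value
    return None
```

Returning None instead of raising lets the caller decide what to do with an entry that carries no date at all, for example skip it or log it.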

Edit: I've downloaded the latest release (i.e. not from svn) and followed the example in the docs, with this result:

    C:\feedparser>\python26\python
    Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on
    win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import feedparser
    >>> d = feedparser.parse('http://portland.beerandblog.com/feed/atom/')
    >>> d.entries[0].updated
    u'2009-06-19T22:54:57Z'
    >>> d.entries[0].updated_parsed
    time.struct_time(tm_year=2009, tm_mon=6, tm_mday=19, tm_hour=22, tm_min=54, tm_sec=57, tm_wday=4, tm_yday=170, tm_isdst=0)
    >>> d.entries[0].title
    u'Widmer is sponsoring our beer for the After Party!!'
    >>> d.entries[0].published
    u'2009-06-19T22:30:35Z'
    >>> d.entries[0].published_parsed
    time.struct_time(tm_year=2009, tm_mon=6, tm_mday=19, tm_hour=22, tm_min=30, tm_sec=35, tm_wday=4, tm_yday=170, tm_isdst=0)
    >>>

Like I said, I'm not into RSS and Atoms and suchlike but it seems to be quite straightforward to me. Except that I don't understand where you are getting the <pubDate> tag and arpanet-style timestamps from; AFAICT that is not present in the raw source -- it has <published> and ISO timestamps:

    >>> import urllib
    >>> guff = urllib.urlopen('http://portland.beerandblog.com/feed/atom/').read()
    >>> guff.find('pubDate')
    -1
    >>> guff.find('published')
    1171
    >>> guff[1160:1200]
    'pdated>\n\t\t<published>2009-06-19T22:30:35'
    >>>
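For comparison, both timestamp styles mentioned above can be parsed with the standard library alone. This is a sketch for Python 3 (not the 2.6 used in the sessions here). The two strings are the ones quoted in this thread: the `<pubDate>` value from the question and the `<published>` value from the Atom source.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# RFC 822 style, as in the <pubDate> element quoted in the question.
rss_stamp = "Fri, 19 Jun 2009 22:30:35 +0000"
# ISO style, as in the <published> element of the Atom feed.
atom_stamp = "2009-06-19T22:30:35Z"

# parsedate_to_datetime returns a timezone-aware datetime for a +0000 offset.
rss_dt = parsedate_to_datetime(rss_stamp)

# The trailing "Z" means UTC; strptime yields a naive datetime, so attach UTC explicitly.
atom_dt = datetime.strptime(atom_stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# Both strings describe the same instant.
print(rss_dt == atom_dt)
```

So the two feed formats carry the same moment in different notations, and which element a parser sees depends on which feed URL was actually fetched.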

What is your "e" in "e.published_parsed"? Consider showing the full story with accessing feedparser, as I did above.

John Machin
You're certainly right about specifying the exception I'm looking for, and I've gone ahead and done that now. Thanks. The problem I'm having is that none of the three attributes exist. From the documentation at http://www.feedparser.org/ I would have expected e.published_parsed to work, even if the others didn't. This blog is running WordPress 2.7.1, which is pretty current. What bothers me is that feedparser is clearly not seeing the per-article "pubDate" elements that are in the feed. http://pastebin.com/m1c614fce has the log of what feedparser is seeing.
jdeibele
+1  A: 

Works for me:

    >>> e = feedparser.parse('http://portland.beerandblog.com/feed/atom/')
    >>> e.feed.date
    u'2009-06-19T22:54:57Z'
    >>> e.feed.date_parsed
    (2009, 6, 19, 22, 54, 57, 4, 170, 0)
    >>> e.feed.updated_parsed
    (2009, 6, 19, 22, 54, 57, 4, 170, 0)

Maybe you're looking for e.updated_parsed where you should be looking for e.feed.updated_parsed instead?
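To make the feed-level versus entry-level distinction concrete, here is a small sketch that reads both from one parsed result. `dates_by_level` is a hypothetical helper. It assumes only the shape used in this thread: a `.feed` object and an `.entries` list, each with optional `*_parsed` attributes.

```python
def dates_by_level(parsed):
    """Return (feed_date, [entry_date, ...]) from a parsed feed result.

    A missing date is reported as None rather than raising AttributeError.
    """
    # Feed-level date: lives on parsed.feed, not on the top-level result.
    feed_date = getattr(parsed.feed, "updated_parsed", None)

    # Entry-level dates: one per item in parsed.entries.
    entry_dates = []
    for entry in parsed.entries:
        date = getattr(entry, "published_parsed", None)
        if date is None:
            date = getattr(entry, "updated_parsed", None)
        entry_dates.append(date)
    return feed_date, entry_dates
```

With feedparser installed, this would be called as `dates_by_level(feedparser.parse(url))`. If every entry date comes back as None while the feed date is present, the dates are probably being looked up on the wrong object.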

Alex Martelli