I am using rss2email to convert a number of RSS feeds into mail for easier consumption. Or rather, I was using it, because it broke in a horrible way today: on every run, it only gives me this backtrace:

Traceback (most recent call last):
  File "/usr/share/rss2email/rss2email.py", line 740, in <module>
    elif action == "list": list()
  File "/usr/share/rss2email/rss2email.py", line 681, in list
    feeds, feedfileObject = load(lock=0)
  File "/usr/share/rss2email/rss2email.py", line 422, in load
    feeds = pickle.load(feedfileObject)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))

The only helpful fact I have been able to extract from this backtrace is that the file ~/.rss2email/feeds.dat, in which rss2email keeps all its configuration and runtime state, is somehow broken. Apparently, rss2email reads its state and dumps it back using cPickle on every run.

I have even found the line containing the 'sxOYAAuyzSx0WqN3BVPjE+6pgPU' string mentioned above in the giant (>12MB) feeds.dat file. To my untrained eye, the dump does not appear to be truncated or otherwise damaged.

What approaches could I try in order to reconstruct the file?

The Python version is 2.5.4 on a Debian/unstable system.

EDIT

Peter Gibson and J.F. Sebastian have suggested directly loading from the pickle file and I had tried that before. Apparently, a Feed class that is defined in rss2email.py is needed, so here's my script:

#!/usr/bin/python

import sys
# import pickle
import cPickle as pickle
sys.path.insert(0,"/usr/share/rss2email")
from rss2email import Feed

feedfile = open("feeds.dat", 'rb')
feeds = pickle.load(feedfile)

The "plain" pickle variant produces the following traceback:

Traceback (most recent call last):
  File "./r2e-rescue.py", line 8, in <module>
    feeds = pickle.load(feedfile)
  File "/usr/lib/python2.5/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.5/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.5/pickle.py", line 1133, in load_reduce
    value = func(*args)
TypeError: 'str' object is not callable

The cPickle variant produces essentially the same thing as calling r2e itself:

Traceback (most recent call last):
  File "./r2e-rescue.py", line 10, in <module>
    feeds = pickle.load(feedfile)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))

EDIT 2

Following J.F. Sebastian's suggestion, I added "printf debugging" to Feed.__setstate__ in my test script. These are the last few lines of output before Python bails out:

          u'http:/com/news.ars/post/20080924-everyone-declares-victory-in-smutfree-wireless-broadband-test.html': u'http:/com/news.ars/post/20080924-everyone-declares-victory-in-smutfree-wireless-broadband-test.html'},
 'to': None,
 'url': 'http://arstechnica.com/'}
Traceback (most recent call last):
  File "./r2e-rescue.py", line 23, in ?
    feeds = pickle.load(feedfile)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))

The same thing happens on a Debian/etch box using python 2.4.4-2.

+3  A: 

Have you tried manually loading the feeds.dat file using both cPickle and pickle? If the output differs it might hint at the error.

Something like (from your home directory):

import cPickle, pickle
f = open('.rss2email/feeds.dat', 'r')
obj1 = cPickle.load(f)
f.seek(0)  # rewind: the first load has already consumed the stream
obj2 = pickle.load(f)

(you might need to open in binary mode 'rb' if rss2email doesn't pickle in ascii).
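Whether a pickle is ASCII or binary depends on the protocol it was written with. The following is a minimal sketch, written for Python 3 rather than the Python 2.5 in the question, with a made-up `state` dictionary standing in for the real feed data; `is_ascii` is a helper introduced only for this illustration.

```python
import pickle

# Toy stand-in for rss2email's state (hypothetical, not the real structure).
state = {"url": "http://arstechnica.com/", "to": None}

ascii_dump = pickle.dumps(state, protocol=0)   # text-based protocol
binary_dump = pickle.dumps(state, protocol=2)  # binary protocol


def is_ascii(data):
    """Return True if every byte in data is in the 7-bit ASCII range."""
    return all(byte < 128 for byte in data)


print(is_ascii(ascii_dump))
print(is_ascii(binary_dump))
```

Running the same check on the first few kilobytes of a real pickle file gives a hint about which mode it needs.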

Pete

Edit: The fact that cPickle and pickle give the same error suggests that the feeds.dat file is the problem. Probably a change in the Feed class between versions of rss2email as suggested in the Ubuntu bug J.F. Sebastian links to.

Peter Gibson
You should open pickles in binary mode "rb", as they are most likely written in binary mode.
Ted Dziuba
Mhm. rss2email itself does not open the file in binary mode. Maybe this is the source of the problem?
hillu
@Ted: On Debian 'rb' doesn't matter. Quote: "By default, the pickle data format uses a printable ASCII representation." But you're right that, in general, it is better to read "pickled" data in 'rb' mode (protocol >= 1).
J.F. Sebastian
Pete, I have tried your suggestion and updated my original post accordingly.
hillu
+2  A: 

Sounds like the internals of cPickle are getting tangled up. This thread (http://bytes.com/groups/python/565085-cpickle-problems) looks like it might have a clue.

Ned Batchelder
I had found several archives containing the thread after Googling part of the error message, but it didn't give me much of a clue.
hillu
+2  A: 
  1. 'sxOYAAuyzSx0WqN3BVPjE+6pgPU' is most probably unrelated to the pickle's problem
  2. Post the error traceback for the following command, to determine which class defines the attribute that can't be called (the one that leads to the TypeError):

    python -c "import pickle; pickle.load(open('feeds.dat'))"
    

EDIT:

Add the following to your code and run (redirect stderr to file then use 'tail -2' on it to print last 2 lines):

from pprint import pprint
def setstate(self, dict_):
    pprint(dict_, stream=sys.stderr, depth=None)
    self.__dict__.update(dict_)
Feed.__setstate__ = setstate

If the above doesn't yield an interesting output then use general troubleshooting tactics:

Confirm that 'feeds.dat' is the problem:

  • backup ~/.rss2email directory
  • install rss2email into virtualenv/pip sandbox (or use zc.buildout) to isolate the environment (make sure you are using feedparser.py from the trunk).
  • add a couple of feeds, then keep adding feeds until 'feeds.dat' is larger than the current one. Run some tests.
  • try old 'feeds.dat'
  • try new 'feeds.dat' on existing rss2email installation

See the "r2e bails out with TypeError" bug report on Ubuntu.

J.F. Sebastian
'sxOYAAuyzSx0WqN3BVPjE+6pgPU' was part of the error message, so that's where I started looking. I have tried your suggestion, and updated my original post accordingly.
hillu
the suggested troubleshooting tactic won't work for me because the feeds.dat file is larger than 12 MB. It has been accumulating for more than 2 years.
hillu
thanks for pointing me to the Ubuntu BTS. Seems to be the same issue.
hillu
updated my original posting
hillu
It is not hard at all to populate feeds.dat programmatically: just run 'r2e add http://stackoverflow.com/feeds/question/$id' (there are more than 100000 questions on stackoverflow).
J.F. Sebastian
+3  A: 

How I solved my problem

A Perl port of pickle.py

Following J.F. Sebastian's comment about how simple the pickle format is, I set out to port parts of pickle.py to Perl. A couple of quick regular expressions would have been a faster way to access my data, but I felt that the hack value and the opportunity to learn more about Python would be worth it. Plus, I still feel much more comfortable using (and debugging code in) Perl than Python.

Most of the porting effort (simple types, tuples, lists, dictionaries) was very straightforward. Perl's and Python's different notions of classes and objects have been the only issue so far where a bit more than simple translation of idioms was needed. The result is a module called Pickle::Parse which, after a bit of polishing, will be published on CPAN.

A module called Python::Serialise::Pickle existed on CPAN, but I found its parsing capabilities lacking: It spews debugging output all over the place and doesn't seem to support classes/objects.

Parsing, transforming data, detecting actual errors in the stream

Based upon Pickle::Parse, I tried to parse the feeds.dat file. After a few iterations of fixing trivial bugs in my parsing code, I got an error message that was strikingly similar to pickle.py's original "object is not callable" error message:

Can't use string ("sxOYAAuyzSx0WqN3BVPjE+6pgPU") as a subroutine
ref while "strict refs" in use at lib/Pickle/Parse.pm line 489,
<STDIN> line 187102.

Ha! Now we're at a point where it's quite likely that the actual data stream is broken. Plus, we get an idea where it is broken.

It turned out that the first line of the following sequence was wrong:

g7724
((I2009
I3
I19
I1
I19
I31
I3
I78
I0
t(dtRp62457

Position 7724 in the "memo" pointed to that string "sxOYAAuyzSx0WqN3BVPjE+6pgPU". From similar records earlier in the stream, it was clear that a time.struct_time object was needed instead. All later records shared this wrong pointer. With a simple search/replace operation, it was trivial to fix this.
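The failure mode and the search/replace repair can be reproduced on a toy stream. This is a minimal sketch, written for Python 3 and using a hand-built protocol-0 pickle: the memo indices 0 and 1, the string 'abc', and the names `BROKEN`, `FIXED` and `try_load` are all made up for illustration and are not taken from the real feeds.dat.

```python
import pickle

# Toy protocol-0 stream (hypothetical, not taken from feeds.dat).
BROKEN = (
    b"S'abc'\np0\n0"               # memo[0] = a plain string, then pop it
    b"ctime\nstruct_time\np1\n0"   # memo[1] = the time.struct_time class, then pop it
    b"g0\n"                        # wrong reference: fetches the string from slot 0
    b"((I2009\nI3\nI19\nI1\nI19\nI31\nI3\nI78\nI0\nt(dt"  # arguments: (9-tuple, {})
    b"R."                          # REDUCE calls the fetched object, then STOP
)


def try_load(data):
    """Unpickle data, returning the TypeError instead of raising it."""
    try:
        return pickle.loads(data)
    except TypeError as exc:
        return exc


error = try_load(BROKEN)

# The repair: point the reference at the memo slot that holds the class.
FIXED = BROKEN.replace(b"g0\n((I", b"g1\n((I")
repaired = try_load(FIXED)

print(repr(error))
print(repaired.tm_year)
```

Only load pickles you trust this way: the REDUCE opcode calls whatever object the stream tells it to.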

I find it ironic that I found the source of the error by accident through Perl's feature that tells the user its position in the input data stream when it dies.

Conclusion

  1. I will move away from rss2email as soon as I find time to automatically transform its pickled configuration/state mess to another tool's format.
  2. pickle.py needs more meaningful error messages that tell the user the position in the data stream (not the position in its own code) where things go wrong.
  3. Porting parts of pickle.py to Perl was fun and, in the end, rewarding.
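On the second point, positions in the data stream can be recovered without executing the pickle. The following is a rough sketch, assuming Python 3's standard pickletools module (which postdates the Python 2.5 used above); the toy stream `DATA` and the helper `get_offsets` are hypothetical names for this illustration only.

```python
import pickletools

# Toy protocol-0 stream with a wrong memo reference (hypothetical data).
DATA = (
    b"S'abc'\np0\n0"               # memo[0] = a plain string
    b"ctime\nstruct_time\np1\n0"   # memo[1] = the time.struct_time class
    b"g0\n"                        # suspicious reference to memo slot 0
    b"((I2009\nI3\nI19\nI1\nI19\nI31\nI3\nI78\nI0\nt(dt"
    b"R."
)


def get_offsets(data, memo_index):
    """Return the byte offset of every GET-style opcode that reads memo_index."""
    return [
        pos
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in ("GET", "BINGET", "LONG_BINGET") and arg == memo_index
    ]


print(get_offsets(DATA, 0))
```

pickletools.genops only parses opcodes and never calls anything, so it can be run on a damaged file; pickletools.dis prints a full annotated listing when more context is needed.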
hillu