Question: How do I kill an instance, or ensure I'm creating a new instance, of the Python Universal Feed Parser?


Info:

I'm working on a program that downloads and catalogs large numbers of blogs. It has worked well so far except for an unfortunate bug. My code takes a list of blog URLs and runs them through a for loop. On each iteration it picks a URL and passes it to a separate class, which manages downloading, extracting, and saving the data to a file.

The first URL works just fine: it downloads the entire blog and saves it to a file. But the second blog it downloads will have all the data from the first one as well. I'm totally clueless as to why.


Code snippets:

class BlogHarvester:
  def __init__(self,folder):
    f = open(folder,'r')
    stop = folder[len(folder)-1]
    while stop != '/':
        folder = folder[0:len(folder)-1]
        stop = folder[len(folder)-1]
    blogs = []
    for line in f:
        blogs.append(line)

    for herf in blogs:
        blog = BlogParser(herf)
        sPath = ""
        uid = newguid()##returns random hash.
        sPath = uid
        sPath = sPath + " - " + blog.posts[0].author[1:5] + ".blog"
        print sPath
        blog.storeAsFile(sPath)

class BlogParser:
  def __init__(self, blogherf='null', path='null', posts = []):
    self.blogherf = blogherf

    self.blog = feedparser.parse(blogherf)
    self.path = path
    self.posts = posts
    if blogherf != 'null':
        self.makeList()
    elif path != 'null':
        self.loadFromFile()

class BlogPeices:
  def __init__(self,title,author,post,date,publisher,rights,comments):
    self.author = author
    self.title = title
    self.post = post
    self.date = date
    self.publisher = publisher
    self.rights = rights
    self.comments = comments

I included the snippets I figured would be useful. Sorry if there are any confusing artifacts. This program has been a pain in the butt.

A:

The problem is posts=[]. Default arguments are evaluated once, when the def statement is executed, not on each call, so mutations to the default object persist for as long as the function (and therefore the class) exists. Instead use posts=None and test:

if posts is None:
  self.posts = []
else:
  self.posts = posts
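
A minimal sketch of the bug and the fix side by side. It uses stand-in classes rather than the real BlogParser: SharedParser, FixedParser, and add_post are hypothetical names, and no feedparser call is made.

```python
class SharedParser:
    """Reproduces the bug: every instance shares one default list."""
    def __init__(self, posts=[]):      # list created once, when def runs
        self.posts = posts

    def add_post(self, post):
        self.posts.append(post)


class FixedParser:
    """Uses None as a sentinel so each instance gets its own list."""
    def __init__(self, posts=None):
        if posts is None:
            posts = []                 # fresh list on every call
        self.posts = posts

    def add_post(self, post):
        self.posts.append(post)


shared_a = SharedParser()
shared_a.add_post("post from blog 1")
shared_b = SharedParser()
shared_b.add_post("post from blog 2")
print(shared_b.posts)   # ['post from blog 1', 'post from blog 2']

fixed_a = FixedParser()
fixed_a.add_post("post from blog 1")
fixed_b = FixedParser()
fixed_b.add_post("post from blog 2")
print(fixed_b.posts)    # ['post from blog 2']
```

The second SharedParser picks up the first one's post, which matches the symptom in the question; the FixedParser instances stay independent.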
Ignacio Vazquez-Abrams
Not sure what you mean; could you elaborate on that a bit? Thanks.
Narcolapser
When the def statement runs, a list (a mutable object) is created. That list is passed as posts each time the method is called without that argument. It is the same list every time.
Ignacio Vazquez-Abrams
Thanks. That worked. I accepted Eld's answer mostly because he referenced the Python documentation. That was what really did it for me. But both of you guys were very helpful. ^_^
Narcolapser
A: 

As Ignacio said, any mutation of a default argument in the function's parameter list persists for the life of the class.

From http://docs.python.org/reference/compound_stmts.html#function-definitions

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function.
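
One way to see the "evaluated once" behaviour from the quote is to look at the defaults stored on the function object. A small sketch, assuming Python 2.6 or later (where the __defaults__ attribute is available); collect and bucket are made-up names for illustration:

```python
def collect(item, bucket=[]):
    # bucket's default list was built once, when this def statement ran
    bucket.append(item)
    return bucket

default_before = collect.__defaults__[0]   # the stored default list
collect("a")
collect("b")
default_after = collect.__defaults__[0]

print(default_after)                       # ['a', 'b']
print(default_before is default_after)     # True: same object on every call
```

The stored default itself has grown, which is exactly what "the default value is in effect modified" means.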

But this brings up a gotcha: you are modifying a reference, so you may be modifying a list that the consumer of the class did not expect to be modified.

For example:

class A:
  def foo(self, x = [] ):
    x.append(1)
    self.x = x

a = A()
a.foo()
print a.x
# prints: [1]
a.foo()
print a.x
# prints: [1, 1]   # !!!! Consumer would expect this to be [1]
y = [1,2,3]
a.foo(y)
print a.x
# prints: [1, 2, 3, 1]
print y
# prints: [1, 2, 3, 1]  #  !!!! My list was modified

If you copy it instead (see http://docs.python.org/library/copy.html):

import copy
class A:
  def foo(self, x = [] ):
    x = copy.copy(x)
    x.append(1)
    self.x = x

a = A()
a.foo()
print a.x
# prints: [1]
a.foo()
print a.x
# prints: [1]   # !!! Much better =)
y = [1,2,3]
a.foo(y)
print a.x
# prints: [1, 2, 3, 1]
print y
# prints: [1, 2, 3]  #  !!!! My list is how I made it
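
The two ideas can also be combined: None as the default, as the docs suggest, plus a copy of any list the caller does pass. A sketch reusing the A and foo names from the examples above; this is one option, not the only correct one:

```python
class A:
    def foo(self, x=None):
        # None means "no list supplied"; otherwise take a shallow copy
        # so the caller's list is never modified
        x = [] if x is None else list(x)
        x.append(1)
        self.x = x

a = A()
a.foo()
a.foo()
print(a.x)    # [1]

y = [1, 2, 3]
a.foo(y)
print(a.x)    # [1, 2, 3, 1]
print(y)      # [1, 2, 3]
```

list(x) makes a shallow copy, like copy.copy(x) does for a list, so nested mutable items would still be shared with the caller.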
Eld
But aren't I creating a new BlogParser instance on each pass through the for loop? Or is it persisting somehow?
Narcolapser
The default argument of the function foo persists for the life of the class (not the object). With def foo(self, x = [] ): the reference to that one list instance "[]" persists, and every instance of your class refers to it.
Eld
That worked. At least it cleared one more bug out of the way. It's still not quite working, but I should be able to figure out the next one on my own. Thanks! This was very helpful.
Narcolapser