tags:

views:

252

answers:

6

Hi to all, i have a module (a single .py file, actually), with a class called HashedDir.

when i import the file and instanciate 2 instances of that class, when i check the object's fields they're always the same, even if the two objects should be different.

Eg:

 h1 = HashedDir('/path/to/dir')
 print h1.getList()['files'] # /path/to/dir
 h2 = HashedDir('some/other/path')
 print h1.getList()['files'] # some/other/path
 print h2.getList()['files'] # some/other/path

Any idea?

This is the class:

from os  import walk
from os import path
from hashlib import md5
import re

class HashedDir:
    """
    A list of files with associated md5 hashes generated retrieving thou
    a recursive walk in the directory tree starting from a provided root
    directory. Also stores the dirs in each dir
    """

    #  {'files': [
    #    ('/path/to/file1', '52bc309e11259af15e4623c7a0abc28c'),
    #    ('/path/to/file2', '52bc309e11259af15e4623c7a0abc28c'),
    #    ('/path/to/dir/file3', '52bc309e11259af15e4623c7a0abc28c')
    #   ],
    #   'dirs': ['/path/to/dir1', '/path/to/dir2']
    #  }
    fileList = {'files': [], 'dirs': []}
    ignoreList = []

    def __init__(self, rootDir, ignoreList=[]):
        """
        ignoreList is a list of regular expressions. If a file or a dir matches
        that regular expression, don't count it
        """
        self.ignoreList = ignoreList

        for dirpath, dirnames, filenames in walk(rootDir):
            for fileName in filenames:
                completeName = path.join(dirpath,fileName)
                hash = md5(open(completeName).read()).hexdigest()
                relativePath = self._relativePath(completeName, rootDir)
                if not self._toBeIgnored(relativePath):
                    self.fileList['files'].append((relativePath, hash))
            for dirName in dirnames:
                completeName = path.join(dirpath, dirName)
                relativePath = self._relativePath(completeName, rootDir)
                if not self._toBeIgnored(relativePath):
                    self.fileList['dirs'].append(relativePath)

    def _relativePath(self, path, base):
        return path.replace(base, '')

    def _toBeIgnored(self, path):
        for regex in self.ignoreList:
            if re.compile(regex).search(path) != None:
                return True
        return False

    def getList(self):
        return self.fileList

Thanks in advance

+6  A: 

Is it fileList you're talking about? You have it as a class variable, to make it an instance variable you need to do:

self.fileList = {'files': [], 'dirs': []}

in you __ init __ function.

Corey
Oh.. So declaring it after 'class' and manipulating it in __init__ isn't enough?
pistacchio
No, Python lets you refer to a classvariable as instance.classvariable
Corey
@pistacchio: there are no "declarations". You're assigning the variable name to an object like a dictionary or a list.
S.Lott
+1  A: 

If you declare your variables outside a class method, inside the body of the class, they will become 'class variables' and be common to all class instances. To get instance variables, declare them inside the init function and bind them to 'self', the handler for the current instance.

sykora
A: 

It might be useful if you could post a full working (or failing!) example.

If I do what I think is necessary (i.e., wrap this in Class HashedDir(object): and set self.fileList = {'files': [], 'dirs': []} inside init then it does seem to work.

Which items are you referring to as self.value? As per the previous post by sykora, you need to distinguish between code that is run for every instance (in init) and code that is common to all instances.

Andrew Jaffe
+10  A: 

There are two kinds of variables in a class:

  • class variables, defined at the class level, and common to all instances

  • instance variables, defined within a class method (usually __init__) and qualified by the instance (usually self.).

Example

class SomeClass( object ):
    classVariable = 0
    def __init__( self ):
        self.instanceVariable= 0

The variable named classVariable is part of the class, common to all instances. Because of the way Python does search, it's available as a member of self.classVariable, as well as SomeClass.classVariable.

The variable named instanceVariable is part of the instance (self.) and is unique to each instance.

Note. There's a third kind, global, but that's not what you're asking about.

S.Lott
`self.` does not absolutely qualify a variable as an instance member. Both `classVariable` and `instanceVariable` are typically prefixed with `self.` when referenced.
ΤΖΩΤΖΙΟΥ
+2  A: 

Things declared in a class block are class attributes, and class attributes are also accessible through the instance. (This principle, in fact, is how methods are bound.) Not only that, but default arguments for a function are only evaluated when the function is defined. So, to give an example illustrating these two points:

class C(object):
    list_a = []
    def __init__(self, list_b=[]):
        self.list_b = list_b

    def __str__(self):
        return '%r %r' % (self.list_a, self.list_b)

c1 = C()
c2 = C()
c2.list_a = []
c3 = C([])

c1.list_a.append(1)
c1.list_b.append(2)
print c1
print c2
print c3

The output for this is:

[1] [2]
[] [2]
[1] []

c1 and c3 share the same list_a because it's a class attribute; it's not shadowed by an instance attribute like it is on c2. c1 and c2 share the same list_b because there is only one list_b default in __init__; a new list isn't created every time the function is called, but passing in your own new list works.

Aaron Gallagher
+1  A: 

As others have pointed out, your problem is that fileList is a class variable which you are mutating.

However its worth noting another potential pitfall in your code that could lead to a similar problem (though it doesn't in your specific example):

def __init__(self, rootDir, ignoreList=[]):

Beware passing mutable parameters (such as this list) as default arguments. The list is only created once (when you're defining the __init__ function. This means that all instances of the class which have been constructed using the default will use the same list.

In your example, the list is never modified, so this will not have any repercussions, but if (as you do for fileList) you append to self.ignoreList, then this would affect all such instances, leading to a similar problem to the one you're seeing.

This is a very common beginner gotcha - to avoid it, it's a good idea to write such code as something like:

def __init__(self, rootDir, ignoreList=None):
    if ignoreList is None:
        ignoreList = []  # This will create a new empty list for every instance.
Brian