The following seems strange. Basically, the somedata attribute seems to be shared between all instances of the classes that inherit from the_base_class.

class the_base_class:
    somedata = {}
    somedata['was_false_in_base'] = False


class subclassthing(the_base_class):
    def __init__(self):
        print self.somedata


>>> first = subclassthing()
{'was_false_in_base': False}
>>> first.somedata['was_false_in_base'] = True
>>> second = subclassthing()
{'was_false_in_base': True}
>>> del first
>>> del second
>>> third = subclassthing()
{'was_false_in_base': True}

Defining self.somedata in the __init__ function is obviously the correct way to get around this (so each instance has its own somedata dict), but when is such behavior desirable?

+11  A: 

You are right, somedata is shared between all instances of the class and its subclasses, because it is created at class definition time. The lines

somedata = {}
somedata['was_false_in_base'] = False

are executed when the class is defined, i.e. when the interpreter encounters the class statement - not when the instance is created (think static initializer blocks in Java). If an attribute does not exist in a class instance, the class object is checked for the attribute.
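
To make the timing concrete, here is a minimal sketch. The names executed, Base and shared are made up for illustration and are not from the question. It records when the class body runs and checks where the attribute actually lives:

```python
executed = []

class Base(object):
    # This statement runs once, while the class body is being executed.
    executed.append("class body ran")
    shared = {}

# The body has already run, before any instance exists.
body_runs_before = list(executed)

a = Base()
b = Base()

# Creating instances does not re-run the class body.
body_runs_after = list(executed)

# Neither instance has its own 'shared' entry; lookup falls back to the class.
found_on_instance = "shared" in vars(a)
found_on_class = "shared" in vars(Base)
```

Here body_runs_before and body_runs_after both hold a single entry, found_on_instance is False and found_on_class is True.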

At class definition time, you can run arbitrary code, like this:

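import sys

class Test(object):
    # The if/else runs once, while the class body is executed,
    # so only one of the two hello() definitions ends up on the class.
    if sys.platform.startswith("linux"):  # matches "linux2" (Python 2) and "linux" (Python 3.3+)
        def hello(self):
            print "Hello Linux"
    else:
        def hello(self):
            print "Hello ~Linux"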

On a Linux system, Test().hello() will print Hello Linux; on all other systems, the other string will be printed.

In contrast, objects in __init__ are created at instantiation time and belong to the instance only (when they are assigned to self):

class Test(object):
    def __init__(self):
        self.inst_var = [1, 2, 3]
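
As a quick check of that claim, the sketch below repeats the Test class and mutates the list on one instance; the names a and b are only for illustration:

```python
class Test(object):
    def __init__(self):
        # A new list is created on every instantiation.
        self.inst_var = [1, 2, 3]

a = Test()
b = Test()
a.inst_var.append(4)

# a.inst_var is now [1, 2, 3, 4]; b.inst_var is still [1, 2, 3].
# The class itself has no inst_var attribute at all.
class_has_attr = hasattr(Test, "inst_var")  # False
```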

Objects defined on a class object rather than instance can be useful in many cases. For instance, you might want to cache instances of your class, so that instances with the same member values can be shared (assuming they are supposed to be immutable):

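class SomeClass(object):
    # Class-level cache, shared by every instance of SomeClass.
    __instances__ = {}

    def __new__(cls, v1, v2, v3):
        try:
            # Reuse the instance already created for these values, if any.
            return cls.__instances__[(v1, v2, v3)]
        except KeyError:
            # object.__new__ takes only the class; Python 3 rejects extra arguments.
            return cls.__instances__.setdefault(
                (v1, v2, v3),
                object.__new__(cls))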

Mostly, I use data in class bodies in conjunction with metaclasses or generic factory methods.
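
The answer does not show what such a factory looks like, so the following is only one possible sketch of the idea, not the author's code. Shape, Circle, Square, register and create are all hypothetical names. It assumes a class-level dict used as a registry that a factory classmethod consults:

```python
class Shape(object):
    # Class-level registry, shared by Shape and all of its subclasses.
    registry = {}

    @classmethod
    def register(cls, name):
        # Returns a class decorator that records the subclass under 'name'.
        def decorator(subclass):
            cls.registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def create(cls, name):
        # Factory method: look up the registered class and instantiate it.
        return cls.registry[name]()

@Shape.register("circle")
class Circle(Shape):
    pass

@Shape.register("square")
class Square(Shape):
    pass

shape = Shape.create("circle")  # an instance of Circle
```

Sharing is the point here: because registry is never reassigned on a subclass, Circle.registry and Shape.registry are the same dict.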

Torsten Marek
How do you declare an instance var then?
OscarRyz
I've added an example in my answer.
Torsten Marek
Is there a difference between simple data and complex data, as TimB says?
OscarRyz
@Oscar: ask a new question :) Otherwise, see my comment on the other answer.
Torsten Marek
The class attribute isn't referenced in the instance's __dict__. Instead, Python's method for lookup is first to check the instance variables, and if no variable was found, check the class's variables (and then base classes). Anything in the __dict__ is explicitly set on the instance.
Brian
Right, thanks, I corrected it. I actually checked this yesterday, but I checked wrongly.
Torsten Marek
+4  A: 

Note that part of the behaviour you're seeing is due to somedata being a dict, as opposed to a simple data type such as a boolean.

For instance, see this different example, which behaves differently although it's very similar:

class the_base_class:
    somedata = False

class subclassthing(the_base_class):
    def __init__(self):
        print self.somedata


>>> first = subclassthing()
False
>>> first.somedata = True
>>> print first.somedata
True
>>> second = subclassthing()
False
>>> print first.somedata
True
>>> del first
>>> del second
>>> third = subclassthing()
False

This example behaves differently from the one given in the question because here first.somedata is being given a new value (the object True), whereas in the first example the dict object referenced by first.somedata (and also by the other subclass instances) is being modified. The assignment creates an instance attribute on first that shadows the class attribute; the class attribute itself stays False.
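
The two cases can be put side by side in one sketch. The class Base and the attribute names flag and data are made up for this illustration:

```python
class Base(object):
    flag = False
    data = {"was_false_in_base": False}

first = Base()
second = Base()

# Assignment through the instance creates a new instance attribute
# that shadows the class attribute; the class itself is untouched.
first.flag = True

# Item assignment mutates the one dict object that lives on the class.
first.data["was_false_in_base"] = True

# vars(first)  -> {'flag': True}   (only the rebound name is stored on the instance)
# second.flag  -> False            (still read from the class)
# second.data  -> {'was_false_in_base': True}   (same dict, so the change is visible)
```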

See Torsten Marek's comment to this answer for further clarification.

TimB
Are you saying that simple data types are not shared while complex ones are?
OscarRyz
No, they are shared, every data type in Python is a reference type, even integers and booleans etc. The reason is that first.somedata does not contain the value False/True, it references the object False/True. If it is reassigned, it simply references a different object.
Torsten Marek
@Oscar, I suspect the difference is due to the nature of references. When somedata is a dictionary then first and second have their own reference but each reference points to the same dictionary in memory. When somedata is a boolean they still get their own copies, but they modify the bool directly.
Jason Dagit
+2  A: 

I think the easiest way to understand this (so that you can predict behavior) is to realize that your somedata is an attribute of the class and not the instance of that class if you define it that way.

There is really only one somedata at all times because in your example you didn't assign to that name but used it to look up a dict and then assigned an item (key, value) to it. It's a gotcha that is a consequence of how the Python interpreter works and can be confusing at first.
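
An identity check shows that "only one somedata" is meant literally. The sketch below reuses the class names from the question; fixed_subclass is a hypothetical variant added here to show the __init__ workaround the question mentions:

```python
class the_base_class(object):
    somedata = {"was_false_in_base": False}

class subclassthing(the_base_class):
    pass

class fixed_subclass(the_base_class):
    def __init__(self):
        # Copy the class-level dict so each instance owns its own.
        self.somedata = dict(the_base_class.somedata)

first = subclassthing()
second = subclassthing()

# All three names resolve to the very same dict object.
same_object = first.somedata is second.somedata is the_base_class.somedata  # True

third = fixed_subclass()
fourth = fixed_subclass()
third.somedata["was_false_in_base"] = True

# fourth.somedata and the_base_class.somedata are unaffected by the change to third.
```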

Toni Ruža