views:

2202

answers:

6

Hello,

I have just recently battled a bug in Python. It was one of those silly newbie bugs, but it got me thinking about the mechanisms of Python (I'm a long time C++ programmer, new to Python). I will lay out the buggy code and explain what I did to fix it, and then I have a couple of questions...

The scenario: I have a class called A, that has a dictionary data member, following is its code (this is simplification of course):

class A:
    dict1={}

    def add_stuff_to_1(self, k, v):
     self.dict1[k]=v

    def print_stuff(self):
     print(self.dict1)

The class using this code is class B:

class B:

    def do_something_with_a1(self):
     a_instance = A()
     a_instance.print_stuff()    
     a_instance.add_stuff_to_1('a', 1)
     a_instance.add_stuff_to_1('b', 2)    
     a_instance.print_stuff()

    def do_something_with_a2(self):
     a_instance = A()    
     a_instance.print_stuff()      
     a_instance.add_stuff_to_1('c', 1)
     a_instance.add_stuff_to_1('d', 2)    
     a_instance.print_stuff()

    def do_something_with_a3(self):
     a_instance = A()    
     a_instance.print_stuff()      
     a_instance.add_stuff_to_1('e', 1)
     a_instance.add_stuff_to_1('f', 2)    
     a_instance.print_stuff()

    def __init__(self):
     self.do_something_with_a1()
     print("---")
     self.do_something_with_a2()
     print("---")
     self.do_something_with_a3()

Notice that every call to do_something_with_aX() initializes a new "clean" instance of class A, and prints the dictionary before and after the addition.

The bug (in case you haven't figured it out yet):

>>> b_instance = B()
{}
{'a': 1, 'b': 2}
---
{'a': 1, 'b': 2}
{'a': 1, 'c': 1, 'b': 2, 'd': 2}
---
{'a': 1, 'c': 1, 'b': 2, 'd': 2}
{'a': 1, 'c': 1, 'b': 2, 'e': 1, 'd': 2, 'f': 2}

In the second initialization of class A, the dictionaries are not empty, but start with the contents of the last initialization, and so forth. I expected them to start "fresh".

What solves this "bug" is obviously adding:

self.dict1 = {}

In the __init__ constructor of class A. However, that made me wonder:

  1. What is the meaning of the "dict1 = {}" initialization at the point of dict1's declaration (first line in class A)? It is meaningless?
  2. What's the mechanism of instantiation that causes copying the reference from the last initialization?
  3. If I add "self.dict1 = {}" in the constructor (or any other data member), how does it not affect the dictionary member of previously initialized instances?


EDIT: Following the answers I now understand that by declaring a data member and not referring to it in the __init__ or somewhere else as self.dict1, I'm practically defining what's called in C++/Java a static data member. By calling it self.dict1 I'm making it "instance-bound".

+7  A: 

What you keep referring to as a bug is the documented, standard behavior of Python classes.

Declaring a dict outside of __init__ as you initially did is declaring a class-level variable. It is only created once at first, whenever you create new objects it will reuse this same dict. To create instance variables, you declare them with self in __init__; its as simple as that.

Paolo Bergantino
I did not say it's a Python bug, I said it was my bug... When a programmer does something against the documentation, it's still a bug (a silly one as I mentioned). My question is about the mechanisms that trigger this "documented, standard behavior" - I would like to understand what's under the hood. Thanks. And specifically my first question, I would like to understand if the declaration initialization is useless.
Roee Adler
It's not useless; if you want a variable to persist through instances you would use it. Your problem is that when you do x = A() only the __init__ code gets executed. The class variable persists.
Paolo Bergantino
Ok as long as a declaration is not in the __init__, it's the same as a "static" data member in C++, and the declaration in "__init__" changes it from being static to being instance-bound? Now it's clear, thanks.
Roee Adler
Pretty much. The documentation talks about its difference from C++ some, you should check it out.
Paolo Bergantino
Excellent and clear answer, thanks.
Roee Adler
@Rax to be technical, it dict1 does not have to be declared in __init__; it could be declared in any method...it's just that __init__ will always be executed (barring changes to __new__). The best thing to remember is that there is never any implied "self" in python. If you aren't explicitly assigning to or reading from self.member, you aren't touching the instance.
David Berger
A: 

When you access attribute of instance, say, self.foo, python will first find 'foo' in self.__dict__. If not found, python will find 'foo' in TheClass.__dict__

In your case, dict1 is of class A, not instance.

kcwu
A: 

Pythons class declarations are executed as a code block and any local variable definitions (of which function definitions are a special kind of) are stored in the constructed class instance. Due to the way attribute look up works in Python, if an attribute is not found on the instance the value on the class is used.

The is an interesting article about the class syntax on the history of Python blog.

Ants Aasma
A: 

If this is your code:

class ClassA:
    dict1 = {}
a = ClassA()

Then you probably expected this to happen inside Python:

class ClassA:
    __defaults__['dict1'] = {}

a = instance(ClassA)
# a bit of pseudo-code here:
for name, value in ClassA.__defaults__:
    a.<name> = value

As far as I can tell, that is what happens, except that a dict has its pointer copied, instead of the value, which is the default behaviour everywhere in Python. Look at this code:

a = {}
b = a
a['foo'] = 'bar'
print b
too much php
A: 

I'm going to bring the word "bug" back into the picture.

I'm writing a python script for the graphics application Blender, where each instance of my class is a wall of a building. Each wall class has a list of features (doors, windows and the end of the wall). All the instances of the wall class share the same ends of the wall.

I know I'm new to this particular language, so I might not know intra-cut details, however the lists inside the instance of a class should be just in that class and not shared with other instances.

So I'm going to be the one to say the word, bug, bug, bug!!!!!!!!

+1  A: 

@Matthew : Please review the difference between a class member and an object member in Object Oriented Programming. This problem happens because of the declaration of the original dict makes it a class member, and not an object member (as was the original poster's intent.) Consequently, it exists once for (is shared accross) all instances of the class (ie once for the class itself, as a member of the class object itself) so the behaviour is perfectly correct.

W.Prins