Hi All:

I have a strange problem in Python 2.6.5 with NumPy. I create a NumPy array, then assign it to a new variable. When I perform any operation on the new array, the original's values also change. Why is that? Please see the example below. Kindly enlighten me, as I'm fairly new to Python, and to programming in general.

-Sujan

>>> import numpy as np
>>> a = np.array([[1,2],[3,4]])
>>> b = a
>>> b
array([[1, 2],
       [3, 4]])
>>> c = a
>>> c
array([[1, 2],
       [3, 4]])
>>> c[:,1] = c[:,1] + 5
>>> c
array([[1, 7],
       [3, 9]])
>>> b
array([[1, 7],
       [3, 9]])
>>> a
array([[1, 7],
       [3, 9]])
A:

That's actually not a problem at all; it's the way arrays (and other objects) work in Python.

Think about it like this: the array you created in your code example is an object that sits at some location in memory. But you can't use it in your program by telling Python where in memory to go look for it; you have to give it a name. When you write

a = np.array([[1,2],[3,4]])

you are both creating the array and creating a name, a, that refers to it. From that point on, Python knows that a refers to "memory address 0x123674283" (or whatever). The Python runtime keeps this information in an internal table called a namespace (for module-level names it is an ordinary dictionary), so after the above line of Python code runs, this table would contain

...,
'a' : 0x123674283,
...

When you assign the value of one variable to another, like

b = a

Python doesn't copy the whole array (assignment never copies an object, which is just as well, since copying a big array would take a long time). Instead, it goes to the table of names and copies the memory address for a to a new row for b. So you wind up with

...,
'a' : 0x123674283,
...,
'b' : 0x123674283,
...

So you see, a and b are actually referring to the same location in memory, i.e. the same object. Any changes you make to one will be reflected in the other, since they're just two names for the same thing.
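You can check this for yourself. The following is only a minimal sketch, reusing the array from the question; it relies on the built-in id() function and the is operator, and the fact that id() returns a memory address is a CPython detail rather than something the language guarantees.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = a  # binds a second name to the same array; no data is copied

# `is` tests object identity; id() returns the integer that identifies
# the object (in CPython this happens to be its memory address).
print(b is a)          # True
print(id(a) == id(b))  # True

# An in-place change made through one name is visible through the other.
b[:, 1] = b[:, 1] + 5
print(a)
```

Printing a at the end shows the modified second column, even though the assignment was written in terms of b.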

If you want to actually make a copy of the array, you have to call a method to do that explicitly. NumPy arrays have a copy method which you can use for just this purpose. So if you write

b = a.copy()

then Python will first actually make a copy of the array - that is, it sets aside a new region of memory, let's say at address 0x123904381, then goes to memory address 0x123674283 and copies all the values of the array from the latter section of memory to the former. It then points b at the new address, so you have the same content sitting in two different places in memory.

...,
'a' : 0x123674283,
...,
'b' : 0x123904381,
...

Now, when you change one of the elements of b, that change won't show up in a, since a and b no longer refer to the same section of the computer's memory. Since there are two independent copies of the array data, you can change one without affecting the other.
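Here is a short sketch of the same experiment with an explicit copy. It assumes a reasonably recent NumPy, since np.shares_memory is not available in very old releases; the rest should behave the same on any version.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = a.copy()  # allocates a new array and copies the values into it

print(b is a)                  # False: two distinct objects
print(np.shares_memory(a, b))  # False: two distinct data buffers

# Changing the copy leaves the original untouched.
b[:, 1] = b[:, 1] + 5
print(a)
print(b)
```

Only b shows the modified second column here; a still holds the original values.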

David Zaslavsky
Thank you so very much for the excellent description. Needless to say, that fixed the problem, but I really appreciate your taking the time to explain things. You have enlightened me!
Sujan