views:

979

answers:

8

Hi stack overflow, I have a problem with some numpy stuff. I need a numpy array to behave in an unusual manner by returning a slice as a view of the data I have sliced, not a copy. So heres an example of what I want to do:

Say we have a simple array like this:

a = array([1, 0, 0, 0])

I would like to update consecutive entries in the array (moving left to right) with the previous entry from the array, using syntax like this:

a[1:] = a[0:3]

This would get the following result:

a = array([1, 1, 1, 1])

Or something like this:

a[1:] = 2*a[:3]
# a = [1,2,4,8]

To illustrate further I want the following kind of behaviour:

for i in range(len(a)):
    if i == 0 or i+1 == len(a): continue
    a[i+1] = a[i]

Except I want the speed of numpy.

The default behavior of numpy is to take a copy of the slice, so what I actually get is this:

a = array([1, 1, 0, 0])

I already have this array as a subclass of the ndarray, so I can make further changes to it if need be, I just need the slice on the right hand side to be continually updated as it updates the slice on the left hand side.

Am I dreaming or is this magic possible?

Update: This is all because I am trying to use Gauss-Seidel iteration to solve a linear algebra problem, more or less. It is a special case involving harmonic functions, I was trying to avoid going into this because its really not necessary and likely to confuse things further, but here goes.

The algorithm is this:

while not converged:
    for i in range(len(u[:,0])):
        for j in range(len(u[0,:])):
            # skip over boundary entries, i,j == 0 or len(u)
            u[i,j] = 0.25*(u[i-1,j] + u[i+1,j] + u[i, j-1] + u[i,j+1])

Right? But you can do this two ways, Jacobi involves updating each element with its neighbours without considering updates you have already made until the while loop cycles, to do it in loops you would copy the array then update one array from the copied array. However Gauss-Seidel uses information you have already updated for each of the i-1 and j-1 entries, thus no need for a copy, the loop should essentially 'know' since the array has been re-evaluated after each single element update. That is to say, every time we call up an entry like u[i-1,j] or u[i,j-1] the information calculated in the previous loop will be there.

I want to replace this slow and ugly nested loop situation with one nice clean line of code using numpy slicing:

u[1:-1,1:-1] = 0.25(u[:-2,1:-1] + u[2:,1:-1] + u[1:-1,:-2] + u[1:-1,2:])

But the result is Jacobi iteration because when you take a slice: u[:,-2,1:-1] you copy the data, thus the slice is not aware of any updates made. Now numpy still loops right? Its not parallel its just a faster way to loop that looks like a parallel operation in python. I want to exploit this behaviour by sort of hacking numpy to return a pointer instead of a copy when I take a slice. Right? Then every time numpy loops, that slice will 'update' or really just replicate whatever happened in the update. To do this I need slices on both sides of the array to be pointers.

Anyway if there is some really really clever person out there that awesome, but I've pretty much resigned myself to believing the only answer is to loop in C.

+1  A: 

Just use a loop. I can't immediately think of any way to make the slice operator behave the way you're saying you want it to, except maybe by subclassing numpy's array and overriding the appropriate method with some sort of Python voodoo... but more importantly, the idea that a[1:] = a[0:3] should copy the first value of a into the next three slots seems completely nonsensical to me. I imagine that it could easily confuse anyone else who looks at your code (at least the first few times).

David Zaslavsky
No no no, its not copying the first value into the next three, its updating each consecutive entry with data from the previous entry. However each time it updates I want it to be aware of the previous update. Yes I can loop, but its extremely slow and clumsy for the purpose I have in mind. But heres what it would look like in a loop:for i in a:
daver
*Dammit sorry i hit tab and it updated my comment... the code:for i in a: a[i+1] = a[i]The point is this is for a 2 dimensional array and is a specific numeric algorithm that I need to implement.
daver
why not just a[1:4]=a[0]?
gnibbler
unknown: You really don't want the array copy to work that way. How would you copy parts of the array if it worked that way?
Aaron Digulla
gnibbler: That completely misses the point of the algorithm, this is a simple example, what I am doing is Gauss-Seidel iteration, which infers information about a location in a matrix by using data that has already been inferred in previous entries.Deep in the machinery of numpy, as I understand it, it performs this loop. However, it loops over the copy of the original data in the slice. I want it to loop over the slice and update the slice as it goes. Maybe I could make it clearer like this: a[1:] = 2*a[0:3], with the expected result being: a = [1,2,4,8].
daver
Oh yeah, and no, I do not want my array copy to work like that, I want my array slice to not be a copy, I want my array slice to be a view of the data contained in the array.
daver
Please fix the question. Do no correct your question in the comments on an answer. Please update the question with this revised description of the problem.
S.Lott
One of the main advantages to numpy is the ability to avoid expensive python iterations. (aka broadcasting). The question is perfectly valid and a reasonable thing to expect numpy to do.
bpowah
A: 

It must have something to do with assigning a slice. Operators, however, as you may already know, do follow your expected behavior:

>>> a = numpy.array([1,0,0,0])
>>> a[1:]+=a[:3]
>>> a
array([1, 1, 1, 1])

If you already have zeros in your real-world problem where your example does, then this solves it. Otherwise, at added cost, set them to zero either by multiplying by zero or assigning to zero, (whichever is faster)

edit: I had another thought. You may prefer this:

numpy.put(a,[1,2,3],a[:3])
bpowah
A: 

It is not the correct logic. I'll try to use letters to explain it.

Image array = abcd with a,b,c,d as elements.
Now, array[1:] means from the element in position 1 (starting from 0) on.
In this case:bcd and array[0:3] means from the character in position 0 up to the third character (the one in position 3-1) in this case: 'abc'.

Writing something like:
array[1:] = array[0:3]

means: replace bcd with abc

To obtain the output you want, now in python, you should use something like:

a[1:] = a[0]
Andrea Ambu
The OP is trying to do something more sophisticated than that. The goal is to update elements of a slice based on element values that were updated in the current operation. It's hard to comprehend, but it's much faster than traditional python loops.
bpowah
I've just read his comment above. The question is pretty unclear so. The fist part of my post explain why it does not work as the OP thinks. I'm waiting for a question improvement to try to solve his problem. It is not clear as he would want to update it.
Andrea Ambu
Yes, the original question was vague and oversimplified. Found clarification in a comment. I wonder why people are afraid to edit original posts?
bpowah
I did edit it, it should be very clear now. Sorry, this is my first post, what I want to know is why this question got 3 downvotes?
daver
I commented on the question. I didn't downvoted this, but I think it got downvoted because of the unclear exposition. Try to provide an alternative working way to solve the problem or at least to explain the algorithm you'd like to be used in the details.
Andrea Ambu
Andrea, I have updated again with even more information, I'm afraid I don't think it makes it any clearer, but its there if you are interested. And yeah, I got downvotes after I clarified things so I'm thinking theres a bit of people confusing not understanding the problem with my explanation of it going on.
daver
A: 

Numpy must be checking if the target array is the same as the input array when doing the setkey call. Luckily, there are ways around it. First, I tried using numpy.put instead

In [46]: a = numpy.array([1,0,0,0])

In [47]: numpy.put(a,[1,2,3],a[0:3])

In [48]: a
Out[48]: array([1, 1, 1, 1])

And then from the documentation of that, I gave using flatiters a try (a.flat)

In [49]: a = numpy.array([1,0,0,0])

In [50]: a.flat[1:] = a[0:3]

In [51]: a
Out[51]: array([1, 1, 1, 1])

But this doesn't solve the problem you had in mind

In [55]: a = np.array([1,0,0,0])

In [56]: a.flat[1:] = 2*a[0:3]

In [57]: a
Out[57]: array([1, 2, 0, 0])

This fails because the multiplication is done before the assignment, not in parallel as you would like.

Numpy is designed for repeated application of the exact same operation in parallel across an array. To do something more complicated, unless you can find decompose it in terms of functions like numpy.cumsum and numpy.cumprod, you'll have to resort to something like scipy.weave or writing the function in C. (See the PerfomancePython page for more details.) (Also, I've never used weave, so I can't guarantee it will do what you want.)

AFoglia
+1  A: 

accumulate is designed to do what you seem to want; that is, to proprigate an operation along an array. Here's an example:

from numpy import *

a = array([1,0,0,0])
a[1:] = add.accumulate(a[0:3])
# a = [1, 1, 1, 1]

b = array([1,1,1,1])
b[1:] = multiply.accumulate(2*b[0:3])
# b = [1 2 4 8]

Another way to do this is to explicitly specify the result array as the input array. Here's an example:

c = array([2,0,0,0])
multiply(c[:3], c[:3], c[1:])
# c = [  2   4  16 256]
tom10
A: 

i would suggest cython instead of looping in c. there might be some fancy numpy way of getting your example to work using a lot of intermediate steps... but since you know how to write it in c already, just write that quick little bit as a cython function and let cython's magic make the rest of the work easy for you.

Autoplectic
A: 

You could have a look at np.lib.stride_tricks.

There is some information in these excellent slides: http://mentat.za.net/numpy/numpy_advanced_slides/

with stride_tricks starting at slide 29.

I'm not completely clear on the question though so can't suggest anything more concrete - although I would probably do it in cython or fortran with f2py or with weave. I'm liking fortran more at the moment because by the time you add all the required type annotations in cython I think it ends up looking less clear than the fortran.

There is a comparison of these approaches here:

www. scipy. org/ PerformancePython

(can't post more links as I'm a new user) with an example that looks similar to your case.

thrope
+1  A: 

Late answer, but this turned up on Google so I probably point to the doc the OP wanted. Your problem is clear: when using NumPy slices, temporaries are created. Wrap your code in a quick call to weave.blitz to get rid of the temporaries and have the behaviour your want.

Read PerformancePython tutorial for full details.

Nicholas Wilson