views:

201

answers:

2

I'm trying to use fancy indexing instead of looping to speed up a function in Numpy. To the best of my knowledge, I've implemented the fancy indexing version correctly. The problem is that the two functions (loop and fancy-indexed) do not return the same result. I'm not sure why. It's worth pointing out that the functions do return the same result if a smaller array is used (e.g., 20 x 20 x 20).

Below I've included everything necessary to reproduce the error. If the functions do return the same result, then the line find_maxdiff(data) - find_maxdiff_fancy(data) should return an array full of zeroes.

from numpy import *

def rms(data, axis=0):
    return sqrt(mean(data ** 2, axis))

def find_maxdiff(data):
    samples, channels, epochs = shape(data)
    window_size = 50
    maxdiff = zeros(epochs)
    for epoch in xrange(epochs):
        signal = rms(data[:, :, epoch], axis=1)
        for t in xrange(window_size, alen(signal) - window_size):
            amp_a = mean(signal[t-window_size:t], axis=0)
            amp_b = mean(signal[t:t+window_size], axis=0)
            the_diff = abs(amp_b - amp_a)
            if the_diff > maxdiff[epoch]: 
                maxdiff[epoch] = the_diff

    return maxdiff

def find_maxdiff_fancy(data):
    samples, channels, epochs = shape(data)
    window_size = 50
    maxdiff = zeros(epochs)
    signal = rms(data, axis=1)
    for t in xrange(window_size, alen(signal) - window_size):
        amp_a = mean(signal[t-window_size:t], axis=0)
        amp_b = mean(signal[t:t+window_size], axis=0)
        the_diff = abs(amp_b - amp_a)
        maxdiff[the_diff > maxdiff] = the_diff

    return maxdiff

data = random.random((600, 20, 100))
find_maxdiff(data) - find_maxdiff_fancy(data)

data = random.random((20, 20, 20))
find_maxdiff(data) - find_maxdiff_fancy(data)
A: 

First, in fancy your signal is now 2D if I understand correctly - so I think it would be clearer to index it explicitly (eg amp_a = mean(signal[t-window_size:t,:], axis=0). Similarly with alen(signal) - this should just be samples in both cases so I think it would be clearer to use that.

It is wrong whenever you are actually doing something in the t loop - when samples < window_lenght as in the 20x20x20 example, that loop never gets executed. As soon as that loop is executed more than once (ie samples > 2 *window_length+1) then the errors come. Not sure why though - they do look equivalent to me.

thrope
+3  A: 

The problem is this line:

maxdiff[the_diff > maxdiff] = the_diff

The left side selects only some elements of maxdiff, but the right side contains all elements of the_diff. This should work instead:

replaceElements = the_diff > maxdiff
maxdiff[replaceElements] = the_diff[replaceElements]

or simply:

maxdiff = maximum(maxdiff, the_diff)

As for why 20x20x20 size seems to work: This is because your window size is too large, so nothing gets executed.

interjay
Thanks for helping me better understand how fancy assignment works. Also, I should have caught the silly reason why the smaller array was working :) Thanks again.
pealco