ansaurus

Question

Answer 1

+2 A:

The trick with this sort of thing is to find a way to divide and conquer.

Currently, you're sliding to every position and checking every point at every position, which is effectively an O(n^2) operation.

You need to reduce the check of every point and the comparison of every position to something that does less work to determine a non-match.

For example, you could have a shorter "is this even close?" filter that checks the first few positions. If the correlation is above some threshold, keep going; otherwise give up and move on.

You could have a "check every 8th position" pass whose sum you multiply by 8 to estimate the full correlation. If the estimate is too low, skip that offset and move on. If it is high enough, check all of the values to see whether you've found the maximum.
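A minimal sketch of that coarse-then-fine idea, assuming f is at least as long as g and only offsets where g fits entirely inside f are wanted. The names coarse_then_fine, step and keep are made up for illustration, and keeping the top `keep` estimates is just one possible stand-in for the "high enough" test. It is a heuristic: if the every-8th-sample estimate is misleading, the true maximum can be missed.

```python
import numpy as np

def coarse_then_fine(f, g, step=8, keep=20):
    """Estimate each offset from every `step`-th product, then fully
    check only the `keep` offsets with the highest estimates."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    m = len(g)
    n_offsets = len(f) - m + 1

    # Coarse pass: 1/step of the multiplies, scaled up by `step`.
    g_coarse = g[::step]
    coarse = np.array([step * np.dot(f[i:i + m:step], g_coarse)
                       for i in range(n_offsets)])

    # Fine pass: full correlation only for the most promising offsets.
    candidates = np.argsort(coarse)[-keep:]
    full = [(float(np.dot(f[i:i + m], g)), int(i)) for i in candidates]
    return max(full)  # (correlation, offset)
```

This works best when the signals are smooth relative to the step; for noisy data a larger `keep` or a smaller `step` is probably needed.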

The issue is the time required to do all these multiplies: (f[<unsigned int>(i+j)] * g[j]). In effect, you're filling a big matrix with all these products and picking the row with the maximum sum. You don't want to compute "all" the products, just enough of them to be sure you've found the maximum sum.

The issue with finding maxima is that you have to sum everything to see if it's the biggest. If you can turn this into a minimization problem, it's easier to abandon computing products and sums once an intermediate result exceeds a threshold.

(I think this might work. I haven't tried it.)

If you used max(g)-g[j] to work with negative numbers, you'd be looking for the smallest, not the biggest. You could compute the correlation for the first position. Anything that summed to a bigger value could be stopped immediately -- no more multiplies or adds for that offset, shift to another.

S.Lott 2009-07-29 12:58:22

Thanks for your answer. I found out that numpy.correlate() improves the performance at least 10x. Unfortunately, then I'm unable to use that smallest one trick.

jushie 2009-08-05 08:08:17

Answer 2

+2 A:

you can extract range(size2) from the external loop
you can use sum() instead of a loop to compute current_correlation
you can store correlations and delays in a list and then use max() to get the biggest one
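A rough pure-Python sketch of those three suggestions. The original loop isn't shown here, so the function name best_delay, the argument names f and g, and the assumption that only delays where g fits inside f are tried are all guesses.

```python
def best_delay(f, g):
    """Return (correlation, delay) for the delay that maximises
    sum(f[delay + j] * g[j])."""
    size2 = len(g)
    js = range(size2)  # built once, outside the outer loop
    results = []
    for i in range(len(f) - size2 + 1):
        # sum() over a generator replaces the inner accumulation loop
        current_correlation = sum(f[i + j] * g[j] for j in js)
        results.append((current_correlation, i))
    # max() compares tuples, so ties go to the larger delay
    return max(results)
```

For example, best_delay([0, 0, 1, 2, 3, 0, 0], [1, 2, 3]) returns (14, 2).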

dugres 2009-07-29 13:04:13

Thanks for the help. First I used sum(), but numpy.correlate() is faster for that. Now I also store the values and use max() as you suggested.

jushie 2009-08-05 08:09:54

Answer 3

+14 A:

Using FFTs and the convolution theorem will give you dramatic speed gains by converting the problem from O(n^2) to O(n log n). This is particularly useful for long data sets, like yours, and can give speedups of 1000x or much more, depending on length. It's also easy to do: just FFT both signals (with one of them time-reversed, so that the convolution becomes a correlation), multiply, and inverse FFT the product. numpy.correlate doesn't use the FFT method in the cross-correlation routine.

Here's an example:

from timeit import Timer
import numpy as np

# Sample times from 0 to 100 in steps of 0.001
times = np.arange(0, 100, .001)

xdata = 1.*np.sin(2*np.pi*1.*times) + .5*np.sin(2*np.pi*1.1*times + 1.)
ydata = .5*np.sin(2*np.pi*1.1*times)

def xcorr(x, y):
    # Direct cross-correlation with numpy.correlate
    return np.correlate(x, y, mode='same')

def fftxcorr(x, y):
    # FFT both signals (y reversed), multiply, inverse FFT.
    # The result is complex and treats the signals as periodic.
    fx, fy = np.fft.fft(x), np.fft.fft(y[::-1])
    fxfy = fx*fy
    xy = np.fft.ifft(fxfy)
    return xy

if __name__ == "__main__":
    N = 10
    t = Timer("xcorr(xdata, ydata)", "from __main__ import xcorr, xdata, ydata")
    print('xcorr', t.timeit(number=N)/N)
    t = Timer("fftxcorr(xdata, ydata)", "from __main__ import fftxcorr, xdata, ydata")
    print('fftxcorr', t.timeit(number=N)/N)

This gives the running time per call (in seconds, for a 10,000 long waveform):

xcorr 34.3761689901
fftxcorr 0.0768054962158

It's clear the fftxcorr method is much faster.

If you plot out the results, you'll see that they are very similar near zero time shift. Note, though, as you get further away the xcorr will decrease and the fftxcorr won't. This is because it's a bit ambiguous what to do with the parts of the waveform that don't overlap when the waveforms are shifted. xcorr treats it as zero and the FFT treats the waveforms as periodic, but if it's an issue it can be fixed by zero padding.
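A sketch of the zero-padding fix, assuming real-valued inputs. The name fftxcorr_padded is made up for this example. Padding both signals to length len(x) + len(y) - 1 before the FFT keeps the periodic wrap-around from overlapping the real data, so the result should match numpy.correlate with mode='full' up to rounding error.

```python
import numpy as np

def fftxcorr_padded(x, y):
    """Linear (non-periodic) cross-correlation of two real signals
    via zero-padded FFTs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) + len(y) - 1        # length of the full linear correlation
    fx = np.fft.rfft(x, n)         # rfft zero-pads each input to length n
    fy = np.fft.rfft(y[::-1], n)   # reversing y turns convolution into correlation
    return np.fft.irfft(fx * fy, n)
```

Using rfft/irfft instead of fft/ifft is a choice made here, not part of the original example: it returns a real array directly and does roughly half the work for real input.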

tom10 2009-08-04 17:40:00

Are your times for 10,000 or for arange(0, 100, .001) ?

Denis 2009-11-24 11:48:05


Cython and numpy speed
