views: 1047 · answers: 3

Hi

I'm using Cython for a correlation calculation in my Python program. I have two audio data sets and I need to know the time difference between them. The second set is cut based on onset times and then slid across the first set. There are two for-loops: the outer loop slides the set and the inner loop calculates the correlation at that offset. This method works very well and it's accurate enough.

The problem is that with pure Python this takes more than one minute. With my Cython code, it takes about 17 seconds. This is still too much. Do you have any hints on how to speed up this code:

import numpy as np
cimport numpy as np

cimport cython

FTYPE = np.float64
ctypedef np.float64_t FTYPE_t

@cython.boundscheck(False)
def delay(np.ndarray[FTYPE_t, ndim=1] f, np.ndarray[FTYPE_t, ndim=1] g):
    cdef int size1 = f.shape[0]
    cdef int size2 = g.shape[0]
    # accumulate in floating point so the products are not truncated to int
    cdef FTYPE_t max_correlation = 0
    cdef int delay = 0
    cdef FTYPE_t current_correlation
    cdef int i, j

    # Move second data set frame by frame
    for i in range(0, size1 - size2):
        current_correlation = 0

        # Calculate correlation at that point
        for j in range(size2):
            current_correlation += f[<unsigned int>(i+j)] * g[j]

        # Check if current correlation is highest so far
        if current_correlation > max_correlation:
            max_correlation = current_correlation
            delay = i

    return delay
+2  A: 

The trick with this sort of thing is to find a way to divide and conquer.

Currently, you're sliding to every position and checking every point at every position -- effectively an O(n^2) operation.

You need to reduce checking every point and comparing every position to something that does less work to rule out a non-match.

For example, you could have a shorter "is this even close?" filter that checks the first few positions. If the correlation is above some threshold, keep going; otherwise give up and move on.

You could have a "check every 8th position" pass whose sum you multiply by 8. If this is too low, skip the offset and move on. If it's high enough, then check all of the values to see whether you've found the maximum.
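
A rough sketch of that coarse-then-fine idea in plain Python (the function name, the stride of 8, and the skip rule are illustrative, and because the coarse estimate is only a heuristic it can occasionally discard the true best offset):

def coarse_then_fine_delay(f, g, stride=8):
    size1, size2 = len(f), len(g)
    best_correlation, best_delay = float('-inf'), 0
    for i in range(size1 - size2):
        # Cheap pass: correlate only every stride-th sample and scale up.
        coarse = stride * sum(f[i + j] * g[j] for j in range(0, size2, stride))
        if coarse <= best_correlation:
            continue  # the coarse estimate says this offset can't win, so skip it
        # Expensive pass: full correlation only for the promising offsets.
        full = sum(f[i + j] * g[j] for j in range(size2))
        if full > best_correlation:
            best_correlation, best_delay = full, i
    return best_delay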

The issue is the time required to do all these multiplies -- (f[<unsigned int>(i+j)] * g[j]). In effect, you're filling a big matrix with all these products and picking the row with the maximum sum. You don't want to compute "all" the products -- just enough of them to be sure you've found the maximum sum.

The issue with finding a maximum is that you have to sum everything to see whether it's the biggest. If you can turn this into a minimization problem, it's easier to abandon computing products and sums once an intermediate result exceeds a threshold.

(I think this might work. I haven't tried it.)

If you used max(g) - g[j] in place of g[j] (so the template values are all non-negative), you'd be looking for the smallest sum, not the biggest. You could compute the correlation for the first position; anything that summed to a bigger value could be stopped immediately -- no more multiplies or adds for that offset, shift to another.
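
A sketch of that abandon-early idea, with the same caveat as above (untried): flip the template with max(g) - g[j] so that a good match becomes a small sum, then stop summing at an offset as soon as its partial sum is already worse than the best one seen. The names are illustrative, and the early break is only safe when the partial sums cannot shrink (for example, when f is non-negative):

def delay_by_minimizing(f, g):
    size1, size2 = len(f), len(g)
    h = [max(g) - v for v in g]        # flipped template: smaller sum = better match
    best_sum, best_delay = float('inf'), 0
    for i in range(size1 - size2):
        s = 0.0
        for j in range(size2):
            s += f[i + j] * h[j]
            if s >= best_sum:          # already worse than the best offset so far
                break                  # abandon: no more multiplies or adds here
        else:
            best_sum, best_delay = s, i
    return best_delay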

S.Lott
Thanks for your answer. I found out that numpy.correlate() improves the performance by at least 10x. Unfortunately, I'm then unable to use that smallest-sum trick.
jushie
+2  A: 
  • you can build range(size2) once, outside the outer loop
  • you can use sum() instead of a loop to compute current_correlation
  • you can store correlations and delays in a list and then use max() to get the biggest one (see the sketch after this list)
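
A minimal pure-Python sketch of those three changes together (illustrative, not the question's exact code):

def delay(f, g):
    size1, size2 = len(f), len(g)
    inner = range(size2)                        # built once, outside the outer loop
    # one (correlation, offset) pair per shift; sum() replaces the inner loop
    scores = [(sum(f[i + j] * g[j] for j in inner), i)
              for i in range(size1 - size2)]
    best_correlation, best_delay = max(scores)  # max() picks the largest correlation
    return best_delay
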
dugres
Thanks for the help. First I used sum(), but numpy.correlate() is faster for that. Now I also store the values and use max() as you suggested.
jushie
+14  A: 

Using FFTs and the convolution theorem will give you dramatic speed gains by converting the problem from O(n^2) to O(n log n). This is particularly useful for long data sets like yours and can give speedups of 1000x or more, depending on the length. It's also easy to do: just FFT both signals, multiply, and inverse FFT the product. numpy.correlate doesn't use the FFT method in its cross-correlation routine.

Here's an example:

from timeit import Timer
from numpy import *

times = arange(0, 100, .001)

xdata = 1.*sin(2*pi*1.*times) + .5*sin(2*pi*1.1*times + 1.)
ydata = .5*sin(2*pi*1.1*times)

def xcorr(x, y):
    return correlate(x, y, mode='same')

def fftxcorr(x, y):
    fx, fy = fft.fft(x), fft.fft(y[::-1])
    fxfy = fx*fy
    xy = fft.ifft(fxfy)
    return xy

if __name__ == "__main__":
    N = 10
    t = Timer("xcorr(xdata, ydata)", "from __main__ import xcorr, xdata, ydata")
    print 'xcorr', t.timeit(number=N)/N
    t = Timer("fftxcorr(xdata, ydata)", "from __main__ import fftxcorr, xdata, ydata")
    print 'fftxcorr', t.timeit(number=N)/N

This gives the running times per cycle (in seconds, for a 10,000 long waveform):

xcorr 34.3761689901
fftxcorr 0.0768054962158

It's clear the fftxcorr method is much faster.

If you plot the results, you'll see that they are very similar near zero time shift. Note, though, that as you move further away the xcorr result will decrease and the fftxcorr result won't. That's because it's a bit ambiguous what to do with the parts of the waveforms that don't overlap once they are shifted: xcorr treats them as zero, while the FFT treats the waveforms as periodic. If that's an issue, it can be fixed by zero padding.
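
If the wraparound matters, one way to zero pad is to take both FFTs at the full linear-correlation length. Here's a minimal sketch that follows the same reversed-y convention as fftxcorr above (padded_fftxcorr and estimated_delay are illustrative names, not part of the original answer):

from numpy import fft, abs, argmax

def padded_fftxcorr(x, y):
    # Padding both FFTs to len(x) + len(y) - 1 turns the circular product into
    # a linear correlation, so non-overlapping samples act as zeros instead of
    # wrapping around periodically.
    n = len(x) + len(y) - 1
    return fft.ifft(fft.fft(x, n) * fft.fft(y[::-1], n))

def estimated_delay(x, y):
    # With the reversed-y convention, a delay of d samples peaks at index
    # d + len(y) - 1 of the padded result.
    return argmax(abs(padded_fftxcorr(x, y))) - (len(y) - 1)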

tom10
Are your times for 10,000 or for arange(0, 100, .001) ?
Denis