Hi,

I was wondering if someone could point me to an algorithm/technique that is used to compare time dependent signals. Ideally, this hypothetical algorithm would take in 2 signals as inputs and return a number that would be the percentage similarity between the signals (0 being that the 2 signals are statistically unrelated and 1 being that they are a perfect match).

Of course, I realize that there are problems with my request, namely that I'm not sure how to properly define 'similarity' in the context of comparing these 2 signals, so if someone could also point me in the right direction (as to what I should look up/know, etc.), I'd appreciate it as well.

Thanks.

A: 

I don't know signal processing, so this is a guess:

Is your signal effectively a list of ordered pairs (x,y), where x is the time and y the amplitude? If so, then perhaps you could throw away the time coordinate -- e.g.:

Signal 1: [(x0,y0), (x1,y1), (x2,y2), (x3,y3), ...]
Signal 2: [(x0,z0), (x1,z1), (x2,z2), (x3,z3), ...]

Throw away time:

Signal 1: [y0, y1, y2, y3, ...]
Signal 2: [z0, z1, z2, z3, ...]

Then you can compare the amplitudes with each other, perhaps by looking for a correlation. Perhaps you could plot y against z:

Comparing: [(y0,z0), (y1,z1), (y2,z2), (y3,z3), ...]

Or calculate one of the various correlation coefficients.
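A minimal sketch of this idea in Python (the signals here are made up for illustration): drop the time column and compute a correlation coefficient directly on the two amplitude lists.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100)
signal1 = np.sin(2 * np.pi * 5 * t)          # the y values
signal2 = 0.5 * np.sin(2 * np.pi * 5 * t)    # the z values: a scaled copy

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the correlation coefficient between the two amplitude lists.
r = np.corrcoef(signal1, signal2)[0, 1]
print(r)  # close to 1.0, since signal2 is just a scaled copy of signal1
```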

John Fouhy
A: 

General solution: you can bin the data into histograms and use a Chi-squared test or a Kolmogorov-Smirnov test.

Both are explicitly intended to estimate the chances that the two distributions represent random samples from the same underlying distribution (that is: have the same shape to within statistical fluctuations).

I don't know a C implementation off the top of my head, but ROOT provides C++ implementations of both.

I believe the docs point to some papers as well.


I think that CERNLIB provides both algorithms in Fortran 77, which you can link to C. Translating the ROOT code might be easier.
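If you don't want to link against ROOT or CERNLIB, here is a sketch of the same idea using SciPy (assumed available): `ks_2samp` runs the two-sample Kolmogorov-Smirnov test directly on the raw samples, no binning needed (you would only need binning for the Chi-squared variant).

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)   # sample from N(0, 1)
b = rng.normal(0.0, 1.0, 1000)   # another sample from the same distribution
c = rng.normal(3.0, 1.0, 1000)   # sample from a shifted distribution

# High p-value: no evidence the samples come from different distributions.
print(ks_2samp(a, b).pvalue)
# Tiny p-value: the shapes clearly differ.
print(ks_2samp(a, c).pvalue)
```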

dmckee
+1  A: 

You could try a Fast Fourier Transform (look up FFT in Wikipedia, there are open source libraries for performing conversions).

FFTs will transform your data from the time domain (e.g. a pulse at 1 s, 2 s, 3 s, 4 s...) to the frequency domain (e.g. a component at one pulse per second, i.e. 1 Hz).

Then you can compare frequencies and their relative strengths more easily. It should be a step in the right direction for you.
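A minimal sketch of this with NumPy's FFT: comparing magnitude spectra instead of raw samples discards phase, so a pure time/phase shift between the signals no longer matters.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1024, endpoint=False)
x1 = np.sin(2 * np.pi * 50 * t)
x2 = np.sin(2 * np.pi * 50 * t + 1.0)   # same frequency, shifted phase

# The magnitude spectrum throws away phase information, so the two
# signals can now be compared point-by-point despite the shift.
mag1 = np.abs(np.fft.rfft(x1))
mag2 = np.abs(np.fft.rfft(x2))

r = np.corrcoef(mag1, mag2)[0, 1]
print(r)  # near 1.0: identical frequency content
```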

Kristoffon
Not a bad suggestion for some kinds of input, but you actually leave the OP with the same problem: how to compare two spectra.
dmckee
dmckee: except that the Fourier-transformed signals would be comparable point-by-point without shifts. By the way, "spectra" is the wrong term for the original signals, since that would imply energy (or an equivalent measure) on the x axis.
Svante
+5  A: 

The cross-correlation function is the classic signal processing solution. If you have access to Matlab, see the XCORR function. max(abs(xcorr(Signal1, Signal2, 'coeff'))) would give you specifically what you're looking for.

Cross-correlation assumes that the "similarity" you're looking for is a measure of the linear relationship between the two signals. The definition for real-valued finite-length signals with time index n = 0..N-1 is:

C[g] = sum{m = 0..N-1} (x1[m] * x2[g+m])

g runs from -(N-1)..(N-1) (outside that range the product inside the sum is 0).

Although you asked for a number, the function is pretty interesting. The function domain g is called the lag domain. If x1 and x2 are related by a time shift, the cross-correlation function will have its peak at the lag corresponding to the shift. For instance, if you had x1 = sin[wn] and x2 = sin[wn + phi], so two sine waves at the same frequency and different phase, the cross-correlation function would have its peak at the lag corresponding to the phase shift. If x2 is a scaled version of x1, the cross-correlation will scale also. You can normalize the function to a correlation coefficient by dividing by sqrt(sum(x1^2)*sum(x2^2)), and bring it into 0..1 by taking an absolute value (that line of Matlab has these operations).
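For readers without Matlab, here is a sketch of the same computation as the `max(abs(xcorr(...)))` one-liner above, using NumPy (`np.correlate` with `mode="full"` evaluates all lags):

```python
import numpy as np

def xcorr_coeff(x1, x2):
    """Max absolute value of the normalized cross-correlation."""
    c = np.correlate(x1, x2, mode="full")          # all lags -(N-1)..(N-1)
    norm = np.sqrt(np.sum(x1**2) * np.sum(x2**2))  # normalization to 0..1
    return np.max(np.abs(c)) / norm

n = np.arange(1000)
w = 2 * np.pi * 0.01
x1 = np.sin(w * n)
x2 = np.sin(w * n + 0.7)   # same frequency, phase-shifted

print(xcorr_coeff(x1, x1))  # 1.0: perfect match at zero lag
print(xcorr_coeff(x1, x2))  # near 1.0: the phase shift shows up as a lag peak
```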

EDIT: Add information on wider usefulness of cross-correlation
I apologize for concentrating on only one aspect of the cross-correlation function in my answer. Here is a better summary of what it is good and bad for.

Cross-correlation works well for determining if one signal is linearly related to another, that is if
x2(t) = sum{n = 0..K-1}(A_n * x1(t + phi_n))
where x1(t) and x2(t) are the signals in question, A_n are scaling factors, and phi_n are time shifts. The implications of this are:
1. If one signal is a time shifted version of the other (phi_n <> 0 for some n) the cross-correlation function will be non-zero.
2. If one signal is a scaled version of the other (A_n <> 0 for some n) the cross-correlation function will be non-zero.
3. If one signal is a combination of scaled and time shifted versions of the other (both A_n and phi_n are non-zero for some number of n's) the cross-correlation function will be non-zero. Note that this is also a definition of a linear filter.

To get more concrete, suppose x1 is a wideband random signal. Let x2=x1. Now the normalized cross-correlation function will be exactly 1 at g=0, and near 0 everywhere else. Now let x2 be a (linearly) filtered version of x1. The cross-correlation function will be non-zero near g=0. The width of the non-zero part will depend on the bandwidth of the filter.

For the special case of x1 and x2 being periodic, the information on the phase-shift in the original part of the answer applies.

Where cross-correlation will not help is if the two signals are not linearly related. For instance, two periodic signals at different frequencies are not linearly related. Nor are two random signals drawn from a wideband random process at different times. Nor are two signals that are similar in shape but with different time indexing - this is like the unequal fundamental frequency case.

In all cases, normalizing the cross-correlation function and looking at the maximum value will tell you if the signals are potentially linearly related - if the number is low, like under 0.1, I would be comfortable declaring them unrelated. Higher than that and I'd look into it more carefully, graphing both the normalized and unnormalized cross-correlation functions and looking at the structure. A periodic cross-correlation implies both signals are periodic, and a cross-correlation function that is noticeably higher around g=0 implies one signal is a filtered version of the other.
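The wideband-noise cases above can be sketched numerically (the 0.1 threshold is this answer's rule of thumb, not a standard): a linearly filtered copy keeps a clear peak, an independent signal does not.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal(2000)                 # wideband random signal
x2 = np.convolve(x1, np.ones(5) / 5, "same")   # linearly filtered copy of x1
x3 = rng.standard_normal(2000)                 # independent random signal

def peak(a, b):
    # peak of the normalized cross-correlation over all lags
    c = np.correlate(a, b, mode="full")
    return np.max(np.abs(c)) / np.sqrt(np.sum(a**2) * np.sum(b**2))

print(peak(x1, x2))  # well above 0.1: linearly related
print(peak(x1, x3))  # small: unrelated
```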

mtrw
It is not entirely clear from the question, but if oort wants a measure of sameness-of-shape (rather than in-phaseness), then cross-correlation is less than optimal: given one base signal s_b and two test signals t_1, t_2, it is not well suited to telling which test signal is most like the base signal.
dmckee
I'm looking for a function that would give a measure of sameness-of-shape. What would be better suited?
oort
@oort: look at my answer for a couple of choices. These are more complicated that cross-correlation, but they really do go straight to the matter of shape similarity.
dmckee
A: 

Dynamic Time Warping is an approach you can use if the signals should be matched by speeding up and slowing down time at different positions.
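A minimal dynamic-programming sketch of DTW (a basic textbook formulation, not tuned for speed): lower cost means more similar, and 0 means the signals match exactly after warping time.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) DTW cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # each cell extends the cheapest of: diagonal match,
            # stretching a, or stretching b
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    return cost[n, m]

fast = [0, 1, 2, 3]
slow = [0, 0, 1, 1, 2, 2, 3, 3]   # same shape, half speed

print(dtw_distance(fast, slow))          # 0.0: identical up to time warping
print(dtw_distance(fast, [3, 2, 1, 0]))  # large: the shapes really differ
```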

Jouni K. Seppänen
A: 

You don't say very much about what the signals are, or what measure of "sameness" would be meaningful to you. But if the signals are in phase (that is, you want to compare the two signals instant by instant, and there's no time delay to consider), then I'd suggest you look at Pearson's correlation coefficient. It gives you a value of 1 if the two signals are identical, a value of 0 if they're entirely dissimilar, and something in between if they kinda rhyme. As an added advantage, Pearson's doesn't care if the signals are amplified differently (though if one signal is the inverse of the other, it gives you a result of -1).

Does that sound like what you're looking for?

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
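A quick sketch of those properties using SciPy's `pearsonr` (assumed available; `np.corrcoef` gives the same number):

```python
import numpy as np
from scipy.stats import pearsonr

t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 3 * t)

# Amplification and offset don't change the coefficient...
print(pearsonr(x, 2.0 * x + 1.0)[0])   # 1.0
# ...but inverting the signal flips its sign.
print(pearsonr(x, -x)[0])              # -1.0
```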

Jules