
I'd like to use Dynamic Time Warping (DTW) to compare the feature vectors of two audio recordings (of course, I do all the necessary preprocessing first). My program should output the similarity between the two recordings as a percentage. For example, 100% means that the two recordings are completely identical, and the more the recordings differ, the lower the number I get. How do I go about this? DTW only gives me the length of the path or the cost of the transitions, and I don't know how to convert either of these numbers into a percentage.

A:

I'm not aware of any distance metric between signals that is expressed as a percentage. If 100% has a meaning, then 0% must have one too. So first you need to ask yourself: what does 0% mean?

For DTW, I'm pretty sure that there is no established conversion of minimum distance to "percent match". If you must, then you need to define a heuristic quantity that is a function of the minimum DTW distance.
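As one illustration of such a heuristic (an assumption, not an established conversion), the DTW distance can be pushed through a decaying function so that a distance of 0 maps to 100% and the score only approaches 0% as the distance grows without bound. The function name `dtw_distance_to_percent` and its `scale` parameter are hypothetical; `scale` would have to be tuned on your own recordings.

```python
import math


def dtw_distance_to_percent(distance: float, scale: float) -> float:
    """Map a non-negative DTW distance to a 0-100 "similarity" score.

    Heuristic only (hypothetical): distance 0 -> 100%, and the score decays
    toward 0% as the distance grows. `scale` is the distance at which the
    score has dropped to about 37% (i.e. 100 * e**-1).
    """
    if distance < 0:
        raise ValueError("DTW distance must be non-negative")
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 100.0 * math.exp(-distance / scale)


print(dtw_distance_to_percent(0.0, scale=10.0))  # prints 100.0
print(dtw_distance_to_percent(25.0, scale=10.0))
```

Any monotonically decreasing function would do (for example `100 / (1 + distance / scale)`); the choice only changes how quickly the score falls off, which is exactly the part that has no established answer.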

EDIT: Actually, you could sort of define a longest (worst-case) distance if you have two finite-length recordings. That would be the distance of a path that goes (looking at the cost matrix) all the way right and then down, or all the way down and then right. The best path, i.e. a perfect match of two equal-length recordings, goes down the main diagonal.
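A minimal sketch of that idea, assuming each recording is a list of equal-dimension feature frames (e.g. MFCC vectors) compared with Euclidean distance; `dtw_cost`, `corner_path_cost` and `similarity_percent` are hypothetical helper names. The two corner paths are the longest in number of steps, not necessarily the most expensive; but since DTW minimizes over all valid paths, these two included, the DTW cost can never exceed their cost, so the ratio stays within 0-100%.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (assumed)


def dtw_cost(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Minimum accumulated cost using steps (1,0), (0,1) and (1,1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning x[:i+1] with y[:j+1]
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(x[i], y[j])  # Euclidean distance between frames
            if i == 0 and j == 0:
                acc[i][j] = d
            else:
                best = min(
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
                acc[i][j] = d + best
    return acc[n - 1][m - 1]


def corner_path_cost(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Larger cost of the right-then-down and down-then-right paths."""
    n, m = len(x), len(y)
    # along row 0 (x[0] vs every y[j]), then down the last column
    right_then_down = sum(math.dist(x[0], y[j]) for j in range(m)) + sum(
        math.dist(x[i], y[m - 1]) for i in range(1, n)
    )
    # down column 0 (every x[i] vs y[0]), then along the last row
    down_then_right = sum(math.dist(x[i], y[0]) for i in range(n)) + sum(
        math.dist(x[n - 1], y[j]) for j in range(1, m)
    )
    return max(right_then_down, down_then_right)


def similarity_percent(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    d = dtw_cost(x, y)
    d_max = corner_path_cost(x, y)
    if d_max == 0.0:
        return 100.0  # every frame of x equals every frame of y
    return 100.0 * (1.0 - d / d_max)


a = [[0.0], [1.0], [2.0]]
b = [[0.0], [0.0], [0.0]]
print(similarity_percent(a, a))  # prints 100.0
print(similarity_percent(a, b))
```

Because the reference cost is computed from the same pair of recordings, it grows with their length along with the DTW cost, which is what makes scores for recordings of different lengths roughly comparable.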

One simple idea: if you use (0,1), (1,0) and (1,1) as the step candidates, you could use the number of (0,1) and (1,0) steps taken as a measure of badness. This measure certainly has a minimum and a maximum (for an n × m cost matrix, between |n - m| and n + m - 2), so it can be mapped to a convenient range such as 0-100%.
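A minimal sketch of the step-counting idea, again assuming lists of feature frames compared with Euclidean distance; `dtw_path`, `off_diagonal_steps` and `step_similarity_percent` are hypothetical names. The optimal path is recovered by backtracking through the accumulated-cost matrix, preferring the diagonal on ties, and the count of non-diagonal steps is rescaled between its minimum |n - m| and maximum n + m - 2.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(x: Sequence[Frame], y: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Optimal DTW path with steps (1,0), (0,1), (1,1), as (i, j) index pairs."""
    n, m = len(x), len(y)
    inf = float("inf")
    # padded matrix: acc[i][j] covers x[:i] and y[:j]; row/column 0 are borders
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # backtrack from the end; min() keeps the first (diagonal) candidate on ties
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def off_diagonal_steps(path: List[Tuple[int, int]]) -> int:
    """Number of (1,0) or (0,1) steps along the path."""
    return sum(
        1 for (a, b), (c, d) in zip(path, path[1:]) if not (c - a == 1 and d - b == 1)
    )


def step_similarity_percent(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    n, m = len(x), len(y)
    k = off_diagonal_steps(dtw_path(x, y))
    k_min = abs(n - m)  # steps forced by the length difference
    k_max = n + m - 2   # no diagonal steps at all
    if k_max == k_min:
        return 100.0  # a length-1 input forces the path; no information
    return 100.0 * (1.0 - (k - k_min) / (k_max - k_min))


x = [[0.0], [1.0], [2.0], [3.0]]
y = [[0.0], [0.0], [1.0], [2.0], [3.0]]
print(dtw_path(x, y))
print(step_similarity_percent(x, y))
```

Note that this score looks only at the shape of the path, not at the frame distances along it: two quite different recordings of equal length can still follow the diagonal. In practice it is probably best combined with a cost-based measure.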

Steve
0% could mean that the recordings are infinitely different (an infinitely long DTW path). So of course, in practice I would never get 0%. But let me rephrase the question: what do I need to do to make the lengths of DTW paths directly comparable to each other? After all, the longer the recordings I compare, the longer the path I get.
pako
So I cannot use the length of the DTW path directly to provide a percent-based grade to the user. I need some way to normalize the length of the resulting path first. Any ideas?
pako
Thanks for rephrasing. See edit.
Steve
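On the normalization question raised in the comments, one common approach (shown here as a sketch, not as the method from the answer) is to divide the accumulated DTW cost by the number of cells on the optimal path, giving an average per-step cost that does not grow simply because the recordings are longer; dividing by n + m is another option. `normalized_dtw` is a hypothetical name, and frames are assumed to be compared with Euclidean distance.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def normalized_dtw(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """DTW cost divided by the number of cells on the optimal path."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]  # accumulated cost
    steps = [[0] * (m + 1) for _ in range(n + 1)]  # path length to each cell
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # cheapest predecessor; the diagonal is listed first so it wins ties
            pi, pj = min(
                ((i - 1, j - 1), (i - 1, j), (i, j - 1)),
                key=lambda p: acc[p[0]][p[1]],
            )
            acc[i][j] = math.dist(x[i - 1], y[j - 1]) + acc[pi][pj]
            steps[i][j] = steps[pi][pj] + 1
    return acc[n][m] / steps[n][m]


print(normalized_dtw([[0.0]] * 2, [[1.0]] * 2))  # prints 1.0
print(normalized_dtw([[0.0]] * 4, [[1.0]] * 4))  # prints 1.0
```

In the usage lines the raw DTW cost doubles from 2 to 4 when the recordings double in length, while the normalized value stays at 1.0. A normalized value like this could then be fed into a heuristic mapping to a percentage.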