views:

403

answers:

3

I've been trying to work on modifying a Levenshtein Distance function so that it can find the distance between two lines, or sets of x-y coordinates (in other words, how similar or different the lines are, not their geometric distance). I'm running into some problems though. I get how you take the value above to get deletion cost, and the one to the left to get addition, but during substitution I'm trying to use euchlidian distance, and it's not working for me.

If you could point out what I'm doing wrong, that would be awesome.

Here's the relevant code in javascript:

padlock.dtw = {
    _deletionCost: 1,
    _insertionCost: 1,
    levenshtein: function(a,b){
        var l1 = a.length, l2 = b.length;
        if (Math.min(l1, l2) === 0) {
            return Math.max(l1, l2);
        }
        var i = 0, j = 0, d = [];
        for (i = 0 ; i <= l1 ; i++) {
            d[i] = [];
            d[i][0] = i;
        }
        for (j = 0 ; j <= l2 ; j++) {
            d[0][j] = j;
        }
        for (i = 1 ; i <= l1 ; i++) {
            for (j = 1 ; j <= l2 ; j++) {
                d[i][j] = Math.min(
                    d[i - 1][j] + this._deletionCost, /* deletion */
                    d[i][j - 1] + this._insertionCost, /* addition */
                    d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : this.euclideanDistance(a[i-1], b[j-1])) /* substitution, use euchlidean distance as cost */
                );
            }
        }
        this._debugPrintMatrix(d);
        return d[l1][l2];
    },
    euclideanDistance: function(a, b){
        var xd = a[0]-b[0];
        var yd = a[1]-b[1];
        return Math.abs(Math.sqrt(Math.pow(xd, 2) + Math.pow(yd, 2)));
    },
    _debugPrintMatrix: function(m){
        for(var i=0;i<m.length;i++){
            console.log.apply(this, m[i]);
        }
    }
}

Sample output:

>>> padlock.dtw.levenshtein( [ [1,1], [0,9], [3,3], [4,4] ], [ [1,1], [2,2], [3,3], [4,4] ] )

Distance Matrix:
0 1 2                 3 4
1 0 1                 2 3
2 1 2                 3 4
3 2 2.414213562373095 2 3
4 3 3.414213562373095 3 2

Final Distance: 2
A: 

Wouldn't it be more clever to use geometrics for calculating the distance between two lines? Or is there a specific reason you wouldn't want to use that.

Since two lines always have a point of intersection, unless they are parallel (edit, thanks), it's easy to calculate the smallest distance: that's 0 or insert some math, which can be found on google!

Pindatjuh
you mean unless they are parallel
Anurag
When I say 'distance', I mean more among the lines of how similar or different the two lines are.
Psychcf
Note the asker is talking about two "sets of x-y coordinates" not just two x-y coordinates. You can't draw one line between two sets of points in any exact way.
ironfroggy
The question does say the "distance between two lines, or sets of x-y coordinates" and these statements make no sense together.
Anurag
I edited the question for added clarity, sorry about that
Psychcf
A: 

I don't understand why you would use Levenshtein for this, it appears that you would get much better results from simple calculations.

  • To find the difference in angle of the lines, you could simply find the angle for each line (arctan((x_1-x_2)/(y_1-y_2))) and subtract them.
  • To find the average distance of the lines, you could simply use the distance formula with the first point of each line and the second point of each line and average those distances together.

Other than that (unless your lines are in 3D), there's nothing else to really "compare" them with.

Perhaps I've misunderstood. Are you looking to compare the string values for the lines?

mattbasta
+1  A: 

If I understood your question correctly, then you should completely remove the code for computing euclidian distance between two points!

First, let me restate your question:

You have two sets of points, e.g.

A = [ [1,1], [0,9], [3,3], [4,4] ]
B = [ [1,1], [2,2], [3,3], [4,4] ]

You try to compute a levenshtein distance between those two sets. You substitute "letters" with "points".

Up to this point, it makes sense. Just replace the "letters" in levenshtein algorithm with points and you are done!

But you made a mistake: The original Levenshtein algorithm does not calculate distances between two letters, like e.g. distance(a,b)=1 or distance(a,d)=3.

You tried to extend the algorithm with such a thing (using euclideanDistance() function). But levenshtein algorithm is not meant for such things. And if you have a close look at it, you will see, that it will not work (the values in the matrix have a meaning, and each loop iteration uses values in the matrix that were computed in a previous iteration).

Levenshtein distance is an edit distance, no geometrical distance. You tried to change it, so that it computes a mix of edit and geometrical distance. This mix makes no sense, it is useless and wrong, IMHO.

Conclusion

To compute the levenshtein distance of two sets of x-y coordinates, you should replace your euclidianDistance() with a simple equality comparison (a[0]==b[0] && a[1]==b[1]).

Then the levenshtein algorithm will give you an "edit distance".

frunsi