I'm interested in a function of two word lists, which would return an order agnostic edit distance between them.
That is, the arguments would be two lists of (let's say space delimited) words and return value would be the minimum sum of the edit (or Levenshtein) distances of the words in the lists.
Distance between "cat rat bat"
and "rat bat cat"
would be 0. Distance between "cat rat bat"
and "fat had bad"
would be the same as distance between "rat bat cat"
and "had fat bad"
, 4. In the case the number of words in the lists are not the same, the shorter list would be padded with 0-length words.
My intuition (which hasn't been nurtured with computer science classes) does not find any other solution than to use brute force:
|had|fat|bad| a solution
---+---+---+---+ +---+---+---+
cat| 2 | 1 | 2 | | | 1 | |
---+---+---+---+ +---+---+---+
rat| 2 | 1 | 2 | | 3 | | |
---+---+---+---+ +---+---+---+
bat| 2 | 1 | 1 | | | | 4 |
---+---+---+---+ +---+---+---+
Starting from the first row, pick a column and go to the next rows without ever revisiting a column you have already visited. Do this over and over again until you've tried all combinations.
To me this sounds a bit like the traveling salesman problem. Is it, and how would you solve my particular problem?