Hi; I'm sure you've all heard of the "Word game", where you try to change one word to another by changing one letter at a time, and only going through valid English words. I'm trying to implement an A* Algorithm to solve it (just to flesh out my understanding of A*) and one of the things that is needed is a minimum-distance heuristic.

That is, the minimum number of mutations needed to turn an arbitrary string a into another string b, where each mutation is one of these three:
1) Change one letter to another
2) Add one letter before or after any letter
3) Remove any letter

Examples

aabca => abaca:
aabca
abca
abaca
= 2

abcdebf => bgabf:
abcdebf
bcdebf
bcdbf
bgdbf
bgabf
= 4

I've tried many algorithms out; I can't seem to find one that gives the actual answer every time. In fact, sometimes I'm not sure if even my human reasoning is finding the best answer.

Does anyone know any algorithm for such purpose? Or maybe can help me find one?

(Just to clarify, I'm asking for an algorithm that can turn any arbitrary string into any other, regardless of whether the intermediate strings are valid English words.)

+1  A: 

If you have a reasonably sized (small) dictionary, a breadth-first tree search might work.

So start with all the words your word can mutate into, then all the words those can mutate into (except the original), then go down to the third level, and so on until you find the word you are looking for.

You could eliminate divergent words (ones further away from the target), but doing so might cause you to fail in a case where you must go through some divergent state to reach the shortest path.
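A minimal sketch of that breadth-first idea, assuming lowercase ASCII words, a dictionary held as a Python set, and the three mutations from the question. The names `neighbors` and `bfs_path` are made up for illustration.

```python
from collections import deque
from string import ascii_lowercase


def neighbors(word, dictionary):
    """Return the dictionary words exactly one mutation away from `word`."""
    candidates = set()
    for i in range(len(word) + 1):
        for c in ascii_lowercase:
            candidates.add(word[:i] + c + word[i:])          # insert c at position i
        if i < len(word):
            candidates.add(word[:i] + word[i + 1:])          # remove letter i
            for c in ascii_lowercase:
                candidates.add(word[:i] + c + word[i + 1:])  # change letter i to c
    candidates.discard(word)
    return candidates & dictionary


def bfs_path(start, goal, dictionary):
    """Return one shortest chain of words from start to goal, or None."""
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:  # walk the parent links back to start
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in neighbors(word, dictionary):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


toy_dictionary = {"cat", "cot", "cog", "dog"}
print(bfs_path("cat", "dog", toy_dictionary))
```

Because the search expands level by level, the first time the goal is dequeued the chain is as short as possible. No words are pruned, so a path that has to pass through a "divergent" word is still found.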

Bill K
Well, I have my search algorithm implemented (A*), and it handles divergent words pretty well, the same way it can find the best path around a mountain by first moving away from the mountain and going around, instead of always picking the closest point. It has a neat priority system, but all of it relies on a reliable minimum-distance heuristic. In pathfinding, that's a straight line, ignoring all obstacles; this would be the linguistic equivalent.
Justin L.
So then I don't know of any way except for trying every path and finding the shortest. Given the two words and taking your first step (including your algorithm) how many words would you expect to have to check branching off the first word? If it's just 10 or so you could probably just do a breadth-first search of the entire tree. If it's much more you might have to do a depth-first until you hit a depth of 3 or so then do a breadth-first of that node just to stay within memory constraints. With chess programs I think they do this but are good at throwing away bad paths.
Bill K
+3  A: 

You want the minimum edit distance (or Levenshtein distance):

The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.

And one algorithm to determine the editing sequence is on the same page here.
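For reference, here is a short sketch of the standard dynamic-programming way to compute that distance (not necessarily the exact listing on the linked page). The function name `levenshtein` is just an illustrative choice, and only two rows of the table are kept at a time.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b; row 0 is "insert j characters".
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("aabca", "abaca"))    # 2
print(levenshtein("abcdebf", "bgabf"))  # 4
```

Both results agree with the hand-worked examples in the question. The running time is proportional to len(a) * len(b).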

MSN
That may not apply, since he is using English-only words.
Bill K
Actually, this is exactly what I'm looking for; I want a shortest-distance heuristic that doesn't bother with the dictionary. Thanks =)
Justin L.
Bear in mind that if you're trying to find the shortest path via valid words, the Levenshtein distance only provides a lower bound. The option with the lowest Levenshtein distance could actually be further from the destination than one with a higher distance.
Nick Johnson
I'm trying to implement an A* pathfinding algorithm to find the shortest path; the implementation requires a lower-bound heuristic to assist in calculations.
Justin L.
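To illustrate the exchange above, here is a rough sketch of how the edit distance might plug into A* as the lower-bound heuristic. It is not the asker's actual implementation: the function names, the lowercase-ASCII alphabet, and the set-based dictionary are all assumptions made for the example.

```python
import heapq
from string import ascii_lowercase


def levenshtein(a, b):
    """Edit distance; used here as the admissible heuristic h."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def neighbors(word, dictionary):
    """Dictionary words one insertion, removal, or change away from word."""
    out = set()
    for i in range(len(word) + 1):
        for c in ascii_lowercase:
            out.add(word[:i] + c + word[i:])          # insertion
            if i < len(word):
                out.add(word[:i] + c + word[i + 1:])  # change
        if i < len(word):
            out.add(word[:i] + word[i + 1:])          # removal
    out.discard(word)
    return out & dictionary


def a_star(start, goal, dictionary):
    """Return a shortest chain of valid words from start to goal, or None."""
    # Heap entries are (f = g + h, g, word, path so far).
    heap = [(levenshtein(start, goal), 0, start, [start])]
    best_g = {start: 0}
    while heap:
        f, g, word, path = heapq.heappop(heap)
        if word == goal:
            return path
        if g > best_g.get(word, float("inf")):
            continue  # stale entry; a cheaper route to word was found later
        for nxt in neighbors(word, dictionary):
            ng = g + 1  # every mutation costs one step
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                f_next = ng + levenshtein(nxt, goal)
                heapq.heappush(heap, (f_next, ng, nxt, path + [nxt]))
    return None


words = {"cat", "cot", "cog", "dog", "cart"}
print(a_star("cat", "dog", words))
```

Each valid move changes the edit distance to the goal by at most one, so the heuristic never overestimates the remaining number of steps, which is the property A* needs to return a shortest path. As noted above, the true path through valid words can still be longer than the heuristic suggests.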
+1  A: 

An excellent reference on "Edit distance" is section 6.3 of the Algorithms textbook by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, a draft of which is available freely here.

Dijkstra
Thanks for the link to the textbook; it will come in quite handy =)
Justin L.