Is there an efficient algorithm to find a doublet/word ladder? It can be done brute force, but there has to be a better way to do it. How?
If you think of this as a path finding problem you could try an A* algorithm.
(A heuristic based search of the answer space.)
Also, do you just want to find a solution or the best solution?
EDIT
I don't feel like changing this, but I see that my example is bad since a single step solves it. Ignore that issue when looking at the example.
Quick overview of how A* works (and slightly applied to this problem)
To use A* you need a function that evaluates a given state (of completion). You want a higher value for states which are closer to the goal.
For this problem two example functions
- (every letter that is correct * 1) - (number of letters different from goal * 10)
- (every letter that is correct * 100) - (number of letters different from goal)
As you can see the first favors word size being close over correct letters The 2nd favors letters correct.
Not sure which is better -- tho you could do a balanced formula too.
Lets say we are trying to get from bot -> boat
You would then evaluate all possible changes of boy, lets use the first function so two examples you would evaluate would be boot and bat (and others.) boot has a value of 3 and bat has a value of -7. Boot is better (according to this function) so then we would evaluate all possible changes of Boot (before any of the others) and find the solution.
Clear as mud? Maybe Wikipedia explains it better.
http://en.wikipedia.org/wiki/A*_search_algorithm
Side Notes:
A* will find the best solution if the function is designed right, if such a function exists for a give problem. This is the neat feature of A*.
An A* enhancement is to short circuit looking at states (for example in the case above -- positive 3 is a very good score (4 is the max score) so your algorithm could stop looking at other states and move on to the one that is very close.
The two hard parts of A* are 1) finding the right function and 2) being able to enumerate all the possible states. I think 2 is not so hard to do with a good dictionary file and some fast hashing/searching functions.
According to your wikipedia page, there are these rules:
- Add a letter
- Remove a letter
- Change a letter
- Use the same letters in different order (an anagram)
It might help to break it into those 4 subproblems.
For anagrams, there is a very simple algorithm. Create a hash table where each each word is stored a long with its letters sorted alphabetically. So for example, if you have the word races
it would turn into acers
and then match the anagram for cares
which is also acers
. These tend to work quite fast.
As for adding a letter and removing a letter, its basically the same thing as the anagrams, only you create the sorted letter list and then branch for each letter you can add or each letter you can remove, until you find one.
If you stick along the same patch, changing a letter appears to be the most difficult simply because there are so many branches off of it.
Levenshtein Distance seems like a good place to start. The steps allowed in calculating the Levenshtein Distance are very close to those allowed in word ladders, with the exception of anagramming. You could probably come up with a good heuristic to use with A* based on L.D.