tags:

views:

363

answers:

3

Is there an efficient algorithm to find a doublet/word ladder? It can be done brute force, but there has to be a better way to do it. How?

http://en.wikipedia.org/wiki/Word_Ladder

+3  A: 

If you think of this as a path finding problem you could try an A* algorithm.

(A heuristic based search of the answer space.)

Also, do you just want to find a solution or the best solution?

EDIT

I don't feel like changing this, but I see that my example is bad since a single step solves it. Ignore that issue when looking at the example.

Quick overview of how A* works (and slightly applied to this problem)

To use A* you need a function that evaluates a given state (of completion). You want a higher value for states which are closer to the goal.

For this problem two example functions

  • (every letter that is correct * 1) - (number of letters different from goal * 10)
  • (every letter that is correct * 100) - (number of letters different from goal)

As you can see the first favors word size being close over correct letters The 2nd favors letters correct.

Not sure which is better -- tho you could do a balanced formula too.

Lets say we are trying to get from bot -> boat

You would then evaluate all possible changes of boy, lets use the first function so two examples you would evaluate would be boot and bat (and others.) boot has a value of 3 and bat has a value of -7. Boot is better (according to this function) so then we would evaluate all possible changes of Boot (before any of the others) and find the solution.

Clear as mud? Maybe Wikipedia explains it better.

http://en.wikipedia.org/wiki/A*_search_algorithm

Side Notes:

  • A* will find the best solution if the function is designed right, if such a function exists for a give problem. This is the neat feature of A*.

  • An A* enhancement is to short circuit looking at states (for example in the case above -- positive 3 is a very good score (4 is the max score) so your algorithm could stop looking at other states and move on to the one that is very close.

  • The two hard parts of A* are 1) finding the right function and 2) being able to enumerate all the possible states. I think 2 is not so hard to do with a good dictionary file and some fast hashing/searching functions.

Hogan
yeah, but to find the path, I need to build the graph first, right? or am I wrong? First step is to build the graph (lets say I have a 10000 word list). Second step would be to find the shortest path. There are algorithms for the second step, but how do I do the first step?
@unknown: It is usual to build the graph as you go. Any tutorial on A* search should explain things further.
j_random_hacker
A* is an ok way to go here as long as you can find a good heuristic for it. Finding a heuristic for this problem which is <i>always</i> good is going to take some research.
NickLarsen
@unknown I explained more, but no you don't need to build the graph, you just need to evaluate all possible "moves" from one state to another -- so you do need a good algorithm to find all "transitions" from a given state. Any solution would need this I believe.
Hogan
The graph is implicit, the edges can be calculated from each node, so you don't need to create the full graph right off the bat. It's like going from NY to Cali - you don't need the roads in Florida.
Timmy
+1  A: 

According to your wikipedia page, there are these rules:

  1. Add a letter
  2. Remove a letter
  3. Change a letter
  4. Use the same letters in different order (an anagram)

It might help to break it into those 4 subproblems.

For anagrams, there is a very simple algorithm. Create a hash table where each each word is stored a long with its letters sorted alphabetically. So for example, if you have the word races it would turn into acers and then match the anagram for cares which is also acers. These tend to work quite fast.

As for adding a letter and removing a letter, its basically the same thing as the anagrams, only you create the sorted letter list and then branch for each letter you can add or each letter you can remove, until you find one.

If you stick along the same patch, changing a letter appears to be the most difficult simply because there are so many branches off of it.

NickLarsen
I don't see how sorting/hashing the letters could be useful here -- it destroys structure (letter order) that we want to preserve, so after sorting letters and making a change we wind up having to search back through all words whose letters, when sorted, are the same as the new sorted letters.
j_random_hacker
Apologies, I didn't realise anagramming was allowed (despite you clearly writing it!) In that case I can see the advantage. +1.
j_random_hacker
A: 

Levenshtein Distance seems like a good place to start. The steps allowed in calculating the Levenshtein Distance are very close to those allowed in word ladders, with the exception of anagramming. You could probably come up with a good heuristic to use with A* based on L.D.

Peter Recore