You could split the problem up into two pieces:
- Some kind of search algorithm that will enumerate possible strings in the grid.
- A way of testing whether a string is a valid word.
Ideally, (2) should also include a way of testing whether a string is a prefix of a valid word – this will allow you to prune your search and save a whole heap of time.
Adam Rosenfield's Trie is a solution to (2). It's elegant and probably what your algorithms specialist would prefer, but with modern languages and modern computers, we can be a bit lazier. Also, as Kent suggests, we can reduce our dictionary size by discarding words that have letters not present in the grid. Here's some python:
def make_lookups(grid, fn='dict.txt'):
# Make set of valid characters.
chars = set()
for word in grid:
chars.update(word)
words = set(x.strip() for x in open(fn) if set(x.strip()) <= chars)
prefixes = set()
for w in words:
for i in range(len(w)+1):
prefixes.add(w[:i])
return words, prefixes
Wow; constant-time prefix testing. It takes a couple of seconds to load the dictionary you linked, but only a couple :-) (notice that words <= prefixes
)
Now, for part (1), I'm inclined to think in terms of graphs. So I'll build a dictionary that looks something like this:
graph = { (x, y):set([(x0,y0), (x1,y1), (x2,y2)]), }
i.e. graph[(x, y)]
is the set of coordinates that you can reach from position (x, y)
. I'll also add a dummy node None
which will connect to everything.
Building it's a bit clumsy, because there's 8 possible positions and you have to do bounds checking. Here's some correspondingly-clumsy python code:
def make_graph(grid):
root = None
graph = { root:set() }
chardict = { root:'' }
for i, row in enumerate(grid):
for j, char in enumerate(row):
chardict[(i, j)] = char
node = (i, j)
children = set()
graph[node] = children
graph[root].add(node)
add_children(node, children, grid)
return graph, chardict
def add_children(node, children, grid):
x0, y0 = node
for i in [-1,0,1]:
x = x0 + i
if not (0 <= x < len(grid)):
continue
for j in [-1,0,1]:
y = y0 + j
if not (0 <= y < len(grid[0])) or (i == j == 0):
continue
children.add((x,y))
This code also builds up a dictionary mapping (x,y)
to the corresponding character. This lets me turn a list of positions into a word:
def to_word(chardict, pos_list):
return ''.join(chardict[x] for x in pos_list)
Finally, we do a depth-first search. The basic procedure is:
- The search arrives at a particular node.
- Check if the path so far could be part of a word. If not, don't explore this branch any further.
- Check if the path so far is a word. If so, add to the list of results.
- Explore all children not part of the path so far.
Python:
def find_words(graph, chardict, position, prefix, results, words, prefixes):
""" Arguments:
graph :: mapping (x,y) to set of reachable positions
chardict :: mapping (x,y) to character
position :: current position (x,y) -- equals prefix[-1]
prefix :: list of positions in current string
results :: set of words found
words :: set of valid words in the dictionary
prefixes :: set of valid words or prefixes thereof
"""
word = to_word(chardict, prefix)
if word not in prefixes:
return
if word in words:
results.add(word)
for child in graph[position]:
if child not in prefix:
find_words(graph, chardict, child, prefix+[child], results, words, prefixes)
Run the code as:
grid = ['fxie', 'amlo', 'ewbx', 'astu']
g, c = make_graph(grid)
w, p = make_lookups(grid)
res = set()
find_words(g, c, None, [], res, w, p)
and inspect res
to see the answers. Here's a list of words found for your example, sorted by size:
['a', 'b', 'e', 'f', 'i', 'l', 'm', 'o', 's', 't',
'u', 'w', 'x', 'ae', 'am', 'as', 'aw', 'ax', 'bo',
'bu', 'ea', 'el', 'em', 'es', 'fa', 'ie', 'io', 'li',
'lo', 'ma', 'me', 'mi', 'oe', 'ox', 'sa', 'se', 'st',
'tu', 'ut', 'wa', 'we', 'xi', 'aes', 'ame', 'ami',
'ase', 'ast', 'awa', 'awe', 'awl', 'blo', 'but', 'elb',
'elm', 'fae', 'fam', 'lei', 'lie', 'lim', 'lob', 'lox',
'mae', 'maw', 'mew', 'mil', 'mix', 'oil', 'olm', 'saw',
'sea', 'sew', 'swa', 'tub', 'tux', 'twa', 'wae', 'was',
'wax', 'wem', 'ambo', 'amil', 'amli', 'asem', 'axil',
'axle', 'bleo', 'boil', 'bole', 'east', 'fame', 'limb',
'lime', 'mesa', 'mewl', 'mile', 'milo', 'oime', 'sawt',
'seam', 'seax', 'semi', 'stub', 'swam', 'twae', 'twas',
'wame', 'wase', 'wast', 'weam', 'west', 'amble', 'awest',
'axile', 'embox', 'limbo', 'limes', 'swami', 'embole',
'famble', 'semble', 'wamble']
The code takes (literally) a couple of seconds to load the dictionary, but the rest is instant on my machine.