ansaurus

Question

How to find list of possible words from a letter matrix [Boggle Solver]

Answer 1

+3 A:

First, read how one of the C# language designers solved a related problem: http://blogs.msdn.com/ericlippert/archive/2009/02/04/a-nasality-talisman-for-the-sultana-analyst.aspx.

Like him, you can start with a dictionary and the canonacalize words by creating a dictionary from an array of letters sorted alphabetically to a list of words that can be spelled from those letters.

Next, start creating the possible words from the board and looking them up. I suspect that will get you pretty far, but there are certainly more tricks that might speed things up.

RossFabricant 2009-04-14 02:15:05

Answer 2

+2 A:

Does your search algorithm continually decrease the word list as your search continues?

For instance, in the search above there are only 13 letters that your words can start with (effectively reducing to half as many starting letters).

As you add more letter permutations it would further decrease the available word sets decreasing the searching necessary.

I'd start there.

jerebear 2009-04-14 02:16:28

Answer 3

+44 A:

The fastest solution you're going to get will probably involve storing your dictionary in a trie. Then, create a queue of triplets (x, y, s), where each element in the queue corresponds to a prefix s of a word which can be spelled in the grid, ending at location (x, y). Initialize the queue with N x N elements (where N is the size of your grid), one element for each square in the grid. Then, the algorithm proceeds as follows:

While the queue is not empty:
  Dequeue a triple (x, y, s)
  For each square (x', y') with letter c adjacent to (x, y):
    If s+c is a word, output s+c
    If s+c is a prefix of a word, insert (x', y', s+c) into the queue

If you store your dictionary in a trie, testing if s+c is a word or a prefix of a word can be done in constant time (provided you also keep some extra metadata in each queue datum, such as a pointer to the current node in the trie), so the running time of this algorithm is O(number of words that can be spelled).

[Edit] Here's an implementation in Python that I just coded up:

#!/usr/bin/python

class TrieNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = [None] * 26
        self.isWord = False
        if parent is not None:
            parent.children[ord(value) - 97] = self

def MakeTrie(dictfile):
    dict = open(dictfile)
    root = TrieNode(None, '')
    for word in dict:
        curNode = root
        for letter in word.lower():
            if 97 <= ord(letter) < 123:
                nextNode = curNode.children[ord(letter) - 97]
                if nextNode is None:
                    nextNode = TrieNode(curNode, letter)
                curNode = nextNode
        curNode.isWord = True
    return root

def BoggleWords(grid, dict):
    rows = len(grid)
    cols = len(grid[0])
    queue = []
    words = []
    for y in range(cols):
        for x in range(rows):
            c = grid[y][x]
            node = dict.children[ord(c) - 97]
            if node is not None:
                queue.append((x, y, c, node))
    while queue:
        x, y, s, node = queue[0]
        del queue[0]
        for dx, dy in ((1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                s2 = s + grid[y2][x2]
                node2 = node.children[ord(grid[y2][x2]) - 97]
                if node2 is not None:
                    if node2.isWord:
                        words.append(s2)
                    queue.append((x2, y2, s2, node2))

    return words

Example usage:

d = MakeTrie('/usr/share/dict/words')
print(BoggleWords(['fxie','amlo','ewbx','astu'], d))

Output:

['fa', 'xi', 'ie', 'io', 'el', 'am', 'ax', 'ae', 'aw', 'mi', 'ma', 'me', 'lo', 'li', 'oe', 'ox', 'em', 'ea', 'ea', 'es', 'wa', 'we', 'wa', 'bo', 'bu', 'as', 'aw', 'ae', 'st', 'se', 'sa', 'tu', 'ut', 'fam', 'fae', 'imi', 'eli', 'elm', 'elb', 'ami', 'ama', 'ame', 'aes', 'awl', 'awa', 'awe', 'awa', 'mix', 'mim', 'mil', 'mam', 'max', 'mae', 'maw', 'mew', 'mem', 'mes', 'lob', 'lox', 'lei', 'leo', 'lie', 'lim', 'oil', 'olm', 'ewe', 'eme', 'wax', 'waf', 'wae', 'waw', 'wem', 'wea', 'wea', 'was', 'waw', 'wae', 'bob', 'blo', 'bub', 'but', 'ast', 'ase', 'asa', 'awl', 'awa', 'awe', 'awa', 'aes', 'swa', 'swa', 'sew', 'sea', 'sea', 'saw', 'tux', 'tub', 'tut', 'twa', 'twa', 'tst', 'utu', 'fama', 'fame', 'ixil', 'imam', 'amli', 'amil', 'ambo', 'axil', 'axle', 'mimi', 'mima', 'mime', 'milo', 'mile', 'mewl', 'mese', 'mesa', 'lolo', 'lobo', 'lima', 'lime', 'limb', 'lile', 'oime', 'oleo', 'olio', 'oboe', 'obol', 'emim', 'emil', 'east', 'ease', 'wame', 'wawa', 'wawa', 'weam', 'west', 'wese', 'wast', 'wase', 'wawa', 'wawa', 'boil', 'bolo', 'bole', 'bobo', 'blob', 'bleo', 'bubo', 'asem', 'stub', 'stut', 'swam', 'semi', 'seme', 'seam', 'seax', 'sasa', 'sawt', 'tutu', 'tuts', 'twae', 'twas', 'twae', 'ilima', 'amble', 'axile', 'awest', 'mamie', 'mambo', 'maxim', 'mease', 'mesem', 'limax', 'limes', 'limbo', 'limbu', 'obole', 'emesa', 'embox', 'awest', 'swami', 'famble', 'mimble', 'maxima', 'embolo', 'embole', 'wamble', 'semese', 'semble', 'sawbwa', 'sawbwa']

Notes: This program doesn't output 1-letter words, or filter by word length at all. That's easy to add but not really relevant to the problem. It also outputs some words multiple times if they can be spelled in multiple ways. If a given word can be spelled in many different ways (worst case: every letter in the grid is the same (e.g. 'A') and a word like 'aaaaaaaaaa' is in your dictionary), then the running time will get horribly exponential. Filtering out duplicates and sorting is trivial to due after the algorithm has finished.

Adam Rosenfield 2009-04-14 02:19:18

Nice. I also implemented a Boggle solver using tries.

FogleBird 2009-04-14 02:58:41

Ooo. I'm glad someone stepped up to the plate. Although this works, it doesn't "remember" the letter it has already used, and comes up with words that would require using the same letter twice which is not allowed. As I'm an idiot, how would I go about fixing that?

Paolo Bergantino 2009-04-14 03:02:53

True, it doesn't remember what letters have been visited, but that wasn't specified in your spec =). To fix that, you'd have to add to each queue datum a list of all the visited locations, and then check that list before adding the next character.

Adam Rosenfield 2009-04-14 03:04:54

Where would that be done? Inside MakeTrie?

Paolo Bergantino 2009-04-14 03:13:13

No, inside BoggleWords(). Instead of storing a quadruplet (x, y, s, n), you'd store a quintuplet (x, y, s, n, l), where l is the list of (x, y)'s visited so far. Then you check each (x2, y2) against l and accept it only if it's not in l. Then you add it to the new l.

Adam Rosenfield 2009-04-14 03:15:31

Its *reasonably* simple if you're already doing path tracking, just look at the current path for a future node before attempting to revisit again on the same path. Its a suboptimal, but for what were doing its below IO threshold.

Kent Fredric 2009-04-14 15:36:10

By the way, you should only need to generate all of the permutations of legal sequences for an NxN grid once for each value of N. That list isn't dependent on the letters in the grid.

Robert Rossney 2009-04-14 19:15:38

I did this too when I got sick of playing Scramble. I think the recursive (DFS instead of BFS) solution is more sexy, as you can just keep a set of active cells (so you don't visit the same cell twice). Much neater then keeping a bunch of lists.

Justin Scheiner 2009-06-30 04:59:28

Answer 4

+3 A:

I'd have to give more thought to a complete solution, but as a handy optimisation, I wonder whether it might be worth pre-computing a table of frequencies of digrams and trigrams (2- and 3-letter combinations) based on all the words from your dictionary, and use this to prioritise your search. I'd go with the starting letters of words. So if your dictionary contained the words "India", "Water", "Extreme", and "Extraordinary", then your pre-computed table might be:

'IN': 1
'WA': 1
'EX': 2

Then search for these digrams in the order of commonality (first EX, then WA/IN)

Smashery 2009-04-14 02:20:32

Answer 5

+3 A:

I suggest making a tree of letters based on words. The tree would be composed of a letter structs, like this:

letter: char
isWord: boolean

Then you build up the tree, with each depth adding a new letter. In other words, on the first level there'd be the alphabet; then from each of those trees, there'd be another another 26 entries, and so on, until you've spelled out all the words. Hang onto this parsed tree, and it'll make all possible answers faster to look up.

With this parsed tree, you can very quickly find solutions. Here's the pseudo-code:

BEGIN: 
    For each letter:
        if the struct representing it on the current depth has isWord == true, enter it as an answer.
        Cycle through all its neighbors; if there is a child of the current node corresponding to the letter, recursively call BEGIN on it.

This could be sped up with a bit of dynamic programming. For example, in your sample, the two 'A's are both next to an 'E' and a 'W', which (from the point they hit them on) would be identical. I don't have enough time to really spell out the code for this, but I think you can gather the idea.

Also, I'm sure you'll find other solutions if you Google for "Boggle solver".

Daniel Lew 2009-04-14 02:25:19

You just defined a "Trie" http://en.wikipedia.org/wiki/Trie

Kent Fredric 2009-04-14 13:16:41

Answer 6

+2 A:

Hilarious. I nearly posted the same question a few days ago due to the same damn game! I did not however because just searched google for boggle solver python and got all the answers I could want.

vgm64 2009-04-14 03:53:51

I wasn't aware the popular name of it was "boggle", but I did find some stuff on google, I was just curious to see what people would come up with on SO. :)

Paolo Bergantino 2009-04-14 04:37:11

Answer 7

+9 A:

You could split the problem up into two pieces:

Some kind of search algorithm that will enumerate possible strings in the grid.
A way of testing whether a string is a valid word.

Ideally, (2) should also include a way of testing whether a string is a prefix of a valid word – this will allow you to prune your search and save a whole heap of time.

Adam Rosenfield's Trie is a solution to (2). It's elegant and probably what your algorithms specialist would prefer, but with modern languages and modern computers, we can be a bit lazier. Also, as Kent suggests, we can reduce our dictionary size by discarding words that have letters not present in the grid. Here's some python:

def make_lookups(grid, fn='dict.txt'):
    # Make set of valid characters.
    chars = set()
    for word in grid:
        chars.update(word)

    words = set(x.strip() for x in open(fn) if set(x.strip()) <= chars)
    prefixes = set()
    for w in words:
        for i in range(len(w)+1):
            prefixes.add(w[:i])

    return words, prefixes

Wow; constant-time prefix testing. It takes a couple of seconds to load the dictionary you linked, but only a couple :-) (notice that words <= prefixes)

Now, for part (1), I'm inclined to think in terms of graphs. So I'll build a dictionary that looks something like this:

graph = { (x, y):set([(x0,y0), (x1,y1), (x2,y2)]), }

i.e. graph[(x, y)] is the set of coordinates that you can reach from position (x, y). I'll also add a dummy node None which will connect to everything.

Building it's a bit clumsy, because there's 8 possible positions and you have to do bounds checking. Here's some correspondingly-clumsy python code:

def make_graph(grid):
    root = None
    graph = { root:set() }
    chardict = { root:'' }

    for i, row in enumerate(grid):
        for j, char in enumerate(row):
            chardict[(i, j)] = char
            node = (i, j)
            children = set()
            graph[node] = children
            graph[root].add(node)
            add_children(node, children, grid)

    return graph, chardict

def add_children(node, children, grid):
    x0, y0 = node
    for i in [-1,0,1]:
        x = x0 + i
        if not (0 <= x < len(grid)):
            continue
        for j in [-1,0,1]:
            y = y0 + j
            if not (0 <= y < len(grid[0])) or (i == j == 0):
                continue

            children.add((x,y))

This code also builds up a dictionary mapping (x,y) to the corresponding character. This lets me turn a list of positions into a word:

def to_word(chardict, pos_list):
    return ''.join(chardict[x] for x in pos_list)

Finally, we do a depth-first search. The basic procedure is:

The search arrives at a particular node.
Check if the path so far could be part of a word. If not, don't explore this branch any further.
Check if the path so far is a word. If so, add to the list of results.
Explore all children not part of the path so far.

Python:

def find_words(graph, chardict, position, prefix, results, words, prefixes):
    """ Arguments:
      graph :: mapping (x,y) to set of reachable positions
      chardict :: mapping (x,y) to character
      position :: current position (x,y) -- equals prefix[-1]
      prefix :: list of positions in current string
      results :: set of words found
      words :: set of valid words in the dictionary
      prefixes :: set of valid words or prefixes thereof
    """
    word = to_word(chardict, prefix)

    if word not in prefixes:
        return

    if word in words:
        results.add(word)

    for child in graph[position]:
        if child not in prefix:
            find_words(graph, chardict, child, prefix+[child], results, words, prefixes)

Run the code as:

grid = ['fxie', 'amlo', 'ewbx', 'astu']
g, c = make_graph(grid)
w, p = make_lookups(grid)
res = set()
find_words(g, c, None, [], res, w, p)

and inspect res to see the answers. Here's a list of words found for your example, sorted by size:

 ['a', 'b', 'e', 'f', 'i', 'l', 'm', 'o', 's', 't',
 'u', 'w', 'x', 'ae', 'am', 'as', 'aw', 'ax', 'bo',
 'bu', 'ea', 'el', 'em', 'es', 'fa', 'ie', 'io', 'li',
 'lo', 'ma', 'me', 'mi', 'oe', 'ox', 'sa', 'se', 'st',
 'tu', 'ut', 'wa', 'we', 'xi', 'aes', 'ame', 'ami',
 'ase', 'ast', 'awa', 'awe', 'awl', 'blo', 'but', 'elb',
 'elm', 'fae', 'fam', 'lei', 'lie', 'lim', 'lob', 'lox',
 'mae', 'maw', 'mew', 'mil', 'mix', 'oil', 'olm', 'saw',
 'sea', 'sew', 'swa', 'tub', 'tux', 'twa', 'wae', 'was',
 'wax', 'wem', 'ambo', 'amil', 'amli', 'asem', 'axil',
 'axle', 'bleo', 'boil', 'bole', 'east', 'fame', 'limb',
 'lime', 'mesa', 'mewl', 'mile', 'milo', 'oime', 'sawt',
 'seam', 'seax', 'semi', 'stub', 'swam', 'twae', 'twas',
 'wame', 'wase', 'wast', 'weam', 'west', 'amble', 'awest',
 'axile', 'embox', 'limbo', 'limes', 'swami', 'embole',
 'famble', 'semble', 'wamble']

The code takes (literally) a couple of seconds to load the dictionary, but the rest is instant on my machine.

John Fouhy 2009-04-14 04:27:46

Very nice! Very fast, too. I'm going to wait around to see if anyone else steps up to the plate, but your answer is looking good so far.

Paolo Bergantino 2009-04-14 04:39:58

I'm confused why "embole" is your only 6-letter word, I got 10 different words for that. It appears you prohibit visiting the same node twice, and as the OP stated, that's fair game.

Kent Fredric 2009-04-14 13:34:37

No, it actually isn't, I need to update my question.

Paolo Bergantino 2009-04-14 14:13:43

ok, hes still possibly got a bug as hes discarding "FAMBLE" "WAMBLE" and "SEMBLE", which don't share characters.

Kent Fredric 2009-04-14 15:31:16

Well spotted! The bug was in the creation of the prefixes set: I needed to use `range(len(w)+1)` instead of `range(len(w))`. I claimed that `words <= prefixes` but apparently I didn't test that :-/

John Fouhy 2009-04-14 22:00:30

Kent's optimization results in a speedup from (on my system) 2.5s to 0.8s. I can post the modification if you like.

John Fouhy 2009-04-14 22:18:09

Please post the modified code, John.

Paolo Bergantino 2009-07-02 19:46:07

Answer 8

+18 A:

For a dictionary speedup, there is one general transformation/process you can do to greatly reduce the dictionary comparisons ahead of time.

Given that the above grid contains only 16 characters, some of them duplicate, you can greatly reduce the number of total keys in your dictionary by simply filtering out entries that have unattainable characters.

I thought this was the obvious optimization but seeing nobody did it I'm mentioning it.

It reduced me from a dictionary of 200,000 keys to only 2,000 keys simply during the input pass. This at the very least reduces memory overhead, and that's sure to map to a speed increase somewhere as memory isn't infinitely fast.

Perl Implementation

My implementation is a bit top-heavy because I placed importance on being able to know the exact path of every extracted string, not just the validity therein.

I also have a few adaptions in there that would theoretically permit a grid with holes in it to function, and grids with different sized lines ( assuming you get the input right and it lines up somehow ).

The early-filter is by far the most significant bottleneck in my application, as suspected earlier, commenting out that line bloats it from 1.5s to 7.5s.

Upon execution it appears to think all the single digits are on their own valid words, but I'm pretty sure thats due to how the dictionary file works.

Its a bit bloated, but at least I reuse Tree::Trie from cpan

Some of it was inspired partially by the existing implementations, some of it I had in mind already.

Constructive Criticism and ways it could be improved welcome ( /me notes he never searched CPAN for a boggle solver, but this was more fun to work out )

updated for new criteria

#!/usr/bin/perl 

use strict;
use warnings;

{

  # this package manages a given path through the grid.
  # Its an array of matrix-nodes in-order with
  # Convenience functions for pretty-printing the paths
  # and for extending paths as new paths.

  # Usage:
  # my $p = Prefix->new(path=>[ $startnode ]);
  # my $c = $p->child( $extensionNode );
  # print $c->current_word ;

  package Prefix;
  use Moose;

  has path => (
      isa     => 'ArrayRef[MatrixNode]',
      is      => 'rw',
      default => sub { [] },
  );
  has current_word => (
      isa        => 'Str',
      is         => 'rw',
      lazy_build => 1,
  );

  # Create a clone of this object
  # with a longer path

  # $o->child( $successive-node-on-graph );

  sub child {
      my $self    = shift;
      my $newNode = shift;
      my $f       = Prefix->new();

      # Have to do this manually or other recorded paths get modified
      push @{ $f->{path} }, @{ $self->{path} }, $newNode;
      return $f;
  }

  # Traverses $o->path left-to-right to get the string it represents.

  sub _build_current_word {
      my $self = shift;
      return join q{}, map { $_->{value} } @{ $self->{path} };
  }

  # Returns  the rightmost node on this path

  sub tail {
      my $self = shift;
      return $self->{path}->[-1];
  }

  # pretty-format $o->path

  sub pp_path {
      my $self = shift;
      my @path =
        map { '[' . $_->{x_position} . ',' . $_->{y_position} . ']' }
        @{ $self->{path} };
      return "[" . join( ",", @path ) . "]";
  }

  # pretty-format $o
  sub pp {
      my $self = shift;
      return $self->current_word . ' => ' . $self->pp_path;
  }

  __PACKAGE__->meta->make_immutable;
}

{

  # Basic package for tracking node data
  # without having to look on the grid.
  # I could have just used an array or a hash, but that got ugly.

# Once the matrix is up and running it doesn't really care so much about rows/columns,
# Its just a sea of points and each point has adjacent points.
# Relative positioning is only really useful to map it back to userspace

  package MatrixNode;
  use Moose;

  has x_position => ( isa => 'Int', is => 'rw', required => 1 );
  has y_position => ( isa => 'Int', is => 'rw', required => 1 );
  has value      => ( isa => 'Str', is => 'rw', required => 1 );
  has siblings   => (
      isa     => 'ArrayRef[MatrixNode]',
      is      => 'rw',
      default => sub { [] }
  );

# Its not implicitly uni-directional joins. It would be more effient in therory
# to make the link go both ways at the same time, but thats too hard to program around.
# and besides, this isn't slow enough to bother caring about.

  sub add_sibling {
      my $self    = shift;
      my $sibling = shift;
      push @{ $self->siblings }, $sibling;
  }

  # Convenience method to derive a path starting at this node

  sub to_path {
      my $self = shift;
      return Prefix->new( path => [$self] );
  }
  __PACKAGE__->meta->make_immutable;

}

{

  package Matrix;
  use Moose;

  has rows => (
      isa     => 'ArrayRef',
      is      => 'rw',
      default => sub { [] },
  );

  has regex => (
      isa        => 'Regexp',
      is         => 'rw',
      lazy_build => 1,
  );

  has cells => (
      isa        => 'ArrayRef',
      is         => 'rw',
      lazy_build => 1,
  );

  sub add_row {
      my $self = shift;
      push @{ $self->rows }, [@_];
  }

  # Most of these functions from here down are just builder functions,
  # or utilities to help build things.
  # Some just broken out to make it easier for me to process.
  # All thats really useful is add_row
  # The rest will generally be computed, stored, and ready to go
  # from ->cells by the time either ->cells or ->regex are called.

  # traverse all cells and make a regex that covers them.
  sub _build_regex {
      my $self  = shift;
      my $chars = q{};
      for my $cell ( @{ $self->cells } ) {
          $chars .= $cell->value();
      }
      $chars = "[^$chars]";
      return qr/$chars/i;
  }

  # convert a plain cell ( ie: [x][y] = 0 )
  # to an intelligent cell ie: [x][y] = object( x, y )
  # we only really keep them in this format temporarily
  # so we can go through and tie in neighbouring information.
  # after the neigbouring is done, the grid should be considered inoperative.

  sub _convert {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      my $v    = $self->_read( $x, $y );
      my $n    = MatrixNode->new(
          x_position => $x,
          y_position => $y,
          value      => $v,
      );
      $self->_write( $x, $y, $n );
      return $n;
  }

# go through the rows/collums presently available and freeze them into objects.

  sub _build_cells {
      my $self = shift;
      my @out  = ();
      my @rows = @{ $self->{rows} };
      for my $x ( 0 .. $#rows ) {
          next unless defined $self->{rows}->[$x];
          my @col = @{ $self->{rows}->[$x] };
          for my $y ( 0 .. $#col ) {
              next unless defined $self->{rows}->[$x]->[$y];
              push @out, $self->_convert( $x, $y );
          }
      }
      for my $c (@out) {
          for my $n ( $self->_neighbours( $c->x_position, $c->y_position ) ) {
              $c->add_sibling( $self->{rows}->[ $n->[0] ]->[ $n->[1] ] );
          }
      }
      return \@out;
  }

  # given x,y , return array of points that refer to valid neighbours.
  sub _neighbours {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      my @out  = ();
      for my $sx ( -1, 0, 1 ) {
          next if $sx + $x < 0;
          next if not defined $self->{rows}->[ $sx + $x ];
          for my $sy ( -1, 0, 1 ) {
              next if $sx == 0 && $sy == 0;
              next if $sy + $y < 0;
              next if not defined $self->{rows}->[ $sx + $x ]->[ $sy + $y ];
              push @out, [ $sx + $x, $sy + $y ];
          }
      }
      return @out;
  }

  sub _has_row {
      my $self = shift;
      my $x    = shift;
      return defined $self->{rows}->[$x];
  }

  sub _has_cell {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      return defined $self->{rows}->[$x]->[$y];
  }

  sub _read {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      return $self->{rows}->[$x]->[$y];
  }

  sub _write {
      my $self = shift;
      my $x    = shift;
      my $y    = shift;
      my $v    = shift;
      $self->{rows}->[$x]->[$y] = $v;
      return $v;
  }

  __PACKAGE__->meta->make_immutable;
}

use Tree::Trie;

sub readDict {
  my $fn = shift;
  my $re = shift;
  my $d  = Tree::Trie->new();

  # Dictionary Loading
  open my $fh, '<', $fn;
  while ( my $line = <$fh> ) {
      chomp($line);

 # Commenting the next line makes it go from 1.5 seconds to 7.5 seconds. EPIC.
      next if $line =~ $re;    # Early Filter
      $d->add( uc($line) );
  }
  return $d;
}

sub traverseGraph {
  my $d     = shift;
  my $m     = shift;
  my $min   = shift;
  my $max   = shift;
  my @words = ();

  # Inject all grid nodes into the processing queue.

  my @queue =
    grep { $d->lookup( $_->current_word ) }
    map  { $_->to_path } @{ $m->cells };

  while (@queue) {
      my $item = shift @queue;

      # put the dictionary into "exact match" mode.

      $d->deepsearch('exact');

      my $cword = $item->current_word;
      my $l     = length($cword);

      if ( $l >= $min && $d->lookup($cword) ) {
          push @words,
            $item;    # push current path into "words" if it exactly matches.
      }
      next if $l > $max;

      # put the dictionary into "is-a-prefix" mode.
      $d->deepsearch('boolean');

    siblingloop: foreach my $sibling ( @{ $item->tail->siblings } ) {
          foreach my $visited ( @{ $item->{path} } ) {
              next siblingloop if $sibling == $visited;
          }

          # given path y , iterate for all its end points
          my $subpath = $item->child($sibling);

          # create a new path for each end-point
          if ( $d->lookup( $subpath->current_word ) ) {

             # if the new path is a prefix, add it to the bottom of the queue.
              push @queue, $subpath;
          }
      }
  }
  return \@words;
}

sub setup_predetermined { 
  my $m = shift; 
  my $gameNo = shift;
  if( $gameNo == 0 ){
      $m->add_row(qw( F X I E ));
      $m->add_row(qw( A M L O ));
      $m->add_row(qw( E W B X ));
      $m->add_row(qw( A S T U ));
      return $m;
  }
  if( $gameNo == 1 ){
      $m->add_row(qw( D G H I ));
      $m->add_row(qw( K L P S ));
      $m->add_row(qw( Y E U T ));
      $m->add_row(qw( E O R N ));
      return $m;
  }
}
sub setup_random { 
  my $m = shift; 
  my $seed = shift;
  srand $seed;
  my @letters = 'A' .. 'Z' ; 
  for( 1 .. 4 ){ 
      my @r = ();
      for( 1 .. 4 ){
          push @r , $letters[int(rand(25))];
      }
      $m->add_row( @r );
  }
}

# Here is where the real work starts.

my $m = Matrix->new();
setup_predetermined( $m, 0 );
#setup_random( $m, 5 );

my $d = readDict( 'dict.txt', $m->regex );
my $c = scalar @{ $m->cells };    # get the max, as per spec

print join ",\n", map { $_->pp } @{
  traverseGraph( $d, $m, 3, $c ) ;
};

Arch/execution info for comparison:

model name      : Intel(R) Core(TM)2 Duo CPU     T9300  @ 2.50GHz
cache size      : 6144 KB
Memory usage summary: heap total: 77057577, heap peak: 11446200, stack peak: 26448
       total calls   total memory   failed calls
 malloc|     947212       68763684              0
realloc|      11191        1045641              0  (nomove:9063, dec:4731, free:0)
 calloc|     121001        7248252              0
   free|     973159       65854762

Histogram for block sizes:
  0-15         392633  36% ==================================================
 16-31          43530   4% =====
 32-47          50048   4% ======
 48-63          70701   6% =========
 64-79          18831   1% ==
 80-95          19271   1% ==
 96-111        238398  22% ==============================
112-127          3007  <1% 
128-143        236727  21% ==============================

More Mumblings on that Regex Optimization

The regex optimization I use is useless for multi-solve dictionaries, and for multi-solve you'll want a full dictionary, not a pre-trimmed one.

However, that said, for one-off solves, its really fast. ( Perl regex are in C! :) )

Here is some varying code additions:

sub readDict_nofilter {
  my $fn = shift;
  my $re = shift;
  my $d  = Tree::Trie->new();

  # Dictionary Loading
  open my $fh, '<', $fn;
  while ( my $line = <$fh> ) {
      chomp($line);
      $d->add( uc($line) );
  }
  return $d;
}

sub benchmark_io { 
  use Benchmark qw( cmpthese :hireswallclock );
   # generate a random 16 character string 
   # to simulate there being an input grid. 
  my $regexen = sub { 
      my @letters = 'A' .. 'Z' ; 
      my @lo = ();
      for( 1..16 ){ 
          push @lo , $_ ; 
      }
      my $c  = join '', @lo;
      $c = "[^$c]";
      return qr/$c/i;
  };
  cmpthese( 200 , { 
      filtered => sub { 
          readDict('dict.txt', $regexen->() );
      }, 
      unfiltered => sub {
          readDict_nofilter('dict.txt');
      }
  });
}

           s/iter unfiltered   filtered
unfiltered   8.16         --       -94%
filtered    0.464      1658%         --

_{ps: 8.16 * 200 = 27 minutes.}

Kent Fredric 2009-04-14 10:02:13

I know I'm failing the optimization club, but I had speed issues before I got to the real work of the code, and reducing input time from 2s to 1.2s means a lot to me.

Kent Fredric 2009-04-14 10:04:48

/me noted it odd now it took *less* time to regex and skip entries than it took to add keys to a hash.

Kent Fredric 2009-04-14 10:05:37

Nice, a Perl implementation! I'll go run it now.

Paolo Bergantino 2009-04-14 13:20:07

Blerg, having a hard time installing Tree::Trie on my webserver. :(

Paolo Bergantino 2009-04-14 13:46:53

Paolo: Simply putting Trie.pm from the CPAN distribution into a "lib/Tree" directory and then adding "use lib 'path/to/that/lib';" to the top of the program should do the trick in this case.

tsee 2009-04-14 14:03:40

Yeah, Tree::Trie fails the POD tests for some reason, its pretty bollocks, so if you're installing w/ cpan you'll want to force it somehow to continue on test failure.

Kent Fredric 2009-04-14 14:08:20

Thanks; I'll try that. By the way, John Fouhy's is correct in that once you use one node in a word, you can't use it again. That was my bad on the OP for not making that clear, I've updated the question.

Paolo Bergantino 2009-04-14 14:14:50

ugh. pesky. just as well I record the path then. I can just eliminate it after-the-fact :p

Kent Fredric 2009-04-14 14:30:01

also, I'm progressively accelerating my code, I just halved its runtime, my biggest bottleneck is IO at present.

Kent Fredric 2009-04-14 14:30:36

Well my dumb webhost is being impossible, I got Trie working but now I'm getting errors with Moose, so just let me know your runtimes and I'll update the post. I'm working on my own PHP implementation right now, and I'll eventually put a bounty on this, just for fun. :)

Paolo Bergantino 2009-04-14 14:34:32

My current bests are around 1.116s, but of that 400ms is reported as just *reading* the dictionary. /tmp/dict.txt is on a tmpfs, so...

Kent Fredric 2009-04-14 14:53:53

Are you using the same dict as the question or a different one?

Paolo Bergantino 2009-04-14 14:59:10

same dictionary as posted in question @ 2.2mb

Kent Fredric 2009-04-14 15:23:23

I'm not sure why you have listed me @ 1millisecond (ms), 1.116 is 1 thousand milliseconds and then some.

Kent Fredric 2009-04-14 15:24:57

I'm not even sure if python can *load* in >100ms

Kent Fredric 2009-04-14 15:34:08

Whooops, I meant seconds, not ms, I fixed that sorry. :)

Paolo Bergantino 2009-04-14 15:48:57

Average over 50 runs with printing: 1.019s Real, 0.970s user, 0.018 sys

Kent Fredric 2009-04-14 16:04:07

without print: 1.001 real, 0.962 user 0.019 sys

Kent Fredric 2009-04-14 16:07:08

How did you generate that last report (arch/execution info)? Looks useful.

jmanning2k 2009-04-14 16:13:19

arch I sucked out of /proc/cpuinfo , the rest is "memusage", which is a utility bundled with glibc.

Kent Fredric 2009-04-14 16:25:20

Answer 9

+23 A:

My answer works like the others here, but I'll post it because it looks a bit faster than the other Python solutions, from setting up the dictionary faster. (I checked this against John Fouhy's solution.) After setup, the time to solve is down in the noise.

import re

grid = "fxie amlo ewbx astu".split()
nrows, ncols = len(grid), len(grid[0])

# A dictionary word that could be a solution must use only
# the grid's letters and have length >= 3. (With a case-
# insensitive match.)
alphabet = ''.join(set(''.join(grid)))
possible_solution = re.compile('[' + alphabet + ']{3,}$', re.I)

words = set(word.lower()
            for word in open('words').read().splitlines()
            if possible_solution.match(word))
prefixes = set(word[:i]
               for word in words
               for i in range(2, len(word)+1))

def neighbors((x, y)):
    for nx in range(max(0, x-1), min(ncols, x+2)):
        for ny in range(max(0, y-1), min(nrows, y+2)):
            yield (nx, ny)

def extending(prefix, path):
    if prefix in words:
        yield (prefix, path)
    for (nx, ny) in neighbors(path[-1]):
        if (nx, ny) not in path:
            prefix1 = prefix + grid[ny][nx]
            if prefix1 in prefixes:
                for result in extending(prefix1, path + ((nx, ny),)):
                    yield result

def solve():
    for y, row in enumerate(grid):
        for x, prefix in enumerate(row):
            for result in extending(prefix, ((x, y),)):
                yield result

# Print a maximal-length word and its path:
print max(solve(), key=lambda (word, path): len(word))

Edit: Filter out words less than 3 letters long.

Edit 2: I was curious why Kent Fredric's Perl solution was faster; it turns out to use regular-expression matching instead of a set of characters. Doing the same in Python about doubles the speed.

Darius Bacon 2009-04-15 02:00:37

The program is only giving me 1 word. How come?

Paolo Bergantino 2009-04-15 02:18:57

I didn't want to drown in output. See the comment at the bottom.

Darius Bacon 2009-04-15 02:38:56

Or get all the words without the paths: print ' '.join(sorted(set(word for (word, path) in solve())))

Darius Bacon 2009-04-15 02:40:10

Answer 10

+6 A:

Not interested in VB? :) I couldn't resist. I've solved this differently than many of the solutions presented here.

My times are:

Loading the dictionary and word prefixes into a hashtable: .5 to 1 seconds.
Finding the words: averaging under 10 milliseconds.

EDIT: Dictionary load times on the web host server are running about 1 to 1.5 seconds longer than my home computer.

I don't know how badly the times will deteriorate with a load on the server.

I wrote my solution as a web page in .Net. myvrad.com/boggle

I'm using the dictionary referenced in the original question.

Letters are not reused in a word. Only words 3 characters or longer are found.

I'm using a hashtable of all unique word prefixes and words instead of a trie. I didn't know about trie's so I learned something there. The idea of creating a list of prefixes of words in addition to the complete words is what finally got my times down to a respectable number.

Read the code comments for additional details.

Here's the code:

Imports System.Collections.Generic
Imports System.IO

Partial Class boggle_Default

    'Bob Archer, 4/15/2009

    'To avoid using a 2 dimensional array in VB I'm not using typical X,Y
    'coordinate iteration to find paths.
    '
    'I have locked the code into a 4 by 4 grid laid out like so:
    ' abcd
    ' efgh
    ' ijkl
    ' mnop
    ' 
    'To find paths the code starts with a letter from a to p then
    'explores the paths available around it. If a neighboring letter
    'already exists in the path then we don't go there.
    '
    'Neighboring letters (grid points) are hard coded into
    'a Generic.Dictionary below.



    'Paths is a list of only valid Paths found. 
    'If a word prefix or word is not found the path is not
    'added and extending that path is terminated.
    Dim Paths As New Generic.List(Of String)

    'NeighborsOf. The keys are the letters a to p.
    'The value is a string of letters representing neighboring letters.
    'The string of neighboring letters is split and iterated later.
    Dim NeigborsOf As New Generic.Dictionary(Of String, String)

    'BoggleLetters. The keys are mapped to the lettered grid of a to p.
    'The values are what the user inputs on the page.
    Dim BoggleLetters As New Generic.Dictionary(Of String, String)

    'Used to store last postition of path. This will be a letter
    'from a to p.
    Dim LastPositionOfPath As String = ""

    'I found a HashTable was by far faster than a Generic.Dictionary 
    ' - about 10 times faster. This stores prefixes of words and words.
    'I determined 792773 was the number of words and unique prefixes that
    'will be generated from the dictionary file. This is a max number and
    'the final hashtable will not have that many.
    Dim HashTableOfPrefixesAndWords As New Hashtable(792773)

    'Stores words that are found.
    Dim FoundWords As New Generic.List(Of String)

    'Just to validate what the user enters in the grid.
    Dim ErrorFoundWithSubmittedLetters As Boolean = False

    Public Sub BuildAndTestPathsAndFindWords(ByVal ThisPath As String)
        'Word is the word correlating to the ThisPath parameter.
        'This path would be a series of letters from a to p.
        Dim Word As String = ""

        'The path is iterated through and a word based on the actual
        'letters in the Boggle grid is assembled.
        For i As Integer = 0 To ThisPath.Length - 1
            Word += Me.BoggleLetters(ThisPath.Substring(i, 1))
        Next

        'If my hashtable of word prefixes and words doesn't contain this Word
        'Then this isn't a word and any further extension of ThisPath will not
        'yield any words either. So exit sub to terminate exploring this path.
        If Not HashTableOfPrefixesAndWords.ContainsKey(Word) Then Exit Sub

        'The value of my hashtable is a boolean representing if the key if a word (true) or
        'just a prefix (false). If true and at least 3 letters long then yay! word found.
        If HashTableOfPrefixesAndWords(Word) AndAlso Word.Length > 2 Then Me.FoundWords.Add(Word)

        'If my List of Paths doesn't contain ThisPath then add it.
        'Remember only valid paths will make it this far. Paths not found
        'in the HashTableOfPrefixesAndWords cause this sub to exit above.
        If Not Paths.Contains(ThisPath) Then Paths.Add(ThisPath)

        'Examine the last letter of ThisPath. We are looking to extend the path
        'to our neighboring letters if any are still available.
        LastPositionOfPath = ThisPath.Substring(ThisPath.Length - 1, 1)

        'Loop through my list of neighboring letters (representing grid points).
        For Each Neighbor As String In Me.NeigborsOf(LastPositionOfPath).ToCharArray()
            'If I find a neighboring grid point that I haven't already used
            'in ThisPath then extend ThisPath and feed the new path into
            'this recursive function. (see recursive.)
            If Not ThisPath.Contains(Neighbor) Then Me.BuildAndTestPathsAndFindWords(ThisPath & Neighbor)
        Next
    End Sub

    Protected Sub ButtonBoggle_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles ButtonBoggle.Click

        'User has entered the 16 letters and clicked the go button.

        'Set up my Generic.Dictionary of grid points, I'm using letters a to p -
        'not an x,y grid system.  The values are neighboring points.
        NeigborsOf.Add("a", "bfe")
        NeigborsOf.Add("b", "cgfea")
        NeigborsOf.Add("c", "dhgfb")
        NeigborsOf.Add("d", "hgc")
        NeigborsOf.Add("e", "abfji")
        NeigborsOf.Add("f", "abcgkjie")
        NeigborsOf.Add("g", "bcdhlkjf")
        NeigborsOf.Add("h", "cdlkg")
        NeigborsOf.Add("i", "efjnm")
        NeigborsOf.Add("j", "efgkonmi")
        NeigborsOf.Add("k", "fghlponj")
        NeigborsOf.Add("l", "ghpok")
        NeigborsOf.Add("m", "ijn")
        NeigborsOf.Add("n", "ijkom")
        NeigborsOf.Add("o", "jklpn")
        NeigborsOf.Add("p", "klo")

        'Retrieve letters the user entered.
        BoggleLetters.Add("a", Me.TextBox1.Text.ToLower.Trim())
        BoggleLetters.Add("b", Me.TextBox2.Text.ToLower.Trim())
        BoggleLetters.Add("c", Me.TextBox3.Text.ToLower.Trim())
        BoggleLetters.Add("d", Me.TextBox4.Text.ToLower.Trim())
        BoggleLetters.Add("e", Me.TextBox5.Text.ToLower.Trim())
        BoggleLetters.Add("f", Me.TextBox6.Text.ToLower.Trim())
        BoggleLetters.Add("g", Me.TextBox7.Text.ToLower.Trim())
        BoggleLetters.Add("h", Me.TextBox8.Text.ToLower.Trim())
        BoggleLetters.Add("i", Me.TextBox9.Text.ToLower.Trim())
        BoggleLetters.Add("j", Me.TextBox10.Text.ToLower.Trim())
        BoggleLetters.Add("k", Me.TextBox11.Text.ToLower.Trim())
        BoggleLetters.Add("l", Me.TextBox12.Text.ToLower.Trim())
        BoggleLetters.Add("m", Me.TextBox13.Text.ToLower.Trim())
        BoggleLetters.Add("n", Me.TextBox14.Text.ToLower.Trim())
        BoggleLetters.Add("o", Me.TextBox15.Text.ToLower.Trim())
        BoggleLetters.Add("p", Me.TextBox16.Text.ToLower.Trim())

        'Validate user entered something with a length of 1 for all 16 textboxes.
        For Each S As String In BoggleLetters.Keys
            If BoggleLetters(S).Length <> 1 Then
                ErrorFoundWithSubmittedLetters = True
                Exit For
            End If
        Next

        'If input is not valid then...
        If ErrorFoundWithSubmittedLetters Then
            'Present error message.
        Else
            'Else assume we have 16 letters to work with and start finding words.
            Dim SB As New StringBuilder

            Dim Time As String = String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString())

            Dim NumOfLetters As Integer = 0
            Dim Word As String = ""
            Dim TempWord As String = ""
            Dim Letter As String = ""
            Dim fr As StreamReader = Nothing
            fr = New System.IO.StreamReader(HttpContext.Current.Request.MapPath("~/boggle/dic.txt"))

            'First fill my hashtable with word prefixes and words.
            'HashTable(PrefixOrWordString, BooleanTrueIfWordFalseIfPrefix)
            While fr.Peek <> -1
                Word = fr.ReadLine.Trim()
                TempWord = ""
                For i As Integer = 0 To Word.Length - 1
                    Letter = Word.Substring(i, 1)
                    'This optimization helped quite a bit. Words in the dictionary that begin
                    'with letters that the user did not enter in the grid shouldn't go in my hashtable.
                    '
                    'I realize most of the solutions went with a Trie. I'd never heard of that before,
                    'which is one of the neat things about SO, seeing how others approach challenges
                    'and learning some best practices.
                    '
                    'However, I didn't code a Trie in my solution. I just have a hashtable with 
                    'all words in the dicitonary file and all possible prefixes for those words.
                    'A Trie might be faster but I'm not coding it now. I'm getting good times with this.
                    If i = 0 AndAlso Not BoggleLetters.ContainsValue(Letter) Then Continue While
                    TempWord += Letter
                    If Not HashTableOfPrefixesAndWords.ContainsKey(TempWord) Then
                        HashTableOfPrefixesAndWords.Add(TempWord, TempWord = Word)
                    End If
                Next
            End While

            SB.Append("Number of Word Prefixes and Words in Hashtable: " & HashTableOfPrefixesAndWords.Count.ToString())
            SB.Append("<br />")

            SB.Append("Loading Dictionary: " & Time & " - " & String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString()))
            SB.Append("<br />")

            Time = String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString())

            'This starts a path at each point on the grid an builds a path until 
            'the string of letters correlating to the path is not found in the hashtable
            'of word prefixes and words.
            Me.BuildAndTestPathsAndFindWords("a")
            Me.BuildAndTestPathsAndFindWords("b")
            Me.BuildAndTestPathsAndFindWords("c")
            Me.BuildAndTestPathsAndFindWords("d")
            Me.BuildAndTestPathsAndFindWords("e")
            Me.BuildAndTestPathsAndFindWords("f")
            Me.BuildAndTestPathsAndFindWords("g")
            Me.BuildAndTestPathsAndFindWords("h")
            Me.BuildAndTestPathsAndFindWords("i")
            Me.BuildAndTestPathsAndFindWords("j")
            Me.BuildAndTestPathsAndFindWords("k")
            Me.BuildAndTestPathsAndFindWords("l")
            Me.BuildAndTestPathsAndFindWords("m")
            Me.BuildAndTestPathsAndFindWords("n")
            Me.BuildAndTestPathsAndFindWords("o")
            Me.BuildAndTestPathsAndFindWords("p")

            SB.Append("Finding Words: " & Time & " - " & String.Format("{0}:{1}:{2}:{3}", Date.Now.Hour.ToString(), Date.Now.Minute.ToString(), Date.Now.Second.ToString(), Date.Now.Millisecond.ToString()))
            SB.Append("<br />")

            SB.Append("Num of words found: " & FoundWords.Count.ToString())
            SB.Append("<br />")
            SB.Append("<br />")

            FoundWords.Sort()
            SB.Append(String.Join("<br />", FoundWords.ToArray()))

            'Output results.
            Me.LiteralBoggleResults.Text = SB.ToString()
            Me.PanelBoggleResults.Visible = True

        End If

    End Sub

End Class

rvarcher 2009-04-15 08:02:46

Oooh, VB. Very nice.

Paolo Bergantino 2009-04-15 12:36:52

I'm going to assume here you used the a-p system instead of [x][y] because the latter is rather complex in VB? I spent a day trying to get a 2-way-dynamic array in that once, ie: array( array( 1, "hello"), 1, "hello" , array() ) , still don't know how to do that :P

Kent Fredric 2009-04-16 12:39:56

In PHP and Perl 2 dim arrays are fun. It can be done in VB but I wouldn't call it a fun process. Dim Arr(, ) As Integer = {{1,1},{0,0}}. The A-P process grew out of me putting myself on the grid and asking, 'where can I go from here?' I know it's a rigid solution but it works here.

rvarcher 2009-04-16 15:00:21

Answer 11

+7 A:

Surprisingly, no one attempted a PHP version of this.

This is a working PHP version of John Fouhy's Python solution.

Although I took some pointers from everyone else's answers, this is mostly copied from John.

$boggle = "fxie
           amlo
           ewbx
           astu";

$alphabet = str_split(str_replace(array("\n", " ", "\r"), "", strtolower($boggle)));
$rows = array_map('trim', explode("\n", $boggle));
$dictionary = file("C:/dict.txt");
$prefixes = array(''=>'');
$words = array();
$regex = '/[' . implode('', $alphabet) . ']{3,}$/S';
foreach($dictionary as $k=>$value) {
    $value = trim(strtolower($value));
    $length = strlen($value);
    if(preg_match($regex, $value)) {
        for($x = 0; $x < $length; $x++) {
            $letter = substr($value, 0, $x+1);
            if($letter == $value) {
                $words[$value] = 1;
            } else {
                $prefixes[$letter] = 1;
            }
        }
    }
}

$graph = array();
$chardict = array();
$positions = array();
$c = count($rows);
for($i = 0; $i < $c; $i++) {
    $l = strlen($rows[$i]);
    for($j = 0; $j < $l; $j++) {
        $chardict[$i.','.$j] = $rows[$i][$j];
        $children = array();
        $pos = array(-1,0,1);
        foreach($pos as $z) {
            $xCoord = $z + $i;
            if($xCoord < 0 || $xCoord >= count($rows)) {
                continue;
            }
            $len = strlen($rows[0]);
            foreach($pos as $w) {
                $yCoord = $j + $w;
                if(($yCoord < 0 || $yCoord >= $len) || ($z == 0 && $w == 0)) {
                    continue;
                }
                $children[] = array($xCoord, $yCoord);
            }
        }
        $graph['None'][] = array($i, $j);
        $graph[$i.','.$j] = $children;
    }
}

function to_word($chardict, $prefix) {
    $word = array();
    foreach($prefix as $v) {
        $word[] = $chardict[$v[0].','.$v[1]];
    }
    return implode("", $word);
}

function find_words($graph, $chardict, $position, $prefix, $prefixes, &$results, $words) {
    $word = to_word($chardict, $prefix);
    if(!isset($prefixes[$word])) return false;

    if(isset($words[$word])) {
        $results[] = $word;
    }

    foreach($graph[$position] as $child) {
        if(!in_array($child, $prefix)) {
            $newprefix = $prefix;
            $newprefix[] = $child;
            find_words($graph, $chardict, $child[0].','.$child[1], $newprefix, $prefixes, $results, $words);
        }
    }
}

$solution = array();
find_words($graph, $chardict, 'None', array(), $prefixes, $solution);
print_r($solution);

Here is a live link if you want to try it out. Although it takes ~2s in my local machine, it takes ~5s on my webserver. In either case, it is not very fast. Still, though, it is quite hideous so I can imagine the time can be reduced significantly. Any pointers on how to accomplish that would be appreciated. PHP's lack of tuples made the coordinates weird to work with and my inability to comprehend just what the hell is going on didn't help at all.

EDIT: A few fixes make it take less than 1s locally.

Paolo Bergantino 2009-04-16 17:20:12

+1 @ "and my inability to comprehend just what the hell is going on didn't help at all." lol. I love honesty!

dna123 2009-04-16 21:45:24

I don't know PHP, but the first thing I'd try is hoisting '/[' . implode('', $alphabet) . ']{3,}$/' out of the loop. That is, set a variable to that and use the variable instead inside the loop.

Darius Bacon 2009-04-17 06:32:14

I'm pretty sure that PHP keeps a per-thread global cache of compiled regular expressions, but I'll try that anyways.

Paolo Bergantino 2009-04-17 14:50:24

I followed the advice in this question (http://stackoverflow.com/questions/209906/compile-regex-in-php) and added the S flag to the preg_match, it reduced the time by ~0.1s, not that much but I'll take it. :)

Paolo Bergantino 2009-04-17 14:57:04

Paolo, maybe I'm wrong about PHP, but I'd assume that, as in other languages, that cache maps from regexes in their string form to regexes in their compiled form. But you have an expression to evaluate just to get to the string form, first. The PHP compiler would have to figure out that the expression is invariant if it's to safely skip reevaluating it.

Darius Bacon 2009-05-14 06:41:25

Not the case with PHP, Darius.

Paolo Bergantino 2009-06-17 19:06:47

nsisoczxyltyaerhFatal error: Allowed memory size of 58720256 bytes exhausted (tried to allocate 48 bytes) in /home/pbysh/www/stackoverflow/boggle.php on line 23

Daniel Moura 2009-07-03 13:37:35

@Daniel: Not coming up for me...

Paolo Bergantino 2009-07-03 14:17:09

@Paolo did you use the letters nsis oczx ylty aerh? Tried again and the error persists.

Daniel Moura 2009-07-03 21:42:39

@Daniel: You have to put newlines every 4 characters. nsis\noczx\nylty\naerh\n

Paolo Bergantino 2009-07-03 21:44:39

@Paolo: I did that. I used other letters and works fine. But those I get this error.

Daniel Moura 2009-07-03 21:55:22

@Daniel: Mmm. It's working for me. That is weird.. hold on.

Paolo Bergantino 2009-07-03 22:00:47

@Daniel: Apparently it's my web server. It doesn't happen when I run locally. Shrug. Don't really feel like hunting it down.

Paolo Bergantino 2009-07-03 22:14:55

Answer 12

+4 A:

As soon as I saw the problem statement, I thought "Trie". But seeing as several other posters made use of that approach, I looked for another approach just to be different. Alas, the Trie approach performs better. I ran Kent's Perl solution on my machine and it took 0.31 seconds to run, after adapting it to use my dictionary file. My own perl implementation required 0.54 seconds to run.

This was my approach:

Create a transition hash to model the legal transitions.
Iterate through all 16^3 possible three letter combinations.
- In the loop, exclude illegal transitions and repeat visits to the same square. Form all the legal 3-letter sequences and store them in a hash.
Then loop through all words in the dictionary.
- Exclude words that are too long or short
- Slide a 3-letter window across each word and see if it is among the 3-letter combos from step 2. Exclude words that fail. This eliminates most non-matches.
- If still not eliminated, use a recursive algorithm to see if the word can be formed by making paths through the puzzle. (This part is slow, but called infrequently.)
Print out the words I found.

I tried 3-letter and 4-letter sequences, but 4-letter sequences slowed the program down.

In my code, I use /usr/share/dict/words for my dictionary. It comes standard on MAC OS X and many Unix systems. You can use another file if you want. To crack a different puzzle, just change the variable @puzzle. This would be easy to adapt for larger matrices. You would just need to change the %transitions hash and %legalTransitions hash.

The strength of this solution is that the code is short, and the data structures simple.

Here is the Perl code (which uses too many global variables, I know):

#!/usr/bin/perl
use Time::HiRes  qw{ time };

sub readFile($);
sub findAllPrefixes($);
sub isWordTraceable($);
sub findWordsInPuzzle(@);

my $startTime = time;

# Puzzle to solve

my @puzzle = ( 
    F, X, I, E,
    A, M, L, O,
    E, W, B, X,
    A, S, T, U
);

my $minimumWordLength = 3;
my $maximumPrefixLength = 3; # I tried four and it slowed down.

# Slurp the word list.
my $wordlistFile = "/usr/share/dict/words";

my @words = split(/\n/, uc(readFile($wordlistFile)));
print "Words loaded from word list: " . scalar @words . "\n";

print "Word file load time: " . (time - $startTime) . "\n";
my $postLoad = time;

# Define the legal transitions from one letter position to another. 
# Positions are numbered 0-15.
#     0  1  2  3
#     4  5  6  7
#     8  9 10 11
#    12 13 14 15
my %transitions = ( 
   -1 => [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    0 => [1,4,5], 
    1 => [0,2,4,5,6],
    2 => [1,3,5,6,7],
    3 => [2,6,7],
    4 => [0,1,5,8,9],
    5 => [0,1,2,4,6,8,9,10],
    6 => [1,2,3,5,7,9,10,11],
    7 => [2,3,6,10,11],
    8 => [4,5,9,12,13],
    9 => [4,5,6,8,10,12,13,14],
    10 => [5,6,7,9,11,13,14,15],
    11 => [6,7,10,14,15],
    12 => [8,9,13],
    13 => [8,9,10,12,14],
    14 => [9,10,11,13,15],
    15 => [10,11,14]
);

# Convert the transition matrix into a hash for easy access.
my %legalTransitions = ();
foreach my $start (keys %transitions) {
    my $legalRef = $transitions{$start};
    foreach my $stop (@$legalRef) {
     my $index = ($start + 1) * (scalar @puzzle) + ($stop + 1);
     $legalTransitions{$index} = 1;
    }
}

my %prefixesInPuzzle = findAllPrefixes($maximumPrefixLength);

print "Find prefixes time: " . (time - $postLoad) . "\n";
my $postPrefix = time;

my @wordsFoundInPuzzle = findWordsInPuzzle(@words);

print "Find words in puzzle time: " . (time - $postPrefix) . "\n";

print "Unique prefixes found: " . (scalar keys %prefixesInPuzzle) . "\n";
print "Words found (" . (scalar @wordsFoundInPuzzle) . ") :\n    " . join("\n    ", @wordsFoundInPuzzle) . "\n";

print "Total Elapsed time: " . (time - $startTime) . "\n";

###########################################

sub readFile($) {
    my ($filename) = @_;
    my $contents;
    if (-e $filename) {
     # This is magic: it opens and reads a file into a scalar in one line of code. 
     # See http://www.perl.com/pub/a/2003/11/21/slurp.html
     $contents = do { local( @ARGV, $/ ) = $filename ; <> } ; 
    }
    else {
     $contents = '';
    }
    return $contents;
}

# Is it legal to move from the first position to the second? They must be adjacent.
sub isLegalTransition($$) {
    my ($pos1,$pos2) = @_;
    my $index = ($pos1 + 1) * (scalar @puzzle) + ($pos2 + 1);
    return $legalTransitions{$index};
}

# Find all prefixes where $minimumWordLength <= length <= $maxPrefixLength
#
#   $maxPrefixLength ... Maximum length of prefix we will store. Three gives best performance. 
sub findAllPrefixes($) {
    my ($maxPrefixLength) = @_;
    my %prefixes = ();
    my $puzzleSize = scalar @puzzle;

    # Every possible N-letter combination of the letters in the puzzle 
    # can be represented as an integer, though many of those combinations
    # involve illegal transitions, duplicated letters, etc.
    # Iterate through all those possibilities and eliminate the illegal ones.
    my $maxIndex = $puzzleSize ** $maxPrefixLength;

    for (my $i = 0; $i < $maxIndex; $i++) {
     my @path;
     my $remainder = $i;
     my $prevPosition = -1;
     my $prefix = '';
     my %usedPositions = ();
     for (my $prefixLength = 1; $prefixLength <= $maxPrefixLength; $prefixLength++) {
      my $position = $remainder % $puzzleSize;

      # Is this a valid step?
      #  a. Is the transition legal (to an adjacent square)?
      if (! isLegalTransition($prevPosition, $position)) {
       last;
      }

      #  b. Have we repeated a square?
      if ($usedPositions{$position}) {
       last;
      }
      else {
       $usedPositions{$position} = 1;
      }

      # Record this prefix if length >= $minimumWordLength.
      $prefix .= $puzzle[$position];
      if ($prefixLength >= $minimumWordLength) {
       $prefixes{$prefix} = 1;
      }

      push @path, $position;
      $remainder -= $position;
      $remainder /= $puzzleSize;
      $prevPosition = $position;
     } # end inner for
    } # end outer for
    return %prefixes;
}

# Loop through all words in dictionary, looking for ones that are in the puzzle.
sub findWordsInPuzzle(@) {
    my @allWords = @_;
    my @wordsFound = ();
    my $puzzleSize = scalar @puzzle;
WORD: foreach my $word (@allWords) {
     my $wordLength = length($word);
     if ($wordLength > $puzzleSize || $wordLength < $minimumWordLength) {
      # Reject word as too short or too long.
     }
     elsif ($wordLength <= $maximumPrefixLength ) {
      # Word should be in the prefix hash.
      if ($prefixesInPuzzle{$word}) {
       push @wordsFound, $word;
      }
     }
     else {
      # Scan through the word using a window of length $maximumPrefixLength, looking for any strings not in our prefix list.
      # If any are found that are not in the list, this word is not possible.
      # If no non-matches are found, we have more work to do.
      my $limit = $wordLength - $maximumPrefixLength + 1;
      for (my $startIndex = 0; $startIndex < $limit; $startIndex ++) {
       if (! $prefixesInPuzzle{substr($word, $startIndex, $maximumPrefixLength)}) {
        next WORD;
       }
      }
      if (isWordTraceable($word)) {
       # Additional test necessary: see if we can form this word by following legal transitions
       push @wordsFound, $word;
      }
     }

    }
    return @wordsFound;
}

# Is it possible to trace out the word using only legal transitions?
sub isWordTraceable($) {
    my $word = shift;
    return traverse([split(//, $word)], [-1]); # Start at special square -1, which may transition to any square in the puzzle.
}

# Recursively look for a path through the puzzle that matches the word.
sub traverse($$) {
    my ($lettersRef, $pathRef) = @_;
    my $index = scalar @$pathRef - 1;
    my $position = $pathRef->[$index];
    my $letter = $lettersRef->[$index];
    my $branchesRef =  $transitions{$position};
BRANCH: foreach my $branch (@$branchesRef) {
      if ($puzzle[$branch] eq $letter) {
       # Have we used this position yet?
       foreach my $usedBranch (@$pathRef) {
        if ($usedBranch == $branch) {
         next BRANCH;
        }
       }
       if (scalar @$lettersRef == $index + 1) {
        return 1; # End of word and success.
       }
       push @$pathRef, $branch;
       if (traverse($lettersRef, $pathRef)) {
        return 1; # Recursive success.
       }
       else {
        pop @$pathRef;
       }
      }
     }
    return 0; # No path found. Failed.
}

Paul Chernoch 2009-06-26 05:37:53

Answer 13

+5 A:

My attempt in Java. It takes about 2 s to read file and build trie, and around 50 ms to solve the puzzle. I used the dictionary linked in the question (it has a few words that I didn't know exist in English such as fae, ima)

0 [main] INFO gineer.bogglesolver.util.Util  - Reading the dictionary
2234 [main] INFO gineer.bogglesolver.util.Util  - Finish reading the dictionary
2234 [main] INFO gineer.bogglesolver.Solver  - Found: FAM
2234 [main] INFO gineer.bogglesolver.Solver  - Found: FAME
2234 [main] INFO gineer.bogglesolver.Solver  - Found: FAMBLE
2234 [main] INFO gineer.bogglesolver.Solver  - Found: FAE
2234 [main] INFO gineer.bogglesolver.Solver  - Found: IMA
2234 [main] INFO gineer.bogglesolver.Solver  - Found: ELI
2234 [main] INFO gineer.bogglesolver.Solver  - Found: ELM
2234 [main] INFO gineer.bogglesolver.Solver  - Found: ELB
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AXIL
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AXILE
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AXLE
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AMI
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AMIL
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AMLI
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AME
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AMBLE
2234 [main] INFO gineer.bogglesolver.Solver  - Found: AMBO
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AES
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWEST
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MIX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MIL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MILE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MILO
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MAX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MAE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MAW
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MEW
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MEWL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MES
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MESA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: MWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIMA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIMAX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIME
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIMES
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIMB
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIMBO
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LIMBU
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LEI
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LEO
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LOB
2250 [main] INFO gineer.bogglesolver.Solver  - Found: LOX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: OIME
2250 [main] INFO gineer.bogglesolver.Solver  - Found: OIL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: OLE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: OLM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: EMIL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: EMBOLE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: EMBOX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: EAST
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAF
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAME
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAMBLE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WEA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WEAM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WEM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WEA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WES
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WEST
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAS
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WASE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: WAST
2250 [main] INFO gineer.bogglesolver.Solver  - Found: BLEO
2250 [main] INFO gineer.bogglesolver.Solver  - Found: BLO
2250 [main] INFO gineer.bogglesolver.Solver  - Found: BOIL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: BOLE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: BUT
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AES
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWL
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AWEST
2250 [main] INFO gineer.bogglesolver.Solver  - Found: ASE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: ASEM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: AST
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEAX
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEAM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEMI
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEMBLE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEW
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SEA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SWAM
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SWAMI
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SAW
2250 [main] INFO gineer.bogglesolver.Solver  - Found: SAWT
2250 [main] INFO gineer.bogglesolver.Solver  - Found: STU
2250 [main] INFO gineer.bogglesolver.Solver  - Found: STUB
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TWAE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TWA
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TWAE
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TWAS
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TUB
2250 [main] INFO gineer.bogglesolver.Solver  - Found: TUX

Source code consists of 6 classes. I'll post them below (if this is not the right practice on StackOverflow, please tell me).

gineer.bogglesolver.Main

package gineer.bogglesolver;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;

public class Main
{
    private final static Logger logger = Logger.getLogger(Main.class);

    public static void main(String[] args)
    {
        BasicConfigurator.configure();

        Solver solver = new Solver(4,
                        "FXIE" +
                        "AMLO" +
                        "EWBX" +
                        "ASTU");
        solver.solve();

    }
}

gineer.bogglesolver.Solver

package gineer.bogglesolver;

import gineer.bogglesolver.trie.Trie;
import gineer.bogglesolver.util.Constants;
import gineer.bogglesolver.util.Util;
import org.apache.log4j.Logger;

public class Solver
{
    private char[] puzzle;
    private int maxSize;

    private boolean[] used;
    private StringBuilder stringSoFar;

    private boolean[][] matrix;
    private Trie trie;

    private final static Logger logger = Logger.getLogger(Solver.class);

    public Solver(int size, String puzzle)
    {
        trie = Util.getTrie(size);
        matrix = Util.connectivityMatrix(size);

        maxSize = size * size;
        stringSoFar = new StringBuilder(maxSize);
        used = new boolean[maxSize];

        if (puzzle.length() == maxSize)
        {
            this.puzzle = puzzle.toCharArray();
        }
        else
        {
            logger.error("The puzzle size does not match the size specified: " + puzzle.length());
            this.puzzle = puzzle.substring(0, maxSize).toCharArray();
        }
    }

    public void solve()
    {
        for (int i = 0; i < maxSize; i++)
        {
            traverseAt(i);
        }
    }

    private void traverseAt(int origin)
    {
        stringSoFar.append(puzzle[origin]);
        used[origin] = true;

        //Check if we have a valid word
        if ((stringSoFar.length() >= Constants.MINIMUM_WORD_LENGTH) && (trie.containKey(stringSoFar.toString())))
        {
            logger.info("Found: " + stringSoFar.toString());
        }

        //Find where to go next
        for (int destination = 0; destination < maxSize; destination++)
        {
            if (matrix[origin][destination] && !used[destination] && trie.containPrefix(stringSoFar.toString() + puzzle[destination]))
            {
                traverseAt(destination);
            }
        }

        used[origin] = false;
        stringSoFar.deleteCharAt(stringSoFar.length() - 1);
    }

}

gineer.bogglesolver.trie.Node

package gineer.bogglesolver.trie;

import gineer.bogglesolver.util.Constants;

class Node
{
    Node[] children;
    boolean isKey;

    public Node()
    {
        isKey = false;
        children = new Node[Constants.NUMBER_LETTERS_IN_ALPHABET];
    }

    public Node(boolean key)
    {
        isKey = key;
        children = new Node[Constants.NUMBER_LETTERS_IN_ALPHABET];
    }

    /**
     Method to insert a string to Node and its children

     @param key the string to insert (the string is assumed to be uppercase)
     @return true if the node or one of its children is changed, false otherwise
     */
    public boolean insert(String key)
    {
        //If the key is empty, this node is a key
        if (key.length() == 0)
        {
            if (isKey)
                return false;
            else
            {
                isKey = true;
                return true;
            }
        }
        else
        {//otherwise, insert in one of its child

            int childNodePosition = key.charAt(0) - Constants.LETTER_A;
            if (children[childNodePosition] == null)
            {
                children[childNodePosition] = new Node();
                children[childNodePosition].insert(key.substring(1));
                return true;
            }
            else
            {
                return children[childNodePosition].insert(key.substring(1));
            }
        }
    }

    /**
     Returns whether key is a valid prefix for certain key in this trie.
     For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell", "hello" return true

     @param prefix the prefix to check
     @return true if the prefix is valid, false otherwise
     */
    public boolean containPrefix(String prefix)
    {
        //If the prefix is empty, return true
        if (prefix.length() == 0)
        {
            return true;
        }
        else
        {//otherwise, check in one of its child
            int childNodePosition = prefix.charAt(0) - Constants.LETTER_A;
            return children[childNodePosition] != null && children[childNodePosition].containPrefix(prefix.substring(1));
        }
    }

    /**
     Returns whether key is a valid key in this trie.
     For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell" return false

     @param key the key to check
     @return true if the key is valid, false otherwise
     */
    public boolean containKey(String key)
    {
        //If the prefix is empty, return true
        if (key.length() == 0)
        {
            return isKey;
        }
        else
        {//otherwise, check in one of its child
            int childNodePosition = key.charAt(0) - Constants.LETTER_A;
            return children[childNodePosition] != null && children[childNodePosition].containKey(key.substring(1));
        }
    }

    public boolean isKey()
    {
        return isKey;
    }

    public void setKey(boolean key)
    {
        isKey = key;
    }
}

gineer.bogglesolver.trie.Trie

package gineer.bogglesolver.trie;

public class Trie
{
    Node root;

    public Trie()
    {
        this.root = new Node();
    }

    /**
     Method to insert a string to Node and its children

     @param key the string to insert (the string is assumed to be uppercase)
     @return true if the node or one of its children is changed, false otherwise
     */
    public boolean insert(String key)
    {
        return root.insert(key.toUpperCase());
    }

    /**
     Returns whether key is a valid prefix for certain key in this trie.
     For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell", "hello" return true

     @param prefix the prefix to check
     @return true if the prefix is valid, false otherwise
     */
    public boolean containPrefix(String prefix)
    {
        return root.containPrefix(prefix.toUpperCase());
    }

    /**
     Returns whether key is a valid key in this trie.
     For example: if key "hello" is in this trie, tests with all prefixes "hel", "hell" return false

     @param key the key to check
     @return true if the key is valid, false otherwise
     */
    public boolean containKey(String key)
    {
        return root.containKey(key.toUpperCase());
    }


}

gineer.bogglesolver.util.Constants

package gineer.bogglesolver.util;

public class Constants
{

    public static final int NUMBER_LETTERS_IN_ALPHABET = 26;
    public static final char LETTER_A = 'A';
    public static final int MINIMUM_WORD_LENGTH = 3;
    public static final int DEFAULT_PUZZLE_SIZE = 4;
}

gineer.bogglesolver.util.Util

package gineer.bogglesolver.util;

import gineer.bogglesolver.trie.Trie;
import org.apache.log4j.Logger;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Util
{
    private final static Logger logger = Logger.getLogger(Util.class);
    private static Trie trie;
    private static int size = Constants.DEFAULT_PUZZLE_SIZE;

    /**
     Returns the trie built from the dictionary.  The size is used to eliminate words that are too long.

     @param size the size of puzzle.  The maximum lenght of words in the returned trie is (size * size)
     @return the trie that can be used for puzzle of that size
     */
    public static Trie getTrie(int size)
    {
        if ((trie != null) && size == Util.size)
            return trie;

        trie = new Trie();
        Util.size = size;

        logger.info("Reading the dictionary");
        final File file = new File("dictionary.txt");
        try
        {
            Scanner scanner = new Scanner(file);
            final int maxSize = size * size;
            while (scanner.hasNext())
            {
                String line = scanner.nextLine().replaceAll("[^\\p{Alpha}]", "");

                if (line.length() <= maxSize)
                    trie.insert(line);
            }
        }
        catch (FileNotFoundException e)
        {
            logger.error("Cannot open file", e);
        }

        logger.info("Finish reading the dictionary");
        return trie;
    }

    static boolean[] connectivityRow(int x, int y, int size)
    {
        boolean[] squares = new boolean[size * size];
        for (int offsetX = -1; offsetX <= 1; offsetX++)
        {
            for (int offsetY = -1; offsetY <= 1; offsetY++)
            {
                final int calX = x + offsetX;
                final int calY = y + offsetY;
                if ((calX >= 0) && (calX < size) && (calY >= 0) && (calY < size))
                    squares[calY * size + calX] = true;
            }
        }

        squares[y * size + x] = false;//the current x, y is false

        return squares;
    }

    /**
     Returns the matrix of connectivity between two points.  Point i can go to point j iff matrix[i][j] is true
     Square (x, y) is equivalent to point (size * y + x).  For example, square (1,1) is point 5 in a puzzle of size 4

     @param size the size of the puzzle
     @return the connectivity matrix
     */
    public static boolean[][] connectivityMatrix(int size)
    {
        boolean[][] matrix = new boolean[size * size][];
        for (int x = 0; x < size; x++)
        {
            for (int y = 0; y < size; y++)
            {
                matrix[y * size + x] = connectivityRow(x, y, size);
            }
        }
        return matrix;
    }
}

gineer 2009-07-01 02:52:15

Indent code 4 spaces to show it as code

Paolo Bergantino 2009-07-01 02:58:34

I was comparing my output with outputs from other StackOverflowers, and it seems Adam, John, and rvarcher's outputs were missing some words. For example, "Mwa" is in the dictionary (yeah!), but it is not returned in outputs from Adam, John, and rvarcher. It is returned twice in Paolo's PHP link.

gineer 2009-07-01 08:04:02

Answer 14

+1 A:

I know I'm super late but I made one of these a while ago in PHP - just for fun too...

http://www.lostsockdesign.com.au/sandbox/boggle/index.php?letters=fxieamloewbxastu Found 75 words (133 pts) in 0.90108 seconds

F.........X..I..............E............... A......................................M..............................L............................O............................... E....................W............................B..........................X A..................S..................................................T.................U....

Gives some indication of what the program is actually doing - each letter is where it starts looking through the patterns while each '.' shows a path that it has tried to take. The more '.' there are the further it has searched.

Let me know if you want the code... it is a horrible mix of PHP and HTML that was never meant to see the light of day so I dare not post it here :P

Danny 2009-10-07 12:59:10

Answer 15

+1 A:

I spent 3 months working on a solution to the 10 best point dense 5x5 Boggle boards problem.

The problem is now solved and laid out with full disclosure on 5 web pages. Please contact me with questions.

The board analysis algorithm uses an explicit stack to pseudo-recursively traverse the board squares through a Directed Acyclic Word Graph with direct child information, and a time stamp tracking mechanism. This may very well be the world's most advanced lexicon data structure.

The scheme evaluates some 10,000 very good boards per second on a quad core. (9500+ points)

Parent Web Page:

DeepSearch.c - www.pathcom.com/~vadco/deep.html

Component Web Pages:

Optimal Scoreboard - www.pathcom.com/~vadco/binary.html

Advanced Lexicon Structure - www.pathcom.com/~vadco/adtdawg.html

Board Analysis Algorithm - www.pathcom.com/~vadco/guns.html

Parallel Batch Processing - www.pathcom.com/~vadco/parallel.html

- This comprehensive body of work will only interest a person who demands the very best.

JohnPaul Adamovsky 2009-11-19 07:06:12

Answer 16

+2 A:

I realize this question's time has come and gone, but since I was working on a solver myself, and stumbled onto this while googling about, I thought I should post a reference to mine as it seems a bit different from some of the others.

I chose to go with a flat array for the game board, and to do recursive hunts from each letter on the board, traversing from valid neighbor to valid neighbor, extending the hunt if the current list of letters if a valid prefix in an index. While traversing the notion of the current word is list of indexes into board, not letters that make up a word. When checking the index, the indexes are translated to letters and the check done.

The index is a brute force dictionary that's a bit like a trie, but allows for Pythonic queries of the index. If the words 'cat' and 'cater' are in the list, you'll get this in the dictionary:

   d = { 'c': ['cat','cater'],
     'ca': ['cat','cater'],
     'cat': ['cat','cater'],
     'cate': ['cater'],
     'cater': ['cater'],
   }

So if the current_word is 'ca' you know that it is a valid prefix because 'ca' in d returns True (so continue the board traversal). And if the current_word is 'cat' then you know that it is a valid word because it is a valid prefix and 'cat' in d['cat'] returns True too.

If felt like this allowed for some readable code that doesn't seem too slow. Like everyone else the expense in this system is reading/building the index. Solving the board is pretty much noise.

The code is at http://gist.github.com/268079. It is intentionally vertical and naive with lots of explicit validity checking because I wanted to understand the problem without crufting it up with a bunch of magic or obscurity.

cdent 2010-01-03 19:04:29

Answer 17

+1 A:

Just for fun, I implemented one in bash. It is not super fast, but reasonable.

http://dev.xkyle.com/bashboggle/

Kyle 2010-08-24 04:12:00

ansaurus

tags:

views:

answers:

How to find list of possible words from a letter matrix [Boggle Solver]

Perl Implementation

More Mumblings on that Regex Optimization

gineer.bogglesolver.Main

gineer.bogglesolver.Solver

gineer.bogglesolver.trie.Node

gineer.bogglesolver.trie.Trie

gineer.bogglesolver.util.Constants

gineer.bogglesolver.util.Util

related questions