tags:

views:

376

answers:

5

Over the holidays, my family loves to play Boggle. Problem is, I'm terrible at Boggle. So I did what any good programmer would do: wrote a program to play for me.

At the core of the algorithm is a simple prefix trie, where each node is a dict of references to the next letters.

This is the trie:add implementation:

add([], Trie) ->
    dict:store(stop, true, Trie);

add([Ch|Rest], Trie) ->
    % setdefault(Key, Default, Dict) ->
    %     case dict:find(Key, Dict) of
    %         { ok, Val } -> { Dict, Val }
    %         error -> { dict:new(), Default }
    %     end.
    { NewTrie, SubTrie } = setdefault(Ch, dict:new(), Trie),
    NewSubTrie = add(Rest, SubTrie),
    dict:store(Ch, NewSubTrie, NewTrie).

And you can see the rest, along with an example of how it's used (at the bottom), here:

http://gist.github.com/263513

Now, this being my first serious program in Erlang, I know there are probably a bunch of things wrong with it… But my immediate concern is that it uses 800 megabytes of RAM.

So, what am I doing most-wrong? And how might I make it a bit less-wrong?

+1  A: 

I don't know about your algorithm, but if you're storing that much data, maybe you should look into using Erlang's built-in digraph library to represent your trie, instead of so many dicts. http://www.erlang.org/doc/man/digraph.html

Paul
Cool - I'll look into that. Thanks.
David Wolever
+4  A: 

You could implement this functionality by simply storing the words in an ets table:

% create table; add words
> ets:new(words, [named_table, set]).
> ets:insert(words, [{"zed"}]).  
> ets:insert(words, [{"zebra"}]).    

% check if word exists
> ets:lookup(words, "zed").          
[{"zed"}]

% check if "ze" has a continuation among the words
78> ets:match(words, {"ze" ++ '$1'}).
[["d"],["bra"]]

If trie is a must, but you can live with a non-functional approach, then you can try digraphs, as Paul already suggested.

If you want to stay functional, you might save some bytes of memory by using structures using less memory, for example proplists, or records, such as -record(node, {a,b,....,x,y,z}).

Zed
Ah, cool - thanks for the tips.
David Wolever
Alright, so I've been tinkering with ets, but I've been getting issues with "bad argument". Maybe you know a simple solution? Question is here: http://stackoverflow.com/questions/1964990/erlang-ets-reset-ets-table-after-getting-a-bad-argument
David Wolever
Alright, I've tinkered with the proplist implementation too... And it has hit an issue which causes the shell to hang. I've asked that here: http://stackoverflow.com/questions/1982257/erlang-erl-shell-hangs-after-building-a-large-data-structure (ps: thanks for all your help - it's much appreciated).
David Wolever
+3  A: 

I don't remember how much memory a dict takes, but let's estimate. You have 2.5e6 characters and 2e5 words. If your trie had no sharing at all, that would take 2.7e6 associations in the dicts (one for each character and each 'stop' symbol). A simple purely-functional dict representation would maybe 4 words per association -- it could be less, but I'm trying to get an upper bound. On a 64-bit machine, that'd take 8*4*2.7 million bytes, or 86 megabytes. That's only a tenth of your 800M, so something's surely wrong here.

Update: dict.erl represents dicts with a hashtable; this implies lots of overhead when you have a lot of very small dicts, as you do. I'd try changing your code to use the proplists module, which ought to match my calculations above.

Darius Bacon
To check how much memory a dict() construct takes, call: `erts_debug:flat_size(dict:new())`. It returns 46 words, which is 184 bytes on a 32-bit system, or 368 bytes on a 64-bit one.
Zed
Thanks for the suggestion... Although I've run into a strange problem where the shell hangs when I create my proplist-based-trie, which I've asked here: http://stackoverflow.com/questions/1982257/erlang-erl-shell-hangs-after-building-a-large-data-structure - could you, by any chance, offer any insight there?
David Wolever
+1  A: 

If all words are in English, and the case doesn't matter, all characters can be encoded by numbers from 1 to 26 (and in fact, in Erlang they are numbers from 97 to 122), reserving 0 for stop. So you can use the array module as well.

Alexey Romanov
+1  A: 

An alternative way to solve the problem is going through the word list and see if the word can be constructed from the dice. That way you need very little RAM, and it might be more fun to code. (optimizing and concurrency)

Koistinen
That's an interesting idea - thanks.
David Wolever