views:

70

answers:

2

So I have a list of words (the entire English dictionary).

For a word matching game, when a player moves a piece I need to check the entire dictionary to see if the the word that the player made exists in the dictionary. I need to do this as quickly as possible. simply iterating through the dictionary is way too slow.

What is the quickest algorithm in AS3 to search a long list like this for a match, and what datatype should I use? (ie array, object, Dictionary etc)

+1  A: 

This isn't specific to ActionScript, but a Trie is a suitable data structure for storing words.

Kobi
Thanks! I'll look in to this.
Nuthman
+4  A: 

I would first go with an Object, which is a hash table (at least, storage-wise).

So, for every word in your list, make an entry in your dictionary Object and store true as its value.

Then, you just have to check if a given word is a key into your dictionary to know whether the word the user has choosen is valid or not.

This works really fast in this simple test (with 10,000,000 entries):

var dict:Object = {};
for(var i:int = 0; i < 10000000; i++) {
    dict[i] = true;
}
var btn:Sprite = new Sprite();
btn.graphics.beginFill(0xff0000);
btn.graphics.drawRect(0,0,50,50);
btn.graphics.endFill();
addChild(btn);
btn.addEventListener(MouseEvent.CLICK,checkWord);

var findIt:Boolean = true;
function checkWord(e:MouseEvent):void {
    var word:String;
    if(findIt) {
        word = "3752132";
    } else {
        word = "9123012456";
    }

    if(dict[word]) {
        trace(word + " found");
    } else {
        trace(word + " not found");
    }
    findIt = !findIt;
}

It takes a little longer to build the dictionary, but lookup is almost instantaneous.

The only caveat is that you will have to consider certain keys that will pass the check and not necessarily be part of your words list. Words such as toString, prototype, etc. There are just a few of them, but keep that in mind.

I would try something like this with your real data set. If it works fine, then you have a really easy solution. Go have a beer (or whatever you prefer).

Now, if the above doesn't really work after testing it with real data (notice I've build the list with numbers cast as strings for simplicity), then a couple of options, off the top of my head:

1) Partition the first dict into a set of dictionaries. So, instead of having all the words in dict, have a dictionary for words that begin with 'a', another for 'b', etc. Then, before looking up a word, check the first char to know where to look it up.

Something like:

var word:String     = "hello";
var dictKey:String  = word.charAt(0);
// actual check
if(dict[dictKey][word]) {
    trace("found");
} else {
    trace("not found");
}

You can eventually repartition if necessary. I.e, make dict['a'] point to another set of dictionaries indexed by the first two characters. So, you'll have dict['a']['b'][wordToSearch]. There are a number of possible variations on this idea (you'd also have to come up with some strategy to cope with words of two letters, such as "be", for instance).

2) Try a binary search. The problem with it is that you'll first have to sort the list, upfront. You have to do it just once, as it doesn't make sense to remove words from your dict. But with millions of words, it might be rarther intensive.

3) Try some fancy data structures from open source libraries such as:

http://sibirjak.com/blog/index.php/collections/as3commons-collections/

http://lab.polygonal.de/ds/

But again, as I said above, I'd first try the easiest and simpler solution and check if it works against the real data set.

Added

A simple way to deal with keywords used for Object's built-in properties:

var dict:Object = {};
var keywordsInDict:Array = [];

function buildDictionary():void {
    //  let's assume this is your original list, retrieved 
    //  from XML or other external means
    //  it contains "constructor", which should be dealt with
    //  separately, as it's a built-in prop of Object
    var sourceList:Array = ["hello","world","foo","bar","constructor"];
    var len:int = sourceList.length;
    var word:String;
    //  just a dummy vanilla object, to test if a word in the list 
    //  is already in use internally by Object
    var dummy:Object = {};

    for(var i:int = 0; i < len; i++) {
        //  also, lower-casing is a good idea
        //  do that when you check words as well
        word = sourceList[i].toLowerCase();
        if(!dummy[word]) {
            dict[i] = true;
        } else {
            //  it's a keyword, so store it separately
            keywordsInDict.push(word);
        }
    }
}

Now, just add an extra check for built-in props in the checkWords function:

function checkWord(e:MouseEvent):void {
    var word:String;
    if(findIt) {
        word = "Constructor";
    } else {
        word = "asdfds";
    }
    word = word.toLowerCase();
    var dummy:Object = {};
    //  check first if the word is a built-in prop 
    if(dummy[word]) {
        //  if it is, check if that word was in the original list
        //  if it was present, we've stored it in keywordsInDict
        if(keywordsInDict.indexOf(word) != -1) {
            trace(word + " found");
        } else {
            trace(word + " not found");
        }
    // not a built-in prop, so just check if it's present in dict 
    } else {
        if(dict[word]) {
            trace(word + " found");
        } else {
            trace(word + " not found");
        }
    }
    findIt = !findIt;
}
Juan Pablo Califano
This is very helpful and detailed. I will give it a go and see what happens! Luckily the maximum word length for my game is 5 letters, so it helps reduce the size of the dictionary significantly. Thanks!
Nuthman
Great answer. It's often best to try things in order of simplicity, and stop once you get acceptable performance with your real data.
fenomas