I'm exploring the hardware/software requirements (ultimate goal is mobile Java app) for a potential free/paid application.

The application will start with this simple goal: Given a list of relevant words in a database, to be able to do word completion on a single string input.

In other words I already know the contents of the database - but the memory footprint/speed/search efficiency of the algorithm will determine the amount of data supported.

I have started at the beginning with suffix-tree-based searches, but am wondering whether anyone has experience with the speed/memory trade-offs of this simple approach vs. the more complex ones being discussed at conferences.

Honestly, the initial application will probably have fewer than 500 words in context, so it might not matter, but ultimately the application could expand to tens of thousands or hundreds of thousands of records - hence the question about speed vs. memory footprint.

I suppose I could start with something simple and switch over later, but I hope to understand the trade-off up front!

+1  A: 

Word completion suggests that you want to find all the words that start with a given prefix.

Tries are good for this, and particularly good if you're adding or removing elements - other nodes do not need to be reallocated.
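
A minimal sketch of the idea in Java (class and method names are my own, not from any particular library): each node holds a map of child pointers, so inserting a word only allocates nodes along its own path, and completion is a walk to the prefix node followed by a depth-first collection of everything below it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal trie supporting insertion and prefix completion.
class Trie {
    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            // Only nodes along this word's path are allocated;
            // existing nodes are untouched.
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.isWord = true;
    }

    // Return every stored word that starts with the given prefix.
    List<String> complete(String prefix) {
        Node n = root;
        for (char c : prefix.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return new ArrayList<>(); // no word has this prefix
        }
        List<String> results = new ArrayList<>();
        collect(n, new StringBuilder(prefix), results);
        return results;
    }

    private void collect(Node n, StringBuilder sb, List<String> out) {
        if (n.isWord) out.add(sb.toString());
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            sb.append(e.getKey());
            collect(e.getValue(), sb, out);
            sb.setLength(sb.length() - 1); // backtrack
        }
    }
}
```

On a memory-constrained mobile target, the per-node `HashMap` overhead is the main cost; a compact child array or a DAWG would shrink it, at the price of more complex updates.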

If the dictionary is fairly static and retrieval is what matters, consider a far simpler data structure: put your words in a sorted vector! You can binary-search to find one candidate starting with the correct prefix, then scan linearly on each side of it to collect all the other candidates.
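
A sketch of the sorted-array variant, assuming the dictionary is prepared once (the helper name is mine). Because the prefix itself sorts before any of its extensions, the insertion point returned by `Arrays.binarySearch` is the start of the run of matches, so a single forward scan suffices:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixSearch {
    // 'sorted' must be in lexicographic order; all words sharing a
    // prefix then form one contiguous run in the array.
    static List<String> complete(String[] sorted, String prefix) {
        int i = Arrays.binarySearch(sorted, prefix);
        if (i < 0) i = -i - 1; // insertion point: first element >= prefix
        List<String> out = new ArrayList<>();
        while (i < sorted.length && sorted[i].startsWith(prefix)) {
            out.add(sorted[i++]);
        }
        return out;
    }
}
```

Memory footprint is essentially just the word data itself, which may matter more than lookup speed at the sizes you describe.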

Will
Cool - thanks for the pointer! The trie method seems ideal; presumably a trie no more than 6 or 8 levels deep would cover any reasonably sized database. The child pointers at each level of the trie imply (I'm guessing) that the memory footprint shouldn't be more than 2x or 3x the base data.