Hi,
Is it a good idea to store words of a dictionary with 100.000 words in a static array of string. I'm working on spellchecker and I thought that way would be faster.
Hi,
Is it a good idea to store words of a dictionary with 100.000 words in a static array of string. I'm working on spellchecker and I thought that way would be faster.
Definitely its not a good idea to store so many strings as an array especially if you are using it for spell check which means you will have to search for and compare strings. It would make it inefficient to search or compare a string through the array as it would always be a linear search
I think 100 000 is not so large amount that search wolud be inefficent. Of course it depends ... It would work nice if you are checking if a word exists in array - it's a linear complexity algorithm. You can keep table ordered so you can use quicksort search algoritm and make it more efficent.
On the other hand - if you wold like to find, 5 most likely words (using N-gram method or something) you should consider using Lucene or other text database.
You should generally prefer a Java Collections Framework class to a native Java array for anything non-trivial. In this particular case, what you have is a Set<String>
(since no words should appear more than once in the dictionary).
A HashSet<String>
offers constant time performance for the basic operations add
, remove
, and contains
, and should work very well with String
hashcode formula.
For larger dictionaries, you'd want to use more sophisticated data structures specialized for storing a set of strings (e.g. a trie), but for 100K words, a HashSet
should suffice.
Perhaps using an SQLite database would be more efficient ? I think that's what firefox/thunderbird does for spell checking but I'm not entirely sure.
You won't be able to store that amount of strings in a static variable. Java has a size limit for static code and even method bodies. Simply use a flatfile and read it upon class instanciation - Java is faster than most people think with those things.
How about an approach with in memory database technology like for example sqlite inmemory This allows you to use efficient querying without disk overhead