I need to test my data structure (in Java), which works like a dictionary: it holds a key/value map. How do you test a data structure like this? I would like to insert real words into it and then look them up. Is there a way to download a list of all English words, so that I can read the file and populate my structure? Once it is populated, I can run many searches and collect real statistics on how long a search takes.
Perhaps Project Gutenberg would be helpful. I've used them on past CS projects. They provide plain text files (e.g. The Valley of Fear), which should be easy to process. You may want to skip over the headers to avoid skewing the results.
This lets you test your dictionary by keeping a word->count mapping (e.g. Map<String, Integer>) of the words in the file.
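As a rough illustration of that word->count idea, here is a minimal sketch. The class name WordCounter and its methods are made up for this example, java.util.HashMap is only a stand-in for your own structure, and the tokenizer is deliberately crude (it lower-cases everything and keeps only runs of a-z, so "don't" becomes "don" and "t").

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds a word -> count map from lines of text.
class WordCounter {

    // Counts words in the given lines. Replace HashMap with the
    // structure under test to exercise its insert/update path.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Simplification: anything that is not a-z separates words.
            for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Convenience overload for a plain text file, assumed to be UTF-8.
    static Map<String, Integer> countWords(Path file) throws IOException {
        return countWords(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Comparing the counts your structure produces against the HashMap version on the same file is a cheap correctness check before you start timing anything.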
If you're on Linux, you could use the contents of /usr/share/dict/words; there's also WordNet, an English word database (http://wordnet.princeton.edu/).
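To get the search timings asked about in the question, something along these lines might work once you have a list of words. This is only a sketch: LookupBenchmark and averageLookupNanos are names invented here, HashMap again stands in for your own structure, and a plain System.nanoTime() loop ignores JIT warm-up, so treat the numbers as rough (a harness such as JMH is the usual choice for serious measurements).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: inserts every word, then times repeated lookups.
class LookupBenchmark {

    // Returns the average time per lookup in nanoseconds.
    static double averageLookupNanos(List<String> words, int rounds) {
        if (words.isEmpty() || rounds <= 0) {
            throw new IllegalArgumentException("need at least one word and one round");
        }

        // Stand-in for the structure under test.
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            dict.put(words.get(i), i);
        }

        long hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String word : words) {
                if (dict.containsKey(word)) {
                    hits++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;

        // Every inserted word must be found; otherwise the structure is broken.
        if (hits != (long) rounds * words.size()) {
            throw new IllegalStateException("a lookup missed an inserted word");
        }
        return (double) elapsed / hits;
    }
}
```

On a system that has the word list, you could feed it with Files.readAllLines(Paths.get("/usr/share/dict/words")) from a main method and print the result. Running it once or twice before the measured run gives the JIT a chance to compile the hot loop.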
There are indeed several open-source dictionaries for the English language, e.g. the WordNet file.
That said, I must insist that the English language is not a “closed” language, nor does it have one true official definition. As such, there is no dictionary that contains “all the English words”, and such a dictionary can never exist: English words are made up all the time, and once enough people use them, they become part of the English language. Case in point: “to google.”