+4  A: 

Either look for a data structure allowing you to keep a compacted dictionary in memory, or simply give your process more memory. Three hundred thousand words is not that much.
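One way to read "a compacted dictionary in memory" is a single sorted String[] plus a tiny offset table per starting letter. This avoids per-entry collection overhead while still giving random access to "all words starting with x". A minimal sketch, assuming lowercase ASCII words; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper: one sorted String[] plus a 27-entry offset table.
final class CompactDictionary {
    private final String[] words;             // sorted, so each letter's words are contiguous
    private final int[] start = new int[27];  // start[k]: first index of words beginning with 'a' + k

    CompactDictionary(List<String> input) {
        // Assumption: only words starting with a lowercase ASCII letter are kept.
        words = input.stream()
                .filter(w -> !w.isEmpty() && w.charAt(0) >= 'a' && w.charAt(0) <= 'z')
                .toArray(String[]::new);
        Arrays.sort(words);
        int i = 0;
        for (int k = 0; k < 26; k++) {
            start[k] = i;
            // skip past every word that begins with letter 'a' + k
            while (i < words.length && words[i].charAt(0) == 'a' + k) {
                i++;
            }
        }
        start[26] = words.length;  // sentinel so start[k + 1] is always valid
    }

    private static int slot(char letter) {
        if (letter < 'a' || letter > 'z') {
            throw new IllegalArgumentException("expected a-z, got " + letter);
        }
        return letter - 'a';
    }

    /** Number of words beginning with {@code letter}. */
    int count(char letter) {
        int k = slot(letter);
        return start[k + 1] - start[k];
    }

    /** A random word beginning with {@code letter}, or null if there is none. */
    String randomStartingWith(char letter, Random rnd) {
        int k = slot(letter);
        int n = start[k + 1] - start[k];
        return n == 0 ? null : words[start[k] + rnd.nextInt(n)];
    }
}
```

Picking a random word for a letter is then O(1), and the only memory beyond the strings themselves is one array of references and 27 ints.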

Thorbjørn Ravn Andersen
And of course use a Java container such as a HashMap to hold your dictionary file in memory :p (I read the question as if he's always seeking in the file.)
KillianDS
Until now I have always been seeking in the file. :|
Myth17
@Myth, don't - just read it into a HashMap and work with that.
Thorbjørn Ravn Andersen
A: 

I think one way to do this is to put the whole dictionary in a TreeSet, use its subSet method to retrieve all the words beginning with the desired letter, and then pick a random word from that subset.
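A minimal sketch of the TreeSet approach. It assumes all words are lowercase: String's natural ordering puts uppercase letters before lowercase ones, so "Apple" would not fall in the 'a' range. TreeSetPicker and its methods are hypothetical names; subSet(from, to) is the standard TreeSet method (from inclusive, to exclusive).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical wrapper around the TreeSet idea.
final class TreeSetPicker {
    private final TreeSet<String> dict;

    TreeSetPicker(Collection<String> words) {
        dict = new TreeSet<>(words);
    }

    /** View of all words in [letter, letter + 1), i.e. those starting with letter. */
    SortedSet<String> startingWith(char letter) {
        return dict.subSet(String.valueOf(letter), String.valueOf((char) (letter + 1)));
    }

    /** A random word starting with letter, or null if there is none. */
    String randomStartingWith(char letter, Random rnd) {
        // Copying the view costs time linear in its size; cache it if you call this often.
        List<String> candidates = new ArrayList<>(startingWith(letter));
        return candidates.isEmpty() ? null : candidates.get(rnd.nextInt(candidates.size()));
    }
}
```

The subset is a live view, not a copy, so building it is cheap; only the random pick has to walk it.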

But in my opinion, given the amount of data, the best way to do this would be to use a database with SQL queries instead of plain Java.

Mr_Qqn
A: 

The goal is to increase your English language vocabulary - not to increase your computer's English language vocabulary.

If you do not share this goal, why are you (or your parents) paying tuition?

emory
It's a routine college assignment, and I am pretty confident about my English; the assignment itself can be done easily. Writing code for it will teach me something. :)
Myth17
It's such a stupid assignment that cheating is not only allowed -- it is recommended. I would return a list of 500 profanities just to make my point clear.
COME FROM
I agree with Myth17, sounds like a snooze.
Amir Rachum
If you are confident in your English, why are you taking an English class? While I agree that it is a stupid assignment, why are you still enrolled? Why not get a real job or enrol in a decent college? If your boss (in a real job) gave you such a stupid assignment, you would at least have the satisfaction of receiving a salary rather than paying tuition.
emory
I was also quite confident about my English when I was in college, but they often REQUIRE us to sit in English class, no matter what our major is. At least in Japan.
Enno Shioji
A: 

If I do this:

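import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

class LoadWords {
  public static void main(String... args) {
    try {
      // read the system word list, one word per line
      Scanner s = new Scanner(new File("/usr/share/dict/words"));
      ArrayList<String> ss = new ArrayList<String>();
      while (s.hasNextLine())
        ss.add(s.nextLine());
      s.close();
      // %n prints the platform's line separator
      System.out.format("Read %d words%n", ss.size());
    } catch (FileNotFoundException e) {
      e.printStackTrace(System.err);
    }
  }
}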

I can run it with java -mx16m LoadWords, which limits the Java heap size to 16 MB, not much memory by Java standards. My /usr/share/dict/words file has approximately 250,000 words in it, so it may be a bit smaller than yours.

You'll need to use a different data structure than the simple ArrayList<String> that I've used. Perhaps a HashMap of ArrayList<String>, keyed on the starting letter of the word would be a good starting choice.
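A minimal sketch of that HashMap of lists, keyed on the (lowercased) first letter. WordsByFirstLetter, index and load are hypothetical names; Files.readAllLines is the standard java.nio.file call and reads the file as UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups dictionary words by their first letter.
final class WordsByFirstLetter {
    static Map<Character, List<String>> index(Iterable<String> words) {
        Map<Character, List<String>> byFirst = new HashMap<>();
        for (String w : words) {
            if (w.isEmpty()) continue;  // skip blank lines
            char first = Character.toLowerCase(w.charAt(0));
            // create the bucket for this letter on first use, then append
            byFirst.computeIfAbsent(first, k -> new ArrayList<>()).add(w);
        }
        return byFirst;
    }

    /** Reads a one-word-per-line file such as /usr/share/dict/words and indexes it. */
    static Map<Character, List<String>> load(Path file) throws IOException {
        return index(Files.readAllLines(file));
    }
}
```

A random word starting with c is then byFirst.get(c).get(random.nextInt(size)), after checking that the bucket exists.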

spong
+1  A: 

Hope this doesn't spoil your fun or anything, but if I were you I'd take this approach:

Pseudo-Java:

abstract class Word {
    String word;
    char last();
    char first();         
}

abstract class DynamicDictionary {
    Map<Character,Set<Word>> first_indexed;

    Word removeNext(Word word){
        Set<Word> candidates = first_indexed.get(word.last());
        return removeRandom(candidates);
    }

    /**
     * Remove and return a random word from the entire dictionary.
     */
     Word removeRandom();

    /**
     * Remove and return a random word out from the set provided.
     */
     Word removeRandom(Set<Word> wordset);    
}

and then

Word primer = dynamicDictionary.removeRandom();
List<Word> list = new ArrayList<Word>(500);
list.add(primer);
Word cur = primer;
for (int i = 0; i < 499; i++) {
    cur = dynamicDictionary.removeNext(cur);
    list.add(cur);
}

NOTE: This is not intended to be read as actual Java code, just a rough explanation of the approach (no error handling, not a good class structure for real use, no encapsulation, etc.).
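For comparison, here is one way the same idea might look as runnable Java. It is a sketch under simplifying assumptions: plain Strings instead of a Word class, lowercase words, and a hypothetical WordChain class standing in for DynamicDictionary. Each used word is removed from its bucket by swapping in the bucket's last element, so no word appears twice and removal is O(1).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical runnable version of DynamicDictionary (Strings instead of Word objects).
final class WordChain {
    private final Map<Character, List<String>> byFirst = new HashMap<>();
    private final Random rnd;
    private int size;  // number of words still available

    WordChain(Iterable<String> words, Random rnd) {
        this.rnd = rnd;
        for (String w : words) {
            if (w.isEmpty()) continue;
            byFirst.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
            size++;
        }
    }

    /** Removes the word at index i in O(1) by moving the bucket's last word into its slot. */
    private String removeAt(List<String> bucket, int i) {
        String picked = bucket.get(i);
        int last = bucket.size() - 1;
        bucket.set(i, bucket.get(last));
        bucket.remove(last);
        size--;
        return picked;
    }

    /** Removes and returns a uniformly random word from the whole dictionary, or null if empty. */
    String removeRandom() {
        if (size == 0) return null;
        int k = rnd.nextInt(size);
        for (List<String> bucket : byFirst.values()) {
            if (k < bucket.size()) return removeAt(bucket, k);
            k -= bucket.size();
        }
        throw new IllegalStateException("size counter out of sync");
    }

    /** Removes and returns a random unused word starting with word's last letter, or null. */
    String removeNext(String word) {
        List<String> candidates = byFirst.get(word.charAt(word.length() - 1));
        if (candidates == null || candidates.isEmpty()) return null;
        return removeAt(candidates, rnd.nextInt(candidates.size()));
    }

    /** Builds a chain of at most n words; it ends early if no unused word fits. */
    List<String> chain(int n) {
        List<String> list = new ArrayList<>();
        String cur = n > 0 ? removeRandom() : null;
        while (cur != null) {
            list.add(cur);
            if (list.size() == n) break;
            cur = removeNext(cur);
        }
        return list;
    }
}
```

Usage would be something like new WordChain(words, new Random()).chain(500). Note that a random walk can hit a dead end (no unused word starts with the required letter), so the result may be shorter than requested.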

Should I encounter memory issues, maybe I'll do this:

abstract class Word {
    int lineNumber;
    char last();
    char first();
}
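The line-number idea can be pushed a little further: rather than one object per word, keep two char arrays indexed by line number (first and last letter) and read the chosen words back from the file in a second pass. A sketch under those assumptions; LineIndexedWords and fetch are hypothetical names, and the chain-building step itself is omitted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class: per line, only the first and last characters stay in memory.
final class LineIndexedWords {
    final char[] first;  // first[n] = first character of line n, '\0' for a blank line
    final char[] last;   // last[n]  = last character of line n, '\0' for a blank line

    LineIndexedWords(Path file) throws IOException {
        StringBuilder f = new StringBuilder();
        StringBuilder l = new StringBuilder();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            for (String line; (line = in.readLine()) != null; ) {
                // blank lines keep their slot so the array index equals the line number
                f.append(line.isEmpty() ? '\0' : line.charAt(0));
                l.append(line.isEmpty() ? '\0' : line.charAt(line.length() - 1));
            }
        }
        first = f.toString().toCharArray();
        last = l.toString().toCharArray();
    }

    /** Second pass: the words at the given 0-based line numbers, in request order (null if out of range). */
    static List<String> fetch(Path file, List<Integer> lineNumbers) throws IOException {
        Set<Integer> wanted = new HashSet<>(lineNumbers);
        Map<Integer, String> found = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            int n = 0;
            for (String line; (line = in.readLine()) != null; n++) {
                if (wanted.contains(n)) found.put(n, line);
            }
        }
        List<String> out = new ArrayList<>();
        for (int ln : lineNumbers) out.add(found.get(ln));
        return out;
    }
}
```

That is 4 bytes per word while choosing the chain, at the cost of reading the file twice.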

If that is not sufficient, I guess I'll use a binary search on the file or put it in a DB, etc.

Enno Shioji
A: 

Here are some word frequency lists: http://www.robwaring.org/vocab/wordlists/vocfreq.html

This text file, reachable from the above link, contains the 2,000 most frequently used words: http://www.robwaring.org/vocab/wordlists/1-2000.txt

Enno Shioji