Hi

I'm coding a small app, more or less like a word game. There is a requirement to validate a word that the user creates, to check whether it is a valid English word. I have thought of two ways to implement this:

1) Have a HashMap with every English word as a key and a boolean as the value. This way I could look up the key when validating the user's word, and so on.
2) Send an HTTP request to a site like dictionary.com to verify whether the word exists.

Though the HTTP request seems like a nice way to implement this, I would prefer a hashmap that is filled in at first and then updated at periodic intervals from a source, say dictionary.com, so that I can avoid the latency involved in the HTTP request approach.

Any pointers on how I could fill in the hashmap with the words from the source would be greatly appreciated.

Thanks p1nG

A: 

I do not think checking a hash of a word would be enough. Two words might have the same hash. Moreover, a random sequence of letters can have the same hash value as a correct word. Taking these two points into account, I do not think you will be able to avoid checking the word itself (looking it up in a dictionary).

I am not sure what the best way to fill up your dictionary would be. Try to find free dictionary software and check what its license says about the data it uses. I think it would be easier to buy something like this.

If that is not an option, online processing is not a bad choice, I think.

Georgy Bolyuba
From what I understand, when a collection has a collision between two or more entries with the same hash, it iterates over each one, checking the plaintext key for a match. So I would guess that the plaintext key stored in the collection is checked too. Correct me if I'm wrong here.
DrDipshit
That is correct. If there is a hash collision, it will be resolved by using the equals() method.
Pace
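
The comment thread above is easy to check with real strings instead of a hand-made class: in Java, "Aa" and "BB" have the same String.hashCode() (2112), yet a HashSet still keeps them apart, because after locating the bucket it compares entries with equals(). A small sketch (the class and method names are just for illustration):

```java
import java.util.HashSet;
import java.util.Set;

class StringCollisionDemo {
    // Returns true if the two strings share a hash code but are still
    // treated as distinct members of a HashSet.
    static boolean collidesButDistinct(String a, String b) {
        Set<String> set = new HashSet<>();
        set.add(a);
        boolean sameHash = a.hashCode() == b.hashCode();
        // contains() finds the bucket via hashCode(), then checks equals()
        boolean foundOther = set.contains(b);
        return sameHash && !foundOther;
    }
}
```

So a colliding hash never makes a non-word "valid"; it only costs an extra equals() comparison.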
A: 

How about downloading a list of words? You could use WordNet (http://wordnet.princeton.edu/wordnet/), a list of 155,287 words with synonyms and much more.

Or google for something like "list of english words"; there are lots of relevant links on the first page.

Natan Cox
A: 

If you only want to check if a word exists in the dictionary, why not use a HashSet? You can use a plaintext dictionary file with a word on each line, or at least that is what I have done in the past.
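
A minimal sketch of that approach, assuming a UTF-8 text file with one word per line (the class name WordListLoader is hypothetical; Files.lines needs Java 8 or later):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordListLoader {
    // Reads a plain-text word list (one word per line) into a HashSet.
    // Surrounding whitespace is trimmed and blank lines are skipped.
    static Set<String> load(Path file) throws IOException {
        // try-with-resources closes the underlying file handle
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                        .filter(w -> !w.isEmpty())
                        .collect(Collectors.toCollection(HashSet::new));
        }
    }
}
```

Usage would be along the lines of Set<String> dict = WordListLoader.load(Paths.get("words.txt")); followed by dict.contains(word), where "words.txt" stands in for whatever list you download.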

Not sure about updating it, but if dictionary.com provides a file with word listings, you can just download that, then open the file and add every entry to your hash map/set. Assuming no words are removed, existing entries would just be overwritten.
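
For the periodic update the question asks about, one option is to build a fresh set in the background and swap it in atomically, so lookups never see a half-filled set and words dropped from the source disappear too. This is a sketch under the assumption that the source is any Supplier<Set<String>> (for example, code that downloads a list and parses it); RefreshingDictionary is a hypothetical name:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class RefreshingDictionary {
    // Current snapshot; replaced as a whole, never modified in place.
    private final AtomicReference<Set<String>> words =
            new AtomicReference<>(Collections.<String>emptySet());
    private final Supplier<Set<String>> source;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    RefreshingDictionary(Supplier<Set<String>> source) {
        this.source = source;
        refresh(); // initial fill before the first lookup
    }

    // Builds a new set from the source and swaps it in atomically.
    // Rebuilding (instead of adding to the old set) also drops removed words.
    void refresh() {
        Set<String> fresh = new HashSet<>(source.get());
        words.set(Collections.unmodifiableSet(fresh));
    }

    // Runs refresh() every `period` units, starting one period from now.
    void startPeriodicRefresh(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the old snapshot if the source is unavailable.
            }
        }, period, period, unit);
    }

    boolean contains(String word) {
        return words.get().contains(word);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Something like startPeriodicRefresh(24, TimeUnit.HOURS) would re-read the source once a day. Catching exceptions inside the task matters here: scheduleAtFixedRate suppresses all later runs once an execution throws.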

Edit: I just wrote a test program that demonstrates collisions would not be a problem with a HashMap or HashSet when checking your word.

import java.util.HashSet;

public class CollidingHash {
    String value;
    public CollidingHash(String s){
        value=s;
    }
    // Deliberately return the same hash for every instance,
    // so all entries land in the same bucket.
    @Override 
    public int hashCode(){
        return 1;
    }
    // Within a bucket, HashSet tells entries apart with equals(),
    // which here compares the wrapped strings.
    @Override 
    public boolean equals(Object o){
        if(! (o instanceof CollidingHash)){
            return false;
        }
        CollidingHash c2 = (CollidingHash)o;
        return value.equals(c2.value);
    }
    public static void main(String[] args) {
        HashSet<CollidingHash> dict = new HashSet<CollidingHash>();
        CollidingHash a = new CollidingHash("This");
        CollidingHash b = new CollidingHash("That");
        dict.add(a);
        // b has the same hash as a but is not equal to it, so this prints "Is OK"
        System.out.println("Is "+ (dict.contains(b)? "Bad": "OK"));
    }
}

Edit2: added equals method as Pace mentioned.

DrDipshit
A: 

You can use web services such as Big Huge Thesaurus. It's a REST web service, so you might need tools like Jersey or RESTEasy.
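
As a rough illustration of the REST approach without extra libraries, Java 11's built-in java.net.http.HttpClient can make the request. The base URL and the "200 means the word exists, anything else means it doesn't" convention are assumptions for this sketch, not Big Huge Thesaurus's documented API; check the service's own documentation for the real URL format, API key handling, and response codes. RemoteWordChecker is a hypothetical name:

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class RemoteWordChecker {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(3))
            .build();
    // Hypothetical lookup endpoint, e.g. "https://example.com/lookup/"
    private final String baseUrl;

    RemoteWordChecker(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Appends the form-encoded word to the base URL so spaces or
    // punctuation cannot produce an invalid URI.
    URI lookupUri(String word) {
        return URI.create(baseUrl + URLEncoder.encode(word, StandardCharsets.UTF_8));
    }

    // Assumption: the service answers 200 for a known word.
    static boolean isKnown(int statusCode) {
        return statusCode == 200;
    }

    // Sends a GET request and ignores the body; only the status code is used.
    boolean exists(String word) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(lookupUri(word))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return isKnown(response.statusCode());
    }
}
```

The timeouts are there because every check now depends on the network; an IOException from exists() is the point where an app would fall back or report "can't validate right now".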

An alternative would be Aonaware and its SOAP web service.


Colin Hebert
A: 

1) Have a hashmap with every English word as a key and a boolean as the value. This way I could search for the key at the time of user validation and so on.

A HashMap is overkill for this task. You just need to know whether a word exists, so you could use a HashSet. After adding all the words to the HashSet, you would use the contains() method to check to see whether a word exists in the HashSet or not. But you must be aware that this is a case-sensitive approach, so you would have to make sure that all your words have the same case (for example, "hello" will not match "Hello").
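
One way to deal with the case-sensitivity issue is to normalize every word the same way both when filling the set and when looking it up. A sketch (the class name is hypothetical):

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CaseInsensitiveDictionary {
    private final Set<String> words = new HashSet<>();

    CaseInsensitiveDictionary(Collection<String> source) {
        for (String w : source) {
            words.add(normalize(w));
        }
    }

    // Trim and lower-case; Locale.ROOT avoids locale-specific rules
    // such as the Turkish dotless i changing the result.
    private static String normalize(String w) {
        return w.trim().toLowerCase(Locale.ROOT);
    }

    // The candidate goes through the same normalization as the stored words.
    boolean contains(String candidate) {
        return words.contains(normalize(candidate));
    }
}
```

With this, "hello", "Hello" and "HELLO" all match a dictionary entry of "Hello".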

Also, I don't know how memory-intensive loading the entire English dictionary into memory would be. If you run into problems, a better approach might be to scan the dictionary file every time you need to check if a word exists.
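
If memory does turn out to be a problem, scanning the file on each lookup might look like the sketch below (hypothetical class name, one word per line assumed). It holds only the current line in memory, at the cost of reading through the file on every check:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileScanningDictionary {
    private final Path file;

    FileScanningDictionary(Path file) {
        this.file = file;
    }

    // Reads the word list line by line and stops at the first match.
    // equalsIgnoreCase makes the comparison case-insensitive.
    boolean contains(String word) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().equalsIgnoreCase(word)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Each lookup is linear in the size of the file, so this trades the HashSet's fast lookups for a small memory footprint.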

2) Send an HTTP request to a site like dictionary.com to verify whether the word exists.

This would work too, but it relies on (1) the computer having an active Internet connection and (2) the dictionary.com website being up.

Michael Angstadt