I have a list of words and I want to filter it down to just the nouns (using Java). To do this, I am looking for an easy way to query a database of words for their type.

My question is: does anybody know of a free, easy word-lookup API that would let me find the class of a word, not necessarily its semantic definition?

Thanks!

Ben.

EDIT: By class of the word I meant 'part of speech'. Thanks for clearing this up.

+2  A: 

Word type? Such as verb, noun, adjective, etc.? If so, you might run into the issue that some words can be used in more than one way. For example: "Can you trade me that card?" versus "That was a bad trade."

See this thread for some suggestions.

Have a look at this as well, seems like it might do exactly what you're looking for.

Ben S
In fact "can", "trade" and "card" are all both nouns and verbs.
Chuck
I think the point is that in the first sentence 'trade' is a verb and in the second it is used as a noun. The meaning of the word depends on the context in which it is used.
StompChicken
A: 

Querying a database of words is going to lead to the problem that Ben S. mentions: for example, is it lead (v., to show the way) or lead (n., Pb)? If you want to spend some time on the problem, look at part-of-speech tagging. There's some good info in another SO thread.
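To make the ambiguity concrete, here is a minimal sketch of what a plain word lookup gives you. The `LEXICON` map is a hypothetical, hand-written stand-in for a real word database (its entries are illustrative, not exhaustive), and the class and method names are made up for this example. The point is only that a lookup returns a set of possible parts of speech, so you have to decide whether "can be a noun" or "is only ever a noun" is the filter you want.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class NounLookupSketch {

    enum Pos { NOUN, VERB, ADJECTIVE }

    // Hypothetical toy lexicon standing in for a real word database.
    // Each word maps to every part of speech it can take.
    static final Map<String, Set<Pos>> LEXICON = Map.of(
            "lead", EnumSet.of(Pos.NOUN, Pos.VERB),
            "trade", EnumSet.of(Pos.NOUN, Pos.VERB),
            "card", EnumSet.of(Pos.NOUN, Pos.VERB),
            "database", EnumSet.of(Pos.NOUN),
            "semantic", EnumSet.of(Pos.ADJECTIVE));

    // Look up a word case-insensitively; unknown words get an empty set.
    static Set<Pos> tagsFor(String word) {
        return LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT),
                EnumSet.noneOf(Pos.class));
    }

    // Keep every word that CAN be a noun (may include verbs used as nouns).
    static List<String> possibleNouns(List<String> words) {
        return words.stream()
                .filter(w -> tagsFor(w).contains(Pos.NOUN))
                .collect(Collectors.toList());
    }

    // Keep only words whose single listed part of speech is noun.
    static List<String> unambiguousNouns(List<String> words) {
        return words.stream()
                .filter(w -> {
                    Set<Pos> tags = tagsFor(w);
                    return tags.size() == 1 && tags.contains(Pos.NOUN);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words =
                List.of("lead", "trade", "semantic", "database", "Card", "xyzzy");
        System.out.println(possibleNouns(words));    // [lead, trade, database, Card]
        System.out.println(unambiguousNouns(words)); // [database]
    }
}
```

The loose filter lets through words like "lead" and "trade" that may be verbs in your data, while the strict one throws them away even when they are nouns. Without sentence context there is no way to choose between the two readings, which is why a tagger is the usual answer.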

Joe W.
A: 

For English, you could use WordNet with one of the available Java APIs to find the lexical category of a word (which in NLP is most commonly called the part of speech). Using a dedicated POS tagger would be another option.

Fabian Steeg
A: 

I think what you are looking for is the part of speech (POS) of a word. In general, that cannot be determined except in the context of a sentence. Many words can have several different parts of speech (e.g. 'bank' can be used as a verb or a noun).

You could use a POS tagger to get the information you want. However, the following part-of-speech taggers assume that you are tagging words within a well-structured English sentence:

  • The OpenNLP Java libraries are generally very good and released under the LGPL. There is a part-of-speech tagger for English and a few other languages included in the distribution. Just go to the project page to get the jar (and don't forget to download the models too).

  • There is also the Stanford part-of-speech tagger, written in Java under the GPL. I haven't had any direct experience with this library, but the Stanford NLP lab is generally pretty awesome.

StompChicken