ansaurus

Question

Answer 1

A:

If you only need word frequencies, sentence parsing is unnecessary. Tokenise the input, then count the tokens in a Java HashMap. If you want to use the Stanford tools, any of the tokenisers in edu.stanford.nlp.process will do.

This gives you the frequency of any given word. The reverse lookup, finding the word at a given frequency rank, is not always well defined, because several words may be equally frequent in the document.
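As a rough sketch of the counting step, something like the following should work. It uses only the standard library: the regex split is a naive stand-in for a real tokeniser (my assumption, not one of the Stanford classes), and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the name is illustrative only.
class WordFrequency {

    // Counts how often each token occurs in the given text.
    // Tokenisation is a naive stand-in: lower-case the text, then split on
    // any run of characters that are not letters, digits or apostrophes.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            counts.merge(token, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("The cat sat on the mat. The end.");
        System.out.println(counts.get("the")); // prints 3
    }
}
```

To swap in a Stanford tokeniser, only the line that builds the tokens array would need to change. Sorting the map entries by count gives a ranking, but words with the same count come out in no particular order.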

StompChicken 2009-12-01 11:42:09

The Lexicon interface seems like it could be useful, but how do I fill it with data?

Rosarch 2009-12-02 14:45:03

It's probably not useful for your needs; you may be getting misled by the name. Lexicon is a subcomponent of the parser which "provide(s) a conditional probability P(word|tag)". It is not designed to count word frequencies.

StompChicken 2009-12-04 16:12:57

I'm not concerned with counting word frequencies in the text sample, but in the entire corpus (so "the" would be a more frequent word than "pumpernickel").

Rosarch 2009-12-05 21:06:15


Java Stanford NLP: Find word frequency?
