
Is there a class in Java that counts the frequency of each word in a text, and ideally also reports every block of the text where a given word occurs?

A:

Take a look at WordCount.

adatapost
A:

The simplest word counter in Java is:

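import java.util.HashMap;
import java.util.Map;

public class SimpleWordCount {
  public static Map<String, Integer> countWords(String yourtext) {
    String[] words = yourtext.split(" ");
    Map<String, Integer> frequencies = new HashMap<String, Integer>();
    for (String w : words) {
      Integer num = frequencies.get(w);
      if (num != null)
        frequencies.put(w, num + 1); // seen before: increment its count
      else
        frequencies.put(w, 1);       // first occurrence of this word
    }
    return frequencies;
  }
}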

The complexity, of course, comes from doing something more sophisticated than yourtext.split(" ") to break the text into words.

For this, you can use OpenNLP's Tokenizer, or Stanford's PTBTokenizer.
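If adding OpenNLP or Stanford's tools is more than you need, the JDK's own java.text.BreakIterator can find word boundaries, so punctuation such as "cat," or "sat." no longer sticks to the word. The sketch below is only an illustration under a few assumptions: the class name WordIndex is hypothetical, words are lower-cased with Locale.ENGLISH, and a segment counts as a word if it starts with a letter or digit. Besides counting, it records the character offset of every occurrence, which addresses the second half of the question (where each word appears).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordIndex {
  /** Maps each lower-cased word to the character offsets where it starts. */
  public static Map<String, List<Integer>> index(String text) {
    Map<String, List<Integer>> occurrences = new TreeMap<>();
    BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
    it.setText(text);
    int start = it.first();
    for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
      String token = text.substring(start, end);
      // BreakIterator also returns spaces and punctuation as segments;
      // keep only segments that start with a letter or digit (an assumption)
      if (!Character.isLetterOrDigit(token.codePointAt(0))) {
        continue;
      }
      occurrences
          .computeIfAbsent(token.toLowerCase(Locale.ENGLISH), k -> new ArrayList<>())
          .add(start);
    }
    return occurrences;
  }

  public static void main(String[] args) {
    String text = "The cat sat. The dog, the cat!";
    for (Map.Entry<String, List<Integer>> e : index(text).entrySet()) {
      // the word's frequency is simply the number of recorded offsets
      System.out.println(e.getKey() + " x" + e.getValue().size() + " at " + e.getValue());
    }
  }
}
```

Storing offsets rather than bare counts costs a little more memory, but the frequency is still just the list size, and the offsets let you cut out the surrounding block of text for each occurrence.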

Ken Bloom
A:

How many words do you have to count? A hundred? A thousand? A trillion? If you're going to process large data sets, check out Hadoop: most of its introductory tutorials start with a word-count example that counts how often words appear in books.

Here is one of Apache's official examples: http://wiki.apache.org/hadoop/WordCount. There are many more available online.
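The Hadoop WordCount example splits the job into a map step, a shuffle/sort step, and a reduce step. The plain-Java sketch below imitates that data flow in a single process to show what each step does; it uses no Hadoop API, the class and method names are hypothetical, and tokenizing on \W+ is a simplification.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceWordCount {
  // Map phase: emit a (word, 1) pair for every token in one input line
  static List<Map.Entry<String, Integer>> map(String line) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String w : line.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (!w.isEmpty()) { // split can yield a leading empty string
        pairs.add(new SimpleEntry<>(w, 1));
      }
    }
    return pairs;
  }

  // Shuffle phase: group emitted values by key (a TreeMap, since Hadoop sorts keys)
  static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (Map.Entry<String, Integer> p : pairs) {
      grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    return grouped;
  }

  // Reduce phase: sum all the counts collected for one word
  static int reduce(List<Integer> counts) {
    int sum = 0;
    for (int c : counts) {
      sum += c;
    }
    return sum;
  }

  public static Map<String, Integer> wordCount(List<String> lines) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (String line : lines) {
      pairs.addAll(map(line)); // in Hadoop, map tasks would run in parallel
    }
    Map<String, Integer> result = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
      result.put(e.getKey(), reduce(e.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(wordCount(Arrays.asList("the cat sat", "the dog sat down")));
  }
}
```

In real Hadoop you write only the map and reduce logic (as Mapper and Reducer classes); the framework distributes the map calls over input splits and performs the shuffle and sort for you, which is what lets the same pattern scale to very large inputs.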

gnucom