Do you know a class in Java that counts word frequency of the text,
and maybe gives all the blocks of the text where the word occurs?
The simplest word counter in Java is:
import java.util.HashMap;
import java.util.Map;

// Split on single spaces, then tally each token.
String[] words = yourtext.split(" ");
Map<String, Integer> frequencies = new HashMap<>();
for (String w : words) {
    // merge() stores 1 for a new word, otherwise adds 1 to the existing count.
    frequencies.merge(w, 1, Integer::sum);
}
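The complexity, of course, comes from doing something more sophisticated than
yourtext.split(" ")
which leaves punctuation attached to words ("word," vs. "word") and counts "The" and "the" separately.
For this, you can use OpenNLP's Tokenizer or Stanford's PTBTokenizer.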
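If you don't want to pull in a library, a regex gets you part of the way. The following is only a rough sketch using the JDK, not the OpenNLP or Stanford API: the class name WordStats, the definition of a "word" (a run of letters, optionally with internal apostrophes), and treating each line as a "block" are all my assumptions. It lower-cases words before counting and also records the line numbers on which each word occurs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class WordStats {
    // Assumption: a word is a run of letters, optionally with internal apostrophes.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    /** Counts how often each lower-cased word appears in the text. */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> frequencies = new TreeMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            frequencies.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return frequencies;
    }

    /** Maps each lower-cased word to the 1-based line numbers on which it occurs. */
    public static Map<String, List<Integer>> occurrences(String text) {
        Map<String, List<Integer>> index = new TreeMap<>();
        String[] lines = text.split("\\R"); // assumption: one "block" per line
        for (int i = 0; i < lines.length; i++) {
            Matcher m = WORD.matcher(lines[i]);
            while (m.find()) {
                String w = m.group().toLowerCase(Locale.ROOT);
                List<Integer> hits = index.computeIfAbsent(w, k -> new ArrayList<>());
                // Record each line once, even if the word repeats on it.
                if (hits.isEmpty() || hits.get(hits.size() - 1) != i + 1) {
                    hits.add(i + 1);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String text = "The cat sat.\nThe dog didn't sit; the cat did.";
        System.out.println(count(text));
        System.out.println(occurrences(text));
    }
}
```

This won't handle hyphenation, abbreviations, or languages without spaces between words, which is where a real tokenizer earns its keep.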
How many words do you have to count? A hundred? A thousand? A trillion? If you're going to be processing large data sets, check out Hadoop: most of the getting-started tutorials kick off with word-counting examples that count appearances of words in books.
Here is one of Apache's official examples - http://wiki.apache.org/hadoop/WordCount - but there are lots of these available online.