ansaurus

Question

Word frequency counter

Answer 1

+1 A:

Take a look at WordCount.

adatapost 2010-06-29 10:07:14

Answer 2

+3 A:

The simplest word counter in Java is:

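import java.util.HashMap;
import java.util.Map;

// yourtext is the input String. Splitting on a single space keeps
// punctuation attached and yields empty tokens for repeated spaces.
String[] words = yourtext.split(" ");
Map<String, Integer> frequencies = new HashMap<String, Integer>();
for (String w : words) {
  Integer num = frequencies.get(w);  // null the first time a word is seen
  if (num != null)
    frequencies.put(w, num + 1);
  else
    frequencies.put(w, 1);
}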

The complexity, of course, comes from doing something more sophisticated than

yourtext.split(" ")

For this, you can use OpenNLP's Tokenizer, or Stanford's PTBTokenizer.
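As a middle ground between split(" ") and a full tokenizer library, here is a minimal sketch that uses only the standard library (Java 8 or later). It does not use OpenNLP or PTBTokenizer. The class name WordFrequency and the rule for what counts as a word (a run of letters, optionally with internal apostrophes, compared case-insensitively) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFrequency {
    // Assumption: a "word" is a run of letters, optionally with internal
    // apostrophes (e.g. "don't"). Real tokenizers handle far more cases.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    static Map<String, Integer> count(String text) {
        // TreeMap keeps the words in sorted order for easy printing.
        Map<String, Integer> frequencies = new TreeMap<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            // merge() inserts 1 for a new word, otherwise adds 1 to the count.
            frequencies.merge(m.group(), 1, Integer::sum);
        }
        return frequencies;
    }
}
```

Calling WordFrequency.count(yourtext) returns the counts with punctuation and letter case ignored, which is often enough for small inputs. Numbers, hyphenated words, and abbreviations are where a dedicated tokenizer starts to pay off.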

Ken Bloom 2010-06-29 20:43:01

Answer 3

+1 A:

How many words do you have to count? A hundred? A thousand? A trillion? If you're going to be processing large data sets, check out Hadoop: most of the introductory tutorials start with word-counting examples that count the appearances of words in books.

Here is one of Apache's official examples - http://wiki.apache.org/hadoop/WordCount - but there are lots of these available online.
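To give a rough idea of what those tutorials do, here is a plain-Java sketch of the map and reduce steps behind a word count. It uses no Hadoop classes and runs in a single process. The names MiniWordCount, map, reduce, and run are hypothetical, and the sketch only mirrors the shape of the idea (emit a count of 1 per word, then sum per word), not the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniWordCount {
    // "Map" step: emit a (word, 1) pair for every token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs both steps in sequence over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }
}
```

In a real cluster the map step runs on many machines at once and the framework groups the pairs by word before reducing. That only becomes worth the setup when the data no longer fits comfortably on one machine.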

gnucom 2010-07-27 06:07:46
