Hi Everyone,

I'm working on a project that has a dictionary of words. I'm extracting the words and adding them to an ArrayList as Word objects. I have a class called Word, shown below.

What I'm wondering is how to access these Word objects to update the frequency. As part of this project, each word must appear only once, and its frequency must be increased by the number of times it occurs in the dictionary.

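class Word
{
  private final String word;
  private int freq;

  Word(String word)
  {
    this.word = word;
    this.freq = 0;
  }

  public String getWord() {
    return word;
  }

  public int getFreq() {
    return freq;
  }

  // Despite its name, this increments the frequency by one
  public void setFreq() {
    freq = freq + 1;
  }
}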

This is how I'm adding the Word objects to the ArrayList. I think it's OK?

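// line: the current dictionary line; word: the ArrayList<Word>; count: number of tokens seen
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    // One new Word object per token, so a repeated word is added again each time
    word.add(new Word(newWord));
    count++;
}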
+1  A: 

Use a map from each word to its Word object. A HashSet would be enough to keep the words unique, but internally a HashSet uses a HashMap anyway, and a map also lets you get the stored Word back so you can update its frequency. The following code increases the frequency of words you have already inserted.

// Declare the map once, outside the loop that reads the dictionary lines
Map<String, Word> wordsMap = new HashMap<String, Word>();

String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    if (!wordsMap.containsKey(newWord)) {
        Word firstWord = new Word(newWord);
        firstWord.setFreq(); // a new Word starts at 0, so count this first occurrence
        wordsMap.put(newWord, firstWord);
    } else {
        Word existingWord = wordsMap.get(newWord);
        existingWord.setFreq();
    }

    count++;
}
Bragboy
Thanks for the feedback. I didn't want to go down the road of HashMaps etc., as my intention is to add each word to a trie and update the frequency there. Is it overkill to have a Word class at all? Should I just extract the words from the dictionary and add them to the ArrayList? The only reason I created the Word class was to update the frequency of each word; I couldn't think of a way to do it with just a String.
I suppose your trie data structure has a search() method. It should return the object you've stored, in time proportional to the length of the word rather than to the number of words stored. Using that, you can increase the frequency of the word. Take a look at http://www.technicalypto.com/2010/04/trie-data-structure-part-4-search.html
Bragboy
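The trie suggestion in the comment above can be sketched as follows. This is a minimal sketch, not the implementation from the linked article. It assumes words contain only lowercase a-z, as produced by the stripping code in the question. The class name `FrequencyTrie` and its methods `insert` and `frequency` are hypothetical. Each node keeps a count of how many times the word ending at that node was inserted, so no separate Word class is needed.

```java
// Hypothetical sketch: a trie that stores a frequency count at the node
// where each word ends. Assumes words contain only lowercase letters a-z.
class FrequencyTrie {

    private static final class Node {
        final Node[] children = new Node[26];
        int count; // how many times the word ending at this node was inserted
    }

    private final Node root = new Node();

    // Walks (creating as needed) one node per character, then bumps the count.
    // Takes O(length of word), regardless of how many words are stored.
    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= 26) {
                throw new IllegalArgumentException("not a lowercase a-z word: " + word);
            }
            if (node.children[index] == null) {
                node.children[index] = new Node();
            }
            node = node.children[index];
        }
        node.count++;
    }

    // Returns how many times the word was inserted (0 if never inserted).
    int frequency(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (index < 0 || index >= 26 || node.children[index] == null) {
                return 0;
            }
            node = node.children[index];
        }
        return node.count;
    }

    public static void main(String[] args) {
        FrequencyTrie trie = new FrequencyTrie();
        for (String w : "a brown cat eats a white cat".split(" ")) {
            trie.insert(w);
        }
        System.out.println(trie.frequency("cat"));   // 2
        System.out.println(trie.frequency("brown")); // 1
        System.out.println(trie.frequency("dog"));   // 0
    }
}
```

A prefix such as "ca" reports 0 unless "ca" itself was inserted, because only the node where a word ends has its count incremented.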
That's brilliant, thanks Bragaadeesh. I did implement your code above, but I can't figure out how to check the contents of the HashMap to see the frequencies etc. (I've never used one before.) Could you give me a pointer?
You said you don't want to use a HashMap. However, if you do, you can refer to the code above: the "else" branch is where an existing word is looked up and its frequency increased.
Bragboy
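To answer the question about checking the HashMap's contents: once the loop has finished, you can iterate over the map's entries, or look up a single word with get(). This is a minimal sketch that reproduces the question's Word class with only the members used. The helper names `countWords` and `printFrequencies` are hypothetical. `countWords` applies the answer's counting logic to whitespace-separated tokens and skips the stripping and lowercasing step.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal copy of the question's Word class (only the members used here).
class Word {
    private final String word;
    private int freq;

    Word(String word) {
        this.word = word;
        this.freq = 0;
    }

    String getWord() { return word; }
    int getFreq() { return freq; }
    void setFreq() { freq = freq + 1; } // increments, as in the question
}

class WordMapDemo {

    // Hypothetical helper: same counting logic as the answer's loop,
    // applied to whitespace-separated tokens.
    static Map<String, Word> countWords(String text) {
        Map<String, Word> wordsMap = new HashMap<String, Word>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue; // leading whitespace yields an empty token
            Word w = wordsMap.get(token);
            if (w == null) {
                w = new Word(token);
                wordsMap.put(token, w);
            }
            w.setFreq();
        }
        return wordsMap;
    }

    // Prints every distinct word with its frequency (HashMap order is unspecified).
    static void printFrequencies(Map<String, Word> wordsMap) {
        for (Map.Entry<String, Word> entry : wordsMap.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().getFreq());
        }
    }

    public static void main(String[] args) {
        Map<String, Word> wordsMap = countWords("a brown cat eats a white cat");
        printFrequencies(wordsMap);
        // Looking up a single word; get() returns null for an unknown word
        Word cat = wordsMap.get("cat");
        System.out.println(cat == null ? 0 : cat.getFreq()); // 2
    }
}
```

If you need the words printed in alphabetical order, you can copy the map into a `TreeMap` before iterating.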
+1  A: 

Instead of an ArrayList, use a Bag (for example, the one in Apache Commons Collections). It keeps the counts for you.

Mark Byers
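If you would rather not add a library dependency, the bag idea from the answer above can be approximated with the JDK alone. The sketch uses a `Map<String, Integer>` from word to count, where `Map.merge` (Java 8+) stores 1 for a new word or adds 1 to the existing count. This substitutes a plain map for a Bag and is not the Apache Commons Bag API. The class `WordBag` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical bag-like wrapper around a map from word to count.
class WordBag {

    // word -> number of occurrences; TreeMap keeps the words sorted for printing
    private final Map<String, Integer> counts = new TreeMap<>();

    // Adds one occurrence: stores 1 for a new word, otherwise adds 1 to its count.
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Returns the number of occurrences, or 0 for a word never added.
    int getCount(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Number of distinct words.
    int uniqueCount() {
        return counts.size();
    }

    public static void main(String[] args) {
        WordBag bag = new WordBag();
        for (String w : "a brown cat eats a white cat".split(" ")) {
            bag.add(w);
        }
        System.out.println(bag.getCount("cat")); // 2
        System.out.println(bag.uniqueCount());   // 5
        System.out.println(bag.counts);          // {a=2, brown=1, cat=2, eats=1, white=1}
    }
}
```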
+1  A: 

I would solve the problem with the following code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Word {

  private final String word;
  private int frequency;

  public Word(String word) {
    this.word = word;
    this.frequency = 0;
  }

  public String getWord() {
    return word;
  }

  public int getFrequency() {
    return frequency;
  }

  public void increaseFrequency() {
    frequency++;
  }

I didn't call this method setFrequency because it isn't a real setter: a real setter takes exactly one parameter, the new value.

  public static List<Word> histogram(String sentence) {

First, compute the frequency of the individual words.

    String[] words = sentence.split("\\W+");
    Map<String, Word> histo = new HashMap<String, Word>();
    for (String word : words) {
      if (word.isEmpty())
        continue; // split() yields a leading "" if the sentence starts with a non-word character
      Word w = histo.get(word);
      if (w == null) {
        w = new Word(word);
        histo.put(word, w);
      }
      w.increaseFrequency();
    }

Then, sort the words so that words with higher frequency appear first. Words with the same frequency are ordered by String.compareTo. That ordering is alphabetical for lowercase words but places uppercase letters before all lowercase ones.

    List<Word> ordered = new ArrayList<Word>(histo.values());
    Collections.sort(ordered, new Comparator<Word>() {
      public int compare(Word a, Word b) {
        int fa = a.getFrequency();
        int fb = b.getFrequency();
        if (fa < fb)
          return 1;
        if (fa > fb)
          return -1;
        return a.getWord().compareTo(b.getWord());
      }
    });

    return ordered;
  }

Finally, test the code with a simple example.

  public static void main(String[] args) {
    List<Word> freq = histogram("a brown cat eats a white cat.");
    for (Word word : freq) {
      System.out.printf("%4d %s\n", word.getFrequency(), word.getWord());
    }
  }
}
Roland Illig
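On Java 8 or later, the hand-written comparator in histogram() above can also be built from `Comparator` combinators: order by frequency in descending order, then by word. This sketch assumes Java 8+ and uses a hypothetical `Count` class in place of `Word` so that it compiles on its own. It produces the same ordering as the anonymous class in the answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ComparatorDemo {

    // Stand-in for the answer's Word class (hypothetical name, same getters).
    static final class Count {
        private final String word;
        private final int frequency;

        Count(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        String getWord() { return word; }
        int getFrequency() { return frequency; }
    }

    // Higher frequency first; ties broken by String.compareTo on the word.
    static final Comparator<Count> BY_FREQUENCY_THEN_WORD =
            Comparator.comparingInt(Count::getFrequency).reversed()
                      .thenComparing(Count::getWord);

    // Returns a sorted copy, leaving the input list unchanged.
    static List<Count> sorted(List<Count> counts) {
        List<Count> ordered = new ArrayList<>(counts);
        ordered.sort(BY_FREQUENCY_THEN_WORD);
        return ordered;
    }

    public static void main(String[] args) {
        List<Count> counts = new ArrayList<>();
        counts.add(new Count("white", 1));
        counts.add(new Count("cat", 2));
        counts.add(new Count("a", 2));
        counts.add(new Count("brown", 1));
        // Prints "a" and "cat" (frequency 2), then "brown" and "white" (frequency 1)
        for (Count c : sorted(counts)) {
            System.out.printf("%4d %s%n", c.getFrequency(), c.getWord());
        }
    }
}
```

Because `reversed()` applies only to the frequency comparator built before it, the tie-break on the word remains ascending.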
+1  A: 

You can use a Multiset<String> from Google Collections (now part of Guava) instead of the Word class.

True Soft