ansaurus

Question

Answer 1

+9 A:

In your version, the word list a will contain all the words, including duplicates. You can either

(a) check, for every new word, whether it is already in the list (List#contains is the method you should call), or, as the recommended solution,

(b) replace ArrayList<String> with TreeSet<String>. This eliminates duplicates automatically and stores the words in alphabetical order.
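The two options above can be sketched side by side. This is a minimal illustration, not the asker's code: the class name `DedupSketch`, the method names and the sample words are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, for illustration only.
class DedupSketch {
    // Option (a): keep the ArrayList, but skip words that are already in it.
    static List<String> dedupWithList(List<String> words) {
        List<String> unique = new ArrayList<>();
        for (String word : words) {
            if (!unique.contains(word)) { // contains() scans the whole list
                unique.add(word);
            }
        }
        return unique;
    }

    // Option (b): a TreeSet ignores duplicates and keeps its elements sorted.
    static Set<String> dedupWithTreeSet(List<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "pear", "fig", "apple");
        System.out.println(dedupWithList(words));    // [pear, apple, fig]
        System.out.println(dedupWithTreeSet(words)); // [apple, fig, pear]
    }
}
```

Option (a) preserves the order in which words were first seen, while option (b) returns them sorted.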

Edit

If you want to count the unique words, do the same as above; the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c).
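A rough sketch of that count, assuming the input is read token by token with a Scanner and that "---" acts as the end marker (both are assumptions based on the example sequence, and `UniqueCountSketch` is a made-up name):

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, for illustration only.
class UniqueCountSketch {
    // Reads whitespace-separated tokens until the assumed terminator "---"
    // and returns how many distinct words were seen.
    static int countUnique(Scanner in) {
        Set<String> unique = new TreeSet<>();
        while (in.hasNext()) {
            String word = in.next();
            if (word.equals("---")) {
                break;
            }
            unique.add(word); // adding a duplicate leaves the set unchanged
        }
        return unique.size();
    }

    public static void main(String[] args) {
        System.out.println(countUnique(new Scanner("a a b c ---"))); // 3
    }
}
```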

Andreas_D 2010-01-08 21:37:26

great answer. +1 but I'm out of votes :)

Carl Smotricz 2010-01-08 21:40:47

What I want to do is count all the unique words. NOT abc, etc...

icelated 2010-01-08 21:40:59

Andreas_D, I changed the original post.

icelated 2010-01-08 21:42:47

+1 Note that option a) will be very slow on big lists.

rsp 2010-01-08 21:59:22

OK, I did what you said and used TreeSet, then just did a count and got 8 unique words. Thanks.

icelated 2010-01-08 22:07:23

Just remember that `"Word"` and `"word"` are two different words as far as the TreeSet is concerned. So if you want it to be case-insensitive, you would have to call `toLowerCase()` or `toUpperCase()` before adding the `String` to the TreeSet.

Chinmay Kanchi 2010-01-08 22:12:26
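A small sketch of the case-folding idea from the comment above. The class and method names are invented for the example, and using `Locale.ROOT` is my own choice to keep the lower-casing independent of the default locale.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, for illustration only.
class CaseInsensitiveSketch {
    // Lower-cases every word before adding it, so "Word" and "word" collapse.
    static Set<String> uniqueIgnoringCase(List<String> words) {
        Set<String> unique = new TreeSet<>();
        for (String word : words) {
            unique.add(word.toLowerCase(Locale.ROOT));
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Word", "word", "WORD", "other");
        System.out.println(uniqueIgnoringCase(words)); // [other, word]
    }
}
```

An alternative, if the original spelling should be kept, is to construct the set as `new TreeSet<>(String.CASE_INSENSITIVE_ORDER)`; it then keeps whichever spelling was added first.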

Does TreeSet count integers? I don't want to count them!

icelated 2010-01-08 22:33:20

The TreeSet will contain each unique thing that you put into it. If you don't want to count integers, or punctuation, or whatever, don't put them into the set.

Stephen C 2010-01-09 05:22:52
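One way to keep integers and punctuation out of the set is to test each token before adding it. The sketch below assumes that a "word" is a token made only of letters; that definition, like the names `WordFilterSketch` and `isWord`, is an assumption for the example and not something stated in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, for illustration only.
class WordFilterSketch {
    // Assumption: a word consists of one or more letters and nothing else.
    static boolean isWord(String token) {
        return token.matches("\\p{L}+");
    }

    // Only tokens that pass isWord() are put into the set.
    static Set<String> uniqueWords(List<String> tokens) {
        Set<String> unique = new TreeSet<>();
        for (String token : tokens) {
            if (isWord(token)) {
                unique.add(token);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "42", "cat", "7", "the", "!!");
        System.out.println(uniqueWords(tokens)); // [cat, the]
    }
}
```

Note that with this strict test a token such as "cat," (with a trailing comma) is rejected as well, so real input may need punctuation stripped first.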

The file has a lot of "a" letters in it (as in "and" or "paragraph"). However, I am trying to find just "a" by itself. How can I do that without counting every letter a that appears inside other words? I tried if(grab.contains("a"))

icelated 2010-01-09 05:59:34
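For the question above, one possible approach is to compare each whole token with `equals` instead of `contains`, since `String#contains` matches any substring. The sketch assumes the file is read token by token with a Scanner; `StandaloneWordSketch` and `countStandalone` are made-up names.

```java
import java.util.Scanner;

// Hypothetical class, for illustration only.
class StandaloneWordSketch {
    // Counts tokens that are exactly equal to the target word.
    static int countStandalone(Scanner in, String target) {
        int count = 0;
        while (in.hasNext()) {
            if (in.next().equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("a cat and a paragraph");
        System.out.println(countStandalone(in, "a")); // 2
    }
}
```

With `contains("a")` every one of those five tokens would have matched, because each of them has the letter a somewhere in it.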

Answer 2

+2 A:

Instead of ArrayList<String>, use HashSet<String> (not sorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs; use Hashtable<String,Integer> (not sorted) or TreeMap<String,Integer> (sorted) if you do.
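If a per-word count is wanted, the map variant might look roughly like this. It is a sketch using TreeMap<String,Integer> and `Map#merge` (available since Java 8); the class name and sample input are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, for illustration only.
class WordFrequencySketch {
    // Maps each word to the number of times it occurs, sorted by word.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the old count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "a", "b", "c", "b", "a");
        System.out.println(countWords(words)); // {a=2, b=3, c=1}
    }
}
```

The number of unique words is then simply the size of the map.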

If there are words you don't want, place those in a HashSet<String> and check that it doesn't contain the word your Scanner found before placing that word into your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it contains the word your Scanner found before placing that word into your collection.
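The exclusion check could be sketched as follows. The excluded words, the sample sentence and the names `StopWordSketch` and `uniqueWithout` are placeholders chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, for illustration only.
class StopWordSketch {
    // Collects the unique words from the Scanner, skipping the excluded ones.
    static Set<String> uniqueWithout(Scanner in, Set<String> excluded) {
        Set<String> unique = new TreeSet<>();
        while (in.hasNext()) {
            String word = in.next();
            if (!excluded.contains(word)) {
                unique.add(word);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Set<String> excluded = new HashSet<>(Arrays.asList("the", "a"));
        Scanner in = new Scanner("the cat saw a dog the dog");
        System.out.println(uniqueWithout(in, excluded)); // [cat, dog, saw]
    }
}
```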

lins314159 2010-01-08 21:54:16

If I placed it into the HashSet, how would I use my Scanner to check for those words?

icelated 2010-01-09 01:52:38

You use your scanner to pick up a sequence of characters first. Then you convert that sequence of characters to all lower case (assuming all words in your HashSet are lower case), then check whether that word exists in your HashSet.

lins314159 2010-01-09 11:00:15
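Those three steps (read a token, lower-case it, look it up) might be sketched like this for the dictionary case. The tiny dictionary and the names `DictionarySketch` and `dictionaryWords` are placeholders, and the dictionary entries are assumed to be stored in lower case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, for illustration only.
class DictionarySketch {
    // Keeps only tokens that, once lower-cased, appear in the dictionary.
    static Set<String> dictionaryWords(Scanner in, Set<String> dictionary) {
        Set<String> found = new TreeSet<>();
        while (in.hasNext()) {
            String word = in.next().toLowerCase(Locale.ROOT); // step 2: normalise case
            if (dictionary.contains(word)) {                  // step 3: look it up
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("cat", "dog"));
        Scanner in = new Scanner("Cat xyzzy DOG cat");
        System.out.println(dictionaryWords(in, dictionary)); // [cat, dog]
    }
}
```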


Find unique words in a file - java
