Using an MS-DOS window, I am piping in an amazon.txt file. I am trying to use the collections framework, and I want to keep this as simple as possible. What I want to do is count all the unique words in the file, with no duplicates.

This is what I have so far. Please be kind; this is my first Java project.

import java.util.Scanner;
import java.util.ArrayList;
import java.util.Iterator;

public class project1
{
    //ArrayList<String> a = new ArrayList<String>();

    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        String word;
        String grab;

        int count = 0;
        ArrayList<String> a = new ArrayList<String>();
        //Iterator<String> it = a.iterator();

        System.out.println("Java project\n");
        while (sc.hasNext())
        {
            word = sc.next();
            a.add(word);
            if (word.equals("---"))
            {
                break;
            }
        }
        Iterator<String> it = a.iterator();

        while (it.hasNext())
        {
            grab = it.next();

            if (grab.contains("a"))
            {
                System.out.println(it.next()); // just a check to see
                count++;
            }
        }
        System.out.println("I counted abc = ");
        System.out.println(count);
        System.out.println("\nbye...");
    }
}
+9  A: 

In your version, the word list a will contain all words, including duplicates. You can either

(a) check for every new word whether it is already in the list (List#contains is the method to call), or, the recommended solution,

(b) replace ArrayList<String> with TreeSet<String>. This eliminates duplicates automatically and stores the words in sorted order (String's natural ordering, so uppercase letters sort before lowercase ones).
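
For illustration, option (a) might look roughly like the sketch below. The class and method names (UniqueWordsList, uniqueWords) are made up for this example, and the "---" terminator is taken from the question's code; here it is checked before the word is stored so that it is not counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; a sketch of option (a), not the asker's final code.
public class UniqueWordsList {

    // Collects each distinct whitespace-separated token, stopping at "---".
    static List<String> uniqueWords(Scanner sc) {
        List<String> words = new ArrayList<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // the terminator itself is not stored
            }
            if (!words.contains(word)) { // linear scan of the list for every word
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> words = uniqueWords(new Scanner(System.in));
        System.out.println("Unique words: " + words.size());
    }
}
```

Because List#contains walks the list each time, the work grows roughly with the square of the number of distinct words, which is why option (b) is the recommended one.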

Edit

If you want to count the unique words, do the same as above; the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c), provided the "---" terminator itself is not added to the collection.
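
A minimal sketch of the TreeSet approach, assuming the same "---" terminator as in the question. The names UniqueWordCount and readUnique are invented for this example.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; illustrates counting unique words via a set's size.
public class UniqueWordCount {

    // Reads tokens until "---" (or end of input) and keeps one copy of each.
    static Set<String> readUnique(Scanner sc) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // check before adding, so the terminator is not counted
            }
            words.add(word); // add() silently ignores words already present
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readUnique(new Scanner(System.in));
        System.out.println(words);
        System.out.println("Unique words: " + words.size());
    }
}
```

Run it as in the question, for example java UniqueWordCount < amazon.txt. For the input "a a b c ---" it prints [a, b, c] followed by Unique words: 3.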

Andreas_D
Great answer. +1, but I'm out of votes :)
Carl Smotricz
What I want to do is count all the unique words, not "abc", etc.
icelated
Andreas_D, I changed the original post.
icelated
+1 Note that option a) will be very slow on big lists.
rsp
OK, I did what you said and used TreeSet, then just did a count and got 8 unique words. Thanks.
icelated
Just remember that `"Word"` and `"word"` are two different words as far as the TreeSet is concerned. So if you want it to be case-insensitive, you would have to call `toLowerCase()` or `toUpperCase()` before adding the `String` to the TreeSet.
Chinmay Kanchi
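
As a rough sketch of the case-insensitive variant described in the comment above (the names CaseInsensitiveWords and readUniqueIgnoringCase are hypothetical), each token can be lower-cased before it goes into the set:

```java
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; treats "Word" and "word" as the same word.
public class CaseInsensitiveWords {

    static Set<String> readUniqueIgnoringCase(Scanner sc) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            // Locale.ROOT keeps the conversion independent of the machine's locale.
            words.add(sc.next().toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readUniqueIgnoringCase(new Scanner(System.in));
        System.out.println("Unique words (ignoring case): " + words.size());
    }
}
```

An alternative that keeps the first spelling seen is to construct the set as new TreeSet<String>(String.CASE_INSENSITIVE_ORDER) and add the words unchanged.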
Does TreeSet count integers? I don't want to count them!
icelated
The TreeSet will contain each unique thing that you put into it. If you don't want to count integers, or punctuation, or whatever, don't put them into the set.
Stephen C
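
One possible way to keep integers out of the set, sketched under the assumption that "integer" means a token made up only of digits with an optional leading minus sign. The names WordsWithoutIntegers and readUniqueSkippingIntegers are made up for this example.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; skips integer-looking tokens before they reach the set.
public class WordsWithoutIntegers {

    static Set<String> readUniqueSkippingIntegers(Scanner sc) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            String token = sc.next();
            // Assumption: an "integer" is an optional '-' followed by digits only.
            if (token.matches("-?\\d+")) {
                continue; // do not put it into the set
            }
            words.add(token);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readUniqueSkippingIntegers(new Scanner(System.in));
        System.out.println("Unique non-integer words: " + words.size());
    }
}
```

The same pattern works for punctuation or any other unwanted token: test it before calling add().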
The file has a lot of "a" letters in it (as in "and", "or", "paragraph"). However, I am trying to find just "a" by itself. How can I do that without counting every letter "a" that appears inside other words? I tried if(grab.contains("a")).
icelated
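
A sketch of one way to match only the standalone word: String#contains is true for any token that has the letter somewhere inside it, while String#equals is true only when the whole token matches. The names ExactMatchCount and countExact are hypothetical.

```java
import java.util.Scanner;

// Hypothetical class name; counts tokens that are exactly equal to a target word.
public class ExactMatchCount {

    static int countExact(Scanner sc, String target) {
        int count = 0;
        while (sc.hasNext()) {
            String grab = sc.next();
            // equals() compares the whole token; "and".equals("a") is false,
            // whereas "and".contains("a") is true.
            if (grab.equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int count = countExact(new Scanner(System.in), "a");
        System.out.println("Standalone \"a\" count: " + count);
    }
}
```

Scanner splits on whitespace by default, so a token such as "a," (with trailing punctuation) would not match unless the punctuation is stripped first.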
+2  A: 

Instead of ArrayList<String>, use HashSet<String> (unsorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs. If you do need that count, use Hashtable<String,Integer> (unsorted) or TreeMap<String,Integer> (sorted).
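
If per-word counts are needed, a TreeMap<String,Integer> version could look something like this sketch. The names WordFrequency and countWords are invented here, and Map#merge assumes Java 8 or later.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical class name; maps each word to the number of times it occurs.
public class WordFrequency {

    static Map<String, Integer> countWords(Scanner sc) {
        Map<String, Integer> counts = new TreeMap<>();
        while (sc.hasNext()) {
            // merge() stores 1 for a new word, otherwise adds 1 to the old count.
            counts.merge(sc.next(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new Scanner(System.in));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        System.out.println("Unique words: " + counts.size());
    }
}
```

The number of unique words is still available as the map's size, so this covers both use cases at the cost of storing a counter per word.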

If there are words you don't want, place those in a HashSet<String> and check that it doesn't contain the word your Scanner found before adding the word to your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it contains the word your Scanner found before adding the word to your collection.

lins314159
If I placed them into the HashSet, how would I use my Scanner to check for those words?
icelated
You use your Scanner to pick up a sequence of characters first. Then you convert that sequence of characters to lower case (assuming all words in your HashSet are lower case), then check whether that word exists in your HashSet.
lins314159
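
The exclusion check described in this answer and its comments might be sketched as follows. The excluded words listed in main are placeholders chosen for the example, and the names StopWordFilter and readUniqueExcluding are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; keeps unique words that are not in an exclusion set.
public class StopWordFilter {

    // Assumes every entry in 'excluded' is already lower case.
    static Set<String> readUniqueExcluding(Scanner sc, Set<String> excluded) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next().toLowerCase(Locale.ROOT);
            if (!excluded.contains(word)) { // constant-time lookup on average
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Placeholder exclusion list, for illustration only.
        Set<String> excluded = new HashSet<>(Arrays.asList("a", "the", "and"));
        Set<String> words = readUniqueExcluding(new Scanner(System.in), excluded);
        System.out.println("Unique words kept: " + words.size());
    }
}
```

For the dictionary-only variant, the condition flips: add the word only when the dictionary set does contain it.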