views:

510

answers:

6

I have a streaming input which has repeated values. I can use any data structure but I have to count the number of occurence of each element. Suppose I have a list of mobile phone suppliers like the following:

Apple
Nokia
Samsung
Apple
LG
Nokia
HTC
Android
Apple
Nokia
Nokia
Apple
Samsung

I have to build any data structure preferably a map with details like

Apple,4
Nokia,4
Samsung,2
LG,1
Android,1

I am not sure whether this is optimal. Is there a better solution than this?
In fact I have yet to write the above as a code. So better code will also help.

+1  A: 

Where is the data comming from? If a db - you can do this very easily in the query on the backend with group by.

Jim Evans
Nope its from a flat file
Harish
+4  A: 

Yes, I would use a Map<String, Integer>. I would wrap the add in something like this:

private static void incrementValue(Map<String, Integer> counters, String toAdd) {
    Integer currValue = counters.get(toAdd);
    if (currValue == null)
        counters.put(toAdd, 1);
    else
        counters.put(toAdd, currValue+1);
}

Or without generics:

private static void incrementValue(Map counters, String toAdd) {
    Integer currValue = (Integer) counters.get(toAdd);
    if (currValue == null)
        counters.put(toAdd, 1);
    else
        counters.put(toAdd, currValue+1);
}
Michael Myers
A small information...I can't use Generics as I have to use Java 1.4
Harish
cool it works and thanks for that
Harish
A: 

A map seems the way to go. Direct access :)

Key: Element value: number of ocurrences, or a list with indexes of the element in the list.

Tom
A: 

Besides the solutions that have been posted the first thing that comes to my mind is to make a table "code - value" and encode the list using codes. It would be very space efficient.

Malcolm
A: 

The most natural structure for this is a Bag aka a Multiset.

A bag is essentially a function from Object to Count.

Google collections has a Multiset however you could easily build your own using a HashMap.

http://google-collections.googlecode.com/svn/trunk/javadoc/index.html?com/google/common/collect/Multiset.html

pjp
thats great but how to get the count ?
Harish
Note that Google Collections requires Java 5, though. Other than that, this is even easier to do than my answer.
Michael Myers
You can get the count by interating over the entrySet().If you want to stream the count out you could extend the implementation to notify a listener when the count changes.
pjp
+2  A: 

Since it was mentioned by the questioner that generics could not be used, as the target platform was Java 1.4, one could use the Apache Commons Collections which doesn't use generics.

The answer by pjp mentions that a Bag can be used.

It turns out, the Apache Commons Collections has a Bag which has a getCount method which will return the count of a certain object that was added to the Bag.

The following is an example that adds some Integer objects to a HashBag, and counts how many of each Integer object that the Bag contains:

Bag b = new HashBag();

b.add(Integer.valueOf(1));
b.add(Integer.valueOf(2));
b.add(Integer.valueOf(2));
b.add(Integer.valueOf(3));

System.out.println("Count for 1: " + b.getCount(Integer.valueOf(1)));
System.out.println("Count for 2: " + b.getCount(Integer.valueOf(2)));
System.out.println("Count for 3: " + b.getCount(Integer.valueOf(3)));

The results were:

Count for 1: 1
Count for 2: 2
Count for 3: 1

(I should add a disclaimer that this code was actually compiled and run on Java 6, but I believe I've only used features which were present from the pre-Java 5 days.)

coobird
thats brilliant...I would like to vote for you but I am yet to get reputation...Thanks for the response
Harish
+1, and this should be the accepted answer. The only possible reason not to do this is a fear of external libraries (which I have myself).
Michael Myers