I have a list of files. I would like to scan through it and keep a count of the number of files with the same size. The issue is that the file size is a long, and a HashMap accepts only objects, not primitives. So I wrapped the value with new Long(filesize) and put it into the HashMap. Instead of getting pairs of (filesize, count), I got a list of (filesize, 1), which I assume is because each Long object is unique.

How do I go about building this accumulator?

Is there any solution for 1.4.2?

+10  A: 

You simply do it this way:

Map<Long, Integer> count = new HashMap<Long, Integer>();
for (File file : files) {
  long size = file.length(); // file size in bytes; getTotalSpace() would report the partition size
  Integer n = count.get(size);
  if (n == null) {
    count.put(size, 1);
  } else {
    count.put(size, n + 1);
  }
}

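This relies on auto-boxing and unboxing (Java 5 and later): size is boxed to a Long for the map calls, and n is unboxed for the addition and boxed again for the put.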
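
To see where the boxing happens, here is a minimal self-contained sketch of the same loop, run over plain long values instead of File objects. The class name SizeCounter and the sample sizes are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SizeCounter {
    // Counts how many times each size occurs in the input.
    static Map<Long, Integer> countBySize(long[] sizes) {
        Map<Long, Integer> count = new HashMap<Long, Integer>();
        for (long size : sizes) {
            Integer n = count.get(size);   // size is auto-boxed to a Long for the lookup
            if (n == null) {
                count.put(size, 1);        // 1 is auto-boxed to an Integer
            } else {
                count.put(size, n + 1);    // n is unboxed, incremented, then boxed again
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] sizes = {4096L, 120L, 4096L, 77L, 4096L}; // hypothetical file sizes in bytes
        Map<Long, Integer> count = countBySize(sizes);
        System.out.println(count.get(4096L)); // prints 3
        System.out.println(count.get(120L));  // prints 1
    }
}
```

Equal sizes end up in the same entry because HashMap compares keys with equals() and hashCode(), not by object identity.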

cletus
+1  A: 

Or you could use AtomicInteger as a mutable integer, so the count can be incremented in place without another put.

Map<Long, AtomicInteger> count = new HashMap<Long, AtomicInteger>();
for (File file : files) {
  long size = file.length(); // getTotalSpace() returns the size of the partition, not the actual file size.
  AtomicInteger n = count.get(size);
  if (n == null) {
    count.put(size, new AtomicInteger(1));
  } else {
    n.getAndIncrement();
  }
}
Peter Lawrey
+3  A: 

Instead of using new Long(size), you should use Long.valueOf(size). For small values (-128 to 127) it returns an internally cached Long instead of allocating a new one, which can also help performance (not that it will be visible unless you do millions of these new Long() operations). For larger values, such as most file sizes, it may still return a new object.

P.S. This only works on Java 1.5 or above.
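
A small sketch of what the cache does and does not give you. It assumes the documented behavior of Long.valueOf (values from -128 to 127 are always cached, others may or may not be); the class and method names are made up for illustration.

```java
class LongCacheDemo {
    // True if two valueOf calls hand back the very same object.
    static boolean sameReference(long v) {
        return Long.valueOf(v) == Long.valueOf(v);
    }

    // True if two valueOf calls hand back objects with equal values.
    static boolean sameValue(long v) {
        return Long.valueOf(v).equals(Long.valueOf(v));
    }

    public static void main(String[] args) {
        System.out.println(sameReference(100L));  // prints true: 100 is inside the guaranteed cache range
        System.out.println(sameValue(5000000L));  // prints true: equals() compares the values
        // sameReference(5000000L) is not guaranteed either way, so do not rely on it.
    }
}
```

Either way, a HashMap looks keys up with equals() and hashCode(), so the counting logic should not depend on whether the key objects are shared.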

Chii
+1  A: 

Expanding on what cletus wrote.

His solution is fine, except that it only stores each file size you come across and the number of files with that size. If you ever want to know which files those are, this data structure will be useless to you, so I don't think cletus's solution is quite complete. Instead I would do:

Map<Long, Collection<File>> count = new HashMap<Long, Collection<File>>();
for (File file : files) {
  long size = file.length(); // actual file size in bytes
  Collection<File> c = count.get(size);
  if (c == null) {
    c = new ArrayList<File>(); // or whatever collection you feel comfortable with
    count.put(size, c);
  }
  c.add(file);
}

Then you can get the number of files of a given size with c.size(), and you can easily iterate through all the files of that size without having to run this procedure again.
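
As a rough sketch of how the grouped map could be used afterwards, the helper below lists the sizes shared by more than one file. The class name DuplicateSizes, the method sharedSizes, and the file names are hypothetical; the map is filled by hand here so the example does not depend on real files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateSizes {
    // Returns the sizes that are shared by more than one file, in ascending order.
    static List<Long> sharedSizes(Map<Long, Collection<File>> bySize) {
        List<Long> result = new ArrayList<Long>();
        for (Map.Entry<Long, Collection<File>> e : bySize.entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map produced by the loop above.
        Map<Long, Collection<File>> bySize = new HashMap<Long, Collection<File>>();
        bySize.put(10L, new ArrayList<File>(Arrays.asList(new File("a.txt"), new File("b.txt"))));
        bySize.put(20L, new ArrayList<File>(Arrays.asList(new File("c.txt"))));
        System.out.println(sharedSizes(bySize)); // prints [10]
    }
}
```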

ldog
You forgot to put the ArrayList into the map.
Dennis Cheung
Thanks! Haha, I tend to forget these things and they come back to bite me in the ass.
ldog
Useful solution, though cletus's solution is closer to what I need.
zeroin23
A: 

I think there's more to this, and we'll need more details from you. I'm assuming you know there's definitely more than one file of a given size; otherwise, I'd first check that this is the case. For all you know, you simply have a lot of files with unique file sizes.

You mentioned:

...due to the fact that each Long obj is unique.

I don't think this is the problem. While this may be true depending on how you are instantiating the Longs, it should not prevent a HashMap from behaving the way you want. As long as the two key objects return the same hashCode() value and the equals() method says they are equal, your HashMap will not create another entry. In fact, it should not be possible for you to see "a list of (filesize, 1)" with the same filesize values (unless you wrote your own Long and failed to implement hashCode()/equals() correctly).
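
A quick way to check this is the sketch below, which uses two deliberately distinct Long objects with the same value as keys. The class name LongKeyDemo and the sample size are made up; new Long(...) is used on purpose to force separate objects, even though it is deprecated in recent Java versions.

```java
import java.util.HashMap;
import java.util.Map;

class LongKeyDemo {
    // Counts the same size twice, using two distinct Long objects as keys.
    @SuppressWarnings({"deprecation", "removal"})
    static Map<Long, Integer> countTwice(long size) {
        Map<Long, Integer> count = new HashMap<Long, Integer>();
        Long first = new Long(size);   // two separate objects,
        Long second = new Long(size);  // same numeric value

        count.put(first, Integer.valueOf(1));
        Integer n = count.get(second); // found, because second.equals(first)
        count.put(second, Integer.valueOf(n.intValue() + 1));
        return count;
    }

    public static void main(String[] args) {
        Map<Long, Integer> count = countTwice(123456789L);
        System.out.println(count.size());                         // prints 1
        System.out.println(count.get(Long.valueOf(123456789L)));  // prints 2
    }
}
```

If this prints one entry with a count of 2 on your machine, the duplicate entries are coming from somewhere other than the Long keys.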

That said, Cletus' code should work if you're using Java 5 or higher. If you're using Java 1.4 or below, you'll need to either do your own boxing/unboxing manually, or look into Apache Commons Collections. Here's the pre-Java 5 version of Cletus' example:

Map count = new HashMap();
for (Iterator filesIter = files.iterator(); filesIter.hasNext();) {
  File file = (File) filesIter.next();
  Long size = new Long(file.length()); // no auto-boxing before Java 5
  Integer n = (Integer) count.get(size); // a raw Map returns Object
  if (n == null) {
    count.put(size, new Integer(1)); // Integer.valueOf(int) only exists from Java 5
  } else {
    count.put(size, new Integer(n.intValue() + 1));
  }
}
Jack Leow
It did happen on a JDK 1.4.2 machine...
zeroin23
+1  A: 

You can use Trove's TLongIntHashMap to store (long, int) pairs without boxing.

adrian.tarau
+1 will be useful when I start accumulating other metrics...
zeroin23