tags:

views:

186

answers:

3

I have similar problem to one discussed here, but with stronger practical usage.

For example, I have a Map<String, Integer>, and I have some function, which is given a key and in case the mapped integer value is negative, puts NULL to the map:

Map<String, Integer> map = new HashMap<String, Integer>();

public void nullifyIfNegative(String key) {
    Integer value = map.get(key);

    if (value != null && value.intValue() < 0) {
        map.put(key, null);
    }
}

I this case, the lookup (and hence, hashCode calculation for the key) is done twice: one for lookup and one for replacement. It would be nice to have another method (which is already in HashMap) and allows to make this more effective:

public void nullifyIfNegative(String key) {
    Map.Entry<String, Integer> entry = map.getEntry(key);

    if (entry != null && entry.getValue().intValue() < 0) {
        entry.setValue(null);
    }
}

The same concerns cases, when you want to manipulate immutable objects, which can be map values:

  • Map<String, String>: I want to append something to the string value.
  • Map<String, int[]>: I want to insert a number into the array.

So the case is quite common. Solutions, which might work, but not for me:

  • Reflection. Is good, but I cannot sacrifice performance just for this nice feature.
  • Use org.apache.commons.collections.map.AbstractHashedMap (it has at least protected getEntry() method), but unfortunately, commons-collections do not support generics.
  • Use generic commons-collections, but this library (AFAIK) is out-of-date (not in sync with latest library version from Apache), and (what is critical) is not available in central maven repository.
  • Use value wrappers, which means "making values mutable" (e.g. use mutable integers [e.g. org.apache.commons.lang.mutable.MutableInt], or collections instead of arrays). This solutions leads to memory loss, which I would like to avoid.
  • Try to extend java.util.HashMap with custom class implementation (which should be in java.util package) and put it to endorsed folder (as java.lang.ClassLoader will refuse to load it in Class<?> defineClass(String name, byte[] b, int off, int len), see sources), but I don't want to patch JDK and it seems like the list of packages that can be endorsed, does not include java.util.

The similar question is already raised on sun.com bugtracker, but I would like to know, what is the opinion of the community and what can be the way out taking in mind the maximum memory & performance effectiveness.

If you agree, this is nice and beneficiary functionality, please, vote this bug!

+3  A: 

As a logical matter, you're right in that the single getEntry would save you a hash lookup. As a practical matter, unless you have a specific use case where you have reason to be concerned about the performance hit( which seems pretty unlikely, hash lookup is common, O(1), and well optimized) what you're worrying about is probably negligible.

Why don't you write a test? Create a hashtable with a few 10's of millions of objects, or whatever's an order of magnitude greater than what your application is likely to create, and average the time of a get() over a million or so iterations (hint: it's going to be a very small number).

A bigger issue with what you're doing is synchronization. You should be aware that if you're doing conditional alterations on a map you could run into issues, even if you're using a Synchronized map, as you'd have to lock access to the key covering the span of both the get() and set() operations.

Steve B.
@Steve Thank you for the answer, but I need really break-though answer to mark my question as answered :) There is no big demand to write a test because there will be some loss on very large maps (and that is my case).
dma_k
@Steve I agree with you concerning synchronization issue. I don't have access to this map from several threads in my case. Yes, "classical" get/put approach has synchronization problem when another thread has removed the entry just after you `get` the value. Then current thread will reincarnate the value without knowing that somebody has already removed it. So the approach with `getEntry` is more secure in this case: if another thread has removed the entry, current thread will set the value for "detached" entry, which is no harm.
dma_k
+1  A: 

Not pretty, but you could use lightweight object to hold a reference to the actual value to avoid second lookups.

HashMap<String, String[]> map = ...;

// append value to the current value of key
String key = "key";
String value = "value";

// I use an array to hold a reference - even uglier than the whole idea itself ;)
String[] ref = new String[1]; // lightweigt object
String[] prev = map.put(key, ref);
ref[0] = (prev != null) ? prev[0] + value : value;

I wouldn't worry about hash lookup performance too much though (Steve B's answer is pretty good in pointing out why). Especially with String keys, I wouldn't worry too much about hashCode() as its result is cached. You could worry about equals() though as it might be called more than once per lookup. But for short strings (which are often used as keys) this is negligible too.

sfussenegger
@sfussenegger Thanks for the answer, but creating object wrappers is not good choice for me. I want to achieve max performance with min memory impact.
dma_k
@dma_k If I remember correctly, memory impact of an array with length 1 is 4 bytes (while a wrapper object would be 12 bytes per instance). Hence (ab)using such arrays it's pretty much the same as a working with pointers in C - and you would't call pointers inefficient, would you? ;)
sfussenegger
@sfussenegger I would expect the size of the array to be `4[=length]+4[=int_size]*length(array)+8_byte_align` (I might be wrong). So, `int[1]` will allocate 8 bytes, `int[2]` - 16 bytes :)
dma_k
@dma_k so obviously, I didn't remember correctly - 4*length is a good rule of thumb that fails miserably for this kind of arrays :) It's actually 16 byte for a String[1] (8 byte object header, 4 byte length, 1*4 byte content). As a wrapper object requires the same, it would be a better (natural) choice. Still I don't think that's a major memory impact - a small performance trade-off at the most ;)
sfussenegger
@sfussenegger +1 for your efforts. I've found the answer here: http://kohlerm.blogspot.com/2008/12/how-much-memory-is-used-by-my-java.html
dma_k
A: 

There are no performance gain from this proposal, because performance of Map in average case is O(1). But enabling access to the raw Entry in such case will raise another problem. It will be possible to change key in entry (even if it's only possible via reflection) and therefore break order of the internal array.

al-wolf
@al-wolf Well, Map.Entry does not have `setKey()`, so there is no way you can break things.
dma_k