views:

514

answers:

9

Using Java, assuming v1.6.

I have a collection where the unique index is a string and the non-unique value is an int. I need to perform thousands of lookups against this collection as fast as possible.

I currently am using a HashMap<String, Integer> but I am worried that the boxing/unboxing of the Integer to int is making this slower.

I had thought of using an ArrayList<String> coupled with an int[].

i.e. Instead of:

int value = (int) HashMap<String, Integer>.get("key");

I could do

int value = int[ArrayList<String>.indexOf("key")];

Any thoughts? Is there a faster way to do this?

p.s. I will only build the collection once and maybe modify it once but each time I will know the size so I can use String[] instead of ArrayList but not sure there's a faster way to replicate indexOf...

+1  A: 

I would guess that the HashMap would give a much faster lookup, but I think this needs some benchmarking to answer correctly.

EDIT: Furthermore, There is no boxing involved, merely unboxing of the already-stored objects, which should be pretty fast, since no object allocation is done in that step. So, I don't think this would give you any more speed, but you should run benchmarks nonetheless.

soulmerge
+12  A: 

Unboxing is fast - no allocations are required. Boxing is a potentially slower, as it needs to allocate a new object (unless it uses a pooled one).

Are you sure you've really got a problem though? Don't complicate your code until you've actually proved that this is a significant hit. I very much doubt that it is.

There are collection libraries available for primitive types, but I would stick to the normal HashMap from the JRE until you've profiled and checked that this is causing a problem. If it really only is thousands of lookups, I very much doubt it'll be a problem at all. Likewise if you're lookup-based rather than addition-based (i.e. you fetch more often than you add) then the boxing cost won't be particularly significant, just unboxing, which is cheap.

I would suggest using intValue() rather than the cast to convert the value to an int though - it makes it clearer (IMO) what's going on.

EDIT: To answer the question in the comment, HashMap.get(key) will be faster than ArrayList.indexOf(key) when the collection is large enough. If you've actually only got five items, the list may well be faster. I assume that's not really the case though.

If you really, really don't want the boxing/unboxing, try Trove (TObjectHashMap). There's also COLT to consider, but I couldn't find the right type in there.

Jon Skeet
Leave it to Jon Skeet to give an answer that makes the whole question irrelevant...
Michael Myers
Looks like benchmarking is going to be the only way to test this properly.I wasn't aware that unboxing is cheap. I'll have to do some profiling to spot the bottle necks.That said, based on the internals of the 2 which should be quicker ArrayList<String>.indexOf or HashMap<String, Integer>.get
Adrian Hope-Bailie
Sorry, one more comment. It IS a significant hit. I need every microsecond I can get here :)
Adrian Hope-Bailie
A: 

One slight problem here: You can have duplicate elements in a List. If you really want to do it the second way, consider using a Set instead.

Having said that, have you done a performance test on the two to see if one is faster than the other?

Edit: Of course, the most popular Set type (HashSet) is itself backed by a HashMap, so switching to a set may not be such a wise change after all.

R. Bemrose
I manage duplicates myself in building the collection. A set wouldn't work as the collection needs to contains some form of key value pairs
Adrian Hope-Bailie
+1  A: 

I think scanning your ArrayList to find the match for your "key" is going to be much slower than your boxing/unboxing concerns.

Robin
A: 

List.indexOf will do a linear scan of the list - O(n) typically. A binary search will do the job in O(log n). A hash table will do it in O(1).

Having large numbers of Integer objects in memory could be a problem. But then the same is true for Strings (both the String and char[]). You could do you own custom DB-style implementation, but I suggest benchmarking first.

Tom Hawtin - tackline
+2  A: 

Any performance gain that you get from not having to box/unbox is significanlty erased by the for loop that you need to go with the indexOf method.

Use the HashMap. Also you don't need the (int) cast, the compiler will take care of it for you.

The array thing would be ok with a small number of items in the array, but then so is the HashMap...

The only way you could make it fast to look up in an array (and this is not a real suggestion as it has too many issues) is if you use the hashCode of the String to work with as the index into the array - don't even think about doing that though! (I only mention it because you might find something via google that talks about it... if they don't explain why it is bad don't read any more about it!)

TofuBeer
A: 

Since you say it is indeed a bottleneck, I'll suggest Primitive Collections for Java; in particular, the ObjectKeyIntMap looks like exactly what you want.

Michael Myers
A: 

If the cost of building the map once and once only doesn't matter, you might want to look at perfect hashing, for example Bob Jenkins' code.

Pete Kirkham
A: 

The map access does not do unboxing for the lookup, only the later access to the result makes it slow.

I suggest to introduce a small wrapper with a getter for the int, such as SimpleInt. It holds the int without conversion. The constructor is not expensive and overall is is cheaper than an Integer.

public SimpleInt
{
    private final int data;

    public SimpleInt(int i)
    {
        data = i;
    }

    // getter here
    ....
}
ReneS