In my code I have a map that is used heavily, several thousand times in a few seconds. Originally I had a TreeMap, but when testing with 9,000 entries I watched my old processor melt. And this needs to scale. So I moved to a HashMap and performance was excellent.

Now I am changing my design and am looking for a MultiMap. However, I'm worried about the performance impact on the get() side, since it would have to iterate over that large map picking out the matching keys; when it's called many, many times (even synchronized), that seems like it would be slow.

Is there a good MultiMap that can handle such large values with great performance? Performance is critical in this application, as there could be many large separate maps handling a very large workload, making "small" performance losses very big issues.

Bonus points if it can be extracted to work alone without any dependencies.

+2  A: 

The one that was recommended to me in one of my questions was the Apache Commons MultiMap: http://commons.apache.org/collections/api-3.2.1/org/apache/commons/collections/MultiHashMap.html

It's free software, so you can at least get the source to look at it, and depending on your license situation, you can modify it or use it standalone.

It uses an ArrayList internally, but I imagine you can probably change it to use a HashSet or something. I would look at the createCollection(Collection coll) method.
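For example, something like this (an untested sketch against Commons Collections 3.x) would swap the backing collection to a HashSet:

```java
import java.util.Collection;
import java.util.HashSet;

import org.apache.commons.collections.MultiHashMap;

// Sketch: override createCollection() so each key's values live in a HashSet
// (constant-time contains/remove) instead of the default ArrayList.
public class HashSetMultiHashMap extends MultiHashMap {
    protected Collection createCollection(Collection coll) {
        return (coll == null) ? new HashSet() : new HashSet(coll);
    }
}
```

(Commons Collections 3.x predates generics, so the raw types are expected; you'll just get unchecked warnings.)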

UPDATE: Actually, Guava's HashMultimap appears to already be what I was talking about: http://guava-libraries.googlecode.com/svn/trunk/javadoc/index.html

I looked at the source and it seems that each collection of values is in fact backed by a HashSet.
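Basic usage is something like this (a rough sketch based on the javadoc):

```java
import java.util.Set;

import com.google.common.collect.HashMultimap;
import com.google.common.collect.SetMultimap;

public class GuavaMultimapDemo {
    public static void main(String[] args) {
        // Each key's values are backed by a HashSet, so get(key) is a hash lookup,
        // not a scan over the whole map.
        SetMultimap<String, String> multimap = HashMultimap.create();

        multimap.put("fruit", "apple");
        multimap.put("fruit", "banana");
        multimap.put("fruit", "apple"); // duplicate, ignored by the set

        Set<String> fruit = multimap.get("fruit"); // live view: [apple, banana]
        System.out.println(fruit);

        // If you need thread safety:
        // SetMultimap<String, String> sync = Multimaps.synchronizedSetMultimap(multimap);
    }
}
```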

Matt
Seems to have been deprecated in favor of `MultiValueMap`. But I'm still not sure about the performance; iterating over a collection this large on every call looks a little expensive.
TheLQ
Seems that you're right. I updated my post because I found something else.
Matt
+1  A: 

The choice largely depends on what you want to do. There are many data structures, and each is stronger in some areas and weaker in others.

I can recommend some potential candidates. If the map is read-only once built, ImmutableMultimap might be a good fit.

If you need concurrent reads and writes, I'd implement my own multimap, perhaps using ConcurrentHashMap and ConcurrentSkipListSet (be careful: the semantics of a synchronized multimap and of a multimap built this way from non-blocking data structures differ). With a ConcurrentSkipListSet the values are kept sorted, so lookups are O(log n) instead of a linear scan.

If you have a lot of entries, you could also start by just using a ConcurrentHashMap whose values are synchronized lists. That alone can significantly reduce contention, which might be enough to resolve your performance problem, and it's simple.
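A rough sketch of the hand-rolled approach (illustrative only; the class and method names are mine, and remove() is left out to keep the concurrency story simple):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch of a concurrent multimap: a ConcurrentHashMap of ConcurrentSkipListSets.
// Note: the semantics are weaker than a fully synchronized multimap (readers see
// a weakly consistent, live view of each value set).
public class ConcurrentMultimapSketch<K, V extends Comparable<V>> {

    private final ConcurrentMap<K, Set<V>> map = new ConcurrentHashMap<K, Set<V>>();

    public void put(K key, V value) {
        Set<V> values = map.get(key);
        if (values == null) {
            Set<V> fresh = new ConcurrentSkipListSet<V>();
            values = map.putIfAbsent(key, fresh); // another thread may win the race
            if (values == null) {
                values = fresh;
            }
        }
        values.add(value);
    }

    public Set<V> get(K key) {
        Set<V> values = map.get(key);
        return values != null ? values : new ConcurrentSkipListSet<V>();
    }
}
```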

Enno Shioji
A: 

When you mention that you "iterate over said large map picking out the matching keys", that makes me wonder whether you're using the best data structure. Is there a way you could avoid that iteration?

Note that Guava includes multiple multimap implementations with different performance characteristics. As Zwei mentioned, an ImmutableMultimap has better performance than the mutable multimaps. The SetMultimaps are faster if your code checks whether the multimap contains a particular value; otherwise an ArrayListMultimap performs better.
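To make that concrete, here's an untested sketch of the trade-offs (variable names are just for illustration):

```java
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.HashMultimap;
import com.google.common.collect.ImmutableMultimap;
import com.google.common.collect.ListMultimap;
import com.google.common.collect.SetMultimap;

public class MultimapFlavors {
    public static void main(String[] args) {
        // SetMultimap: containsEntry()/remove(key, value) hit a HashSet,
        // so they don't scan all of a key's values.
        SetMultimap<String, String> setBased = HashMultimap.create();

        // ListMultimap: cheaper puts and iteration, but containsEntry()
        // walks the value list for that key.
        ListMultimap<String, String> listBased = ArrayListMultimap.create();

        setBased.put("k", "v");
        listBased.put("k", "v");

        // If the contents never change after startup, freeze them:
        ImmutableMultimap<String, String> frozen = ImmutableMultimap.copyOf(setBased);
        System.out.println(frozen.containsEntry("k", "v")); // true
    }
}
```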

Jared Levy
I ended up creating a custom many-to-many relationship map based on HashSets and HashMaps. The reason I needed one with good performance is that I iterate over the whole map, comparing a given string against a string in each object.
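Roughly, the structure is like this (just a sketch, not my exact code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Many-to-many map built from HashMaps of HashSets: one map per direction,
// so lookups are O(1) from either side.
public class ManyToManyMap<A, B> {

    private final Map<A, Set<B>> forward = new HashMap<A, Set<B>>();
    private final Map<B, Set<A>> reverse = new HashMap<B, Set<A>>();

    public void put(A a, B b) {
        add(forward, a, b);
        add(reverse, b, a);
    }

    public Set<B> getByA(A a) {
        Set<B> s = forward.get(a);
        return s != null ? s : Collections.<B>emptySet();
    }

    public Set<A> getByB(B b) {
        Set<A> s = reverse.get(b);
        return s != null ? s : Collections.<A>emptySet();
    }

    private static <K, V> void add(Map<K, Set<V>> map, K key, V value) {
        Set<V> values = map.get(key);
        if (values == null) {
            values = new HashSet<V>();
            map.put(key, values);
        }
        values.add(value);
    }
}
```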
TheLQ