views:

169

answers:

6

I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice.

The options I know are:

  • HashMap. Disastrously un-thread safe.

  • ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance.

  • Collections.sychronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives.

  • Trove or Colt - I think neither of these are thread-safe, but perhaps the code could be adapted to be thread safe.

Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of?

Thanks in advance for your input!

A: 

Well, there's a spruced-up Colt in Apache Mahout. It's still not in the current business. What's wrong with protecting the code with a synchronized block? Are you expecting some devilishly complex scheme that hold locks for smaller granularity than put or get?

If you can code one, please contribute it to Mahout.

bmargulies
I think I'd need to synchronize both gets and puts, as otherwise a rehash could cause get() to return junk. And that synchronisation would be on the map itself (not the keys). It would work, but feels less than optimal.
Daniel Winterstein
That's what I meant, more or less.
bmargulies
A: 

It's worth taking a look at the persistent hash maps in Clojure.

These are immutable, thread safe data structures with performance comparable to classic Java HashMaps. You'd obviously need to wrap them if you want a mutable map, but that shouldn't be difficult.

http://clojure.org/data_structures

mikera
+5  A: 

Collections.synchronizedMap() simply makes all the Map methods synchronized.

ConcurrentMap is really the interface you want and there are several implementations (eg ConcurrentHashMap, ConcurrentSkipList). It has several operations that Map doesn't that are important for threadsafe operations. Plus it is more granular than a synchronized Map as an operation will only lock a slice of the backing data structure rather than the entire thing.

cletus
+3  A: 

I have no experience of the following, but I worked with a project once who swore by Javolution for real time and memory sensitive tasks.

I notice in the API there is FastMap that claims to be thread safe. As I say, I've no idea if it's any good for you, but worth a look:

API for FastMap

Javolution Home

Dick Chesterwood
Thanks - FastMap looks interesting and highly configurable.
Daniel Winterstein
+2  A: 

Google Collection's MapMaker seems like it can do the job too.

nevets1219
+1  A: 

Very surprising that it has a 2k foot print!! How about making ConcurrentHashMap's concurrency setting lower (e.g. 2-3), and optimizing its initial size (= make smaller).

I don't know where that memory consumption is coming from, but maybe it has something to do with maintaining striped locks. If you lower the concurrency setting, it will have less.

If you want good performance with out-of-the-box thread safety, ConcurrentHashMap is really nice.

Enno Shioji
Doh! Didn't occur to me you could adjust ConcurrentHashMap's settings.
Daniel Winterstein