ansaurus

Question

Is there a parallel processing implementation of HashMap available to Java? Is it even possible?

Answer 1

A:

HashMaps have constant lookup times. Not sure how you could really speed that up since trying to have multiple threads execute the hashing function will only cause it to go slower.

gshauger 2009-09-04 23:12:56

lookup times depend heavily on the hash functions: from an algorithmic and a implementation viewpoint.

ebo 2009-09-04 23:51:27

Answer 2

+3 A:

Sounds like you should profile. You could have a high collision rate. You could also try using a lower loadFactor in the HashMap - to reduce collision probability.

Also, if the hashCodes are precomputed, then there is not much work for get() to do except mod and a few equals(). How fast is equals() on your key objects?

daveb 2009-09-04 23:23:16

equals should be fast, I'm doing equals checks between 5 numbers referenced via getters.

Elijah 2009-09-04 23:33:53

You could always add the classic optimization: if (this == other) { return true; }But If that's not the bottleneck - don't bother

daveb 2009-09-04 23:37:22

Playing about with the load factor is unlikely to help (since 1.4.1ish the hash is rehashed before finding the bucket, so high collision counts tend to indicate that the whole 32-bits of hash value matched). If you have a high number of collisions, the hash function needs work.

Tom Hawtin - tackline 2009-09-05 05:54:33

Answer 3

+1 A:

From the HashMap documentation (I've changed the emphasis):

Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

Since your HashMap is never modified you can safely let multiple threads read from it. Implementing locking is not necessary. (The same is true for any case where threads share access to immutable data; in general the simplest way to achieve thread safety is not to share any writable memory)

To ensure that your code doesn't modify the map by accident, I would wrap the map with Collections.unmodifiableMap immediately after its construction. Don't let any references to the original modifiable map linger.

Wim Coenen 2009-09-05 00:06:54

I don't think this is what Elijah wants. He wants get() to be multithreaded. In other words, he is afraid that the get() calls will be too slow, and he is hoping that by having more than one thread working on the get, it will return faster.

Peter Recore 2009-09-05 04:28:32

Also consider com.google.common.collect.ImmutableMap. It's unmodifiable by design so it doesn't need a proxy (the cost of which may be significant in this example) to protect it from modifications.

finnw 2009-09-05 12:14:12

Answer 4

+6 A:

So your hash codes are precomputed and the equals function is fast - your hashmap gets should be very fast in this case.

Have you profiled your application to prove that the hashmap gets are indeed the bottleneck?

If you have multiple application threads, they should all be able to perform their own gets from the hashmap at the same time - since you aren't modifying the map, you don't need to externally synchronize the gets. Is the application that uses the hashmap sufficiently threaded to be able to make use of all your hardware threads?

Since the contents of the hash table are immutable, it might be worth looking into perfect hashing - with a perfect hash function, you should never have collisions or need chaining in the hash table, which may improve performance. I don't know of a java implementation off hand, but in know in C/C++, there is gperf

Chi 2009-09-05 00:37:58

+1 for profiling to avoid "premature optimization" and another +1 if I could for making sure that the hash function is creating well distributed, non colliding values, not just in theory or with random data, but also with production data.

Peter Recore 2009-09-05 04:34:01

Answer 5

+2 A:

To answer your question: yes, absolutely. AS LONG AS YOU AREN'T WRITING TO IT.

You're going to have to make it by hand, and it's going to be a little tricky. Before trying that, have you optimized as much as possible?

In C++, check out Google's dense hash map class in their sparsehash package.

In Java, if you're mapping with a primitive key, use Trove or Colt maps.

That said, here's a start for your parallel hash map: if you choose n hash functions and spawn n threads to search down each path (probing/chaining at each of the n insertion points) you'll get a decent speedup. Be careful because there's a high cost to creating threads, so spawn the threads on construction and then block them until they're needed.

Hopefully the cost of locking won't be higher than the cost of lookup, but that's up to you to experiment with.

2009-09-05 00:47:38

Answer 6

+1 A:

You mentioned this in a comment:

I'm doing equals checks between 5 numbers referenced

From this I infer that your hash computation is also doing some calculations with these 5 numbers. For good HashMap performance, the results of this computation should be randomly dispersed over all possible int values. From the HashMap documentation:

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

In other words, look-up times should remain constant regardless of the element count, if you have a good hash function. Example of a good hashCode() function for a class that stores three numbers (using a prime number to reduce the chance of the XOR yielding zero as suggested by comment):

return this.a.hashCode() ^ (31 * (this.b.hashCode() ^ (31 * this.c.hashCode())));

Example of a bad hashCode function:

return (this.a + this.b + this.c);

Wim Coenen 2009-09-05 00:54:37

If that ^ hash function permutations end up with the same code. Repeated values disapper completely. Multiply by a prime and ^ is the idiomatic solution.

Tom Hawtin - tackline 2009-09-05 05:58:32

Answer 7

A:

I think you need evidence that the get() method on the HashMap is where you delay is. I think this is highly unlikely. Put a loop around your get() method to make it repeat 1,000 times and your application probably won't slow down at all. Then you'll know that the delay is elsewhere.

Julian Cochran 2010-01-11 12:33:35

ansaurus

tags:

views:

answers:

Is there a parallel processing implementation of HashMap available to Java? Is it even possible?

Searching for the magic ParallelHashMap class

related questions