views:

169

answers:

3

Hello,

which is the best way to write a bidimensional hashmap efficiently in Java? Just to give an example of what I'm talking about: I'm developing some algorithms related to collective intelligence, these algorithms works by calculating correlation between pairs of elements..

Without caching these values, since they are calculated on same pairs multiple times, performance are horrible.. (algorithms can be O(n^2) but maybe O(n^3) so I was thinking about using an HashMap to store values to be used multiple times.

Which is the most efficient way to implement such a data structure in Java? It should be possble to cache and remove a value generated by a pair of elements with O(1), but using an explicit class seems too heavy anyway.

If Java will turn out to be not enough I'll have to switch to C/C++, so any idea related to these languages are welcome too.

Thanks

A: 

Google Collections supports bi-directional hashmaps, see BiMap.

(BTW, Google Collections seems to be gaining more mindshare over Apache Collections.)

Update: Note @danben and @sateesh's clarification. A BiMap would be fine if you need to get a y given an x, or an x given a y. But it sounds like you really want to look up an (x, y) point and get back a value that contains your cached information. In that case, go with @danben's suggestion.

Jim Ferrans
+1 to Google Collections gaining "market share". Type collections seemed to have been missing from Apache Commons for too long.
Matt
That's all well and good, but if you read the post, a `BiMap` will not solve his problem.
danben
@danben, I suspect you're right and that the OP wanted to ask a different question.
Jim Ferrans
@Jim Ferrans - he mostly asked his question correctly. Of the two occurrences of the word "bidimensional", only one was misspelled as "bidirectional", and the context of the post made it pretty clear.
danben
Yes, I meant bidimensional, not bidirectional.. sorry for typo
Jack
+4  A: 

The easiest way to do this is to define a Pair class. It should be immutable (hash keys should not change), and hashCode() should be consistent with equals.

Something like (method implementations omitted):

public class Pair() {
  int a, b;

  public Pair(int a, int b);

  public int getA();

  public int getB();

  public boolean equals(Object obj);

  public int hashCode();
}

Notes:

  • If you don't want ints, sub in whatever type you want, or make your Pair class generic if you want it to be flexible.

  • It would be up to you whether (x, y) == (y,x).

With this in hand, you can have a HashMap<Pair, SomethingElse> as your cache.

danben
I already tried this approach but heap grows too much, I would like to avoid allocating another object for every pair of items I insert into the hashmap.. since they can be millions
Jack
A: 

I partially solved the problem by concatenating hashcodes of both items using something like this:

private long computeKey(Object o1, Object o2)
{
    int h1 = o1.hashCode();
    int h2 = o2.hashCode();

    if (h1 < h2)
    {
        int swap = h1;
        h1 = h2;
        h2 = swap;
    }

    return ((long)h1) << 32 | h2;
}

I still have to figure out which is the most efficient way to store all the elements already cached with a specified one to remove them when the algorithm don't need the item anymore, just to avoid filling of the HashMap with a waste of item. That's because the kind of algorithm merges two items at every iteration removing them from used ones but adding the new generated item.

Jack