views:

1453

answers:

7

I have a HashMap with millions of entries.

Need to retrieve all entries whose keys match a specific set of criteria (in this case, each key is an object with two integer properties; I need to retrieve all keys where each of these integers fall within a specified range).

What is the fastest, most efficient way to iterate through all such keys?

UPDATE: In this particular case, though I didn't specify it up front, the first integer in the key has a natural precedence over the second integer.

A: 

There likely won't be a faster solution than something like:

for (final KeyObj key : map.keySet()) {
    // do work
}
Hank Gay
A: 

If a TreeSet won't work for some reason, the standard way to iterate is with the entry set.

for (Map.Entry<MyKeyType, MyValueType> entry : myMap.entrySet()) {
    MyKeyType key = entry.getKey();
    if (isValid(key)) {
        // do whatever
        validList.add(entry.getValue());
    }
}

This way, you don't have to do an extra myMap.get(key) call for valid keys.

Michael Myers
+1  A: 

You can't do it without iterating through the entire keySet.

You can use a TreeMap with a sort criteria that will sort by some combination of the two integer properties if you're sure that you won't have other entries that have the same value of those integer properties, and then you can find the first match directly and then just iterate from there to the first non-match. But it seems unlikely you can achieve those conditions.

Because collections have pretty low overhead (everything is stored by reference), I would consider making two sorted collections, possibly TreeSets, one sorted by the first property and one sorted by the second, and then pick out all the values that meet the criteria from both collections and union them together.

Paul Tomblin
+7  A: 

A HashMap is not an efficient data structure for finding keys that lie within a certain range. Generally the only keys you can find efficiently in a hash map are keys with the same hash as what you have (i.e. equal keys).

For finding keys that lie within a certain range, you are better off using a SortedMap of some kind, such as a TreeMap, which can then be viewed with the SortedMap.subMap(low, high) view method.

As for finding a key based on two keys, that is even more difficult. Your best bet is probably to iterate over the subMap of the range of the first integer, and then to check for each one if the second integer falls within the specified range. This at least limits the scan to the keys which have one of the integers within the range. Try to sort the map based on the integer that has a more natural distribution of values over the possible ranges you might have to search for.

Avi
That sounds like a good way to go. Thanks!
DanM
Depending on the likely values you will be retrieving it may be more efficient to sort the map by a function of the integers (such as sum), and search between the lowest and highest possible value.
DJClayworth
A: 

You may want to consider some sort of SQL DB, maybe an in-memory one like Derby or H2. Much depends on how important this is and how important that it be fast. Then you could do this in SQL and let the engine do all the optimization work.

sblundy
+3  A: 

Here's a solution using TreeMap:

public static void main(String[] args) {
 Comparator<Foo> fooComparator = new Comparator<Foo>() {
  @Override
  public int compare(Foo o1, Foo o2) {
   return o1.compareTo(o2);
  }
 };

 TreeMap<Foo, String> map = new TreeMap<Foo, String>(fooComparator);

 map.put(new Foo(1, 4), "");
 map.put(new Foo(1, 3), "");
 map.put(new Foo(2, 4), "");
 map.put(new Foo(3, 4), "");
 map.put(new Foo(8, 10), "");
 map.put(new Foo(8, 17), "");
 map.put(new Foo(10, 10), "");

 int a = 2;
 int b = 5;

 for (Foo f : getKeysInRange(map, a, b)) {
  System.out.println(f);
 }
}

public static List<Foo> getKeysInRange(TreeMap<Foo, String> map, int low, int high) {
 Foo key1 = new Foo(low, low);
 Foo key2 = new Foo(high, high);

 Foo fromKey = map.ceilingKey(key1);
 Foo toKey = map.floorKey(key2);

 if (fromKey != null && toKey != null && fromKey.compareTo(toKey) < 0)
  return new ArrayList<Foo>(map.subMap(fromKey, true, toKey, true).keySet());
 return new ArrayList<Foo>();
}

public static class Foo implements Comparable<Foo> {
 private int i;
 private int j;

 private Foo(int i, int j) {
  super();
  this.i = i;
  this.j = j;
 }

 public int min() {
  if (i < j)
   return i;
  else
   return j;
 }

 public int max() {
  if (i > j)
   return i;
  else
   return j;
 }

 @Override
 public String toString() {
  return "I=" + i + "J=" + j;
 }

 @Override
 public int compareTo(Foo o) {
  if (this.min() > o.min()) {
   return 1;
  } else if (this.min() < o.min())
   return -1;
  else {
   if (this.max() > o.max())
    return 1;
   else if (this.max() < o.max())
    return -1;
   else
    return 0;
  }
 }
}
bruno conde
This doesn't solve the problem, since Foo.compareTo() compares the minimum of the two Foo's first, rather than simply comparing this.i to o.i and then comparing this.j to o.j.
Avi
+1  A: 

The solution provided by bruno conde is a good start. However, the way I read the original question is that the key Object contains two integers and that the question was regarding the fastest way to retrieve all the key/value pairs that match one range for the first integer and match a second range for the second integer. The bruno solution is assuming that keys have a natural order where the first integer always has precedence over the second integer. It also assumes there is only one range.

For this more general case, I would: insert key/values into a TreeMap using a comparator that favors integer1 insert the same key/values into a second TreeMap using a comparator that favors integer2

You can then use subMap() on each TreeMap using the range to get an ordered view of the underlying TreeMap. You can then create a new result TreeSet based on the intersection (retainAll()) of the keySet() of these subMaps.

Gary