views:

119

answers:

7

I have a large map of String->Integer and I want to find the highest 5 values in the map. My current approach involves translating the map into an array list of pair(key, value) object and then sorting using Collections.sort() before taking the first 5. It is possible for a key to have its value updated during the course of operation.

I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.

Could I have some suggestions/alternatives on optimizing this please? Am happy to consider different data structures if there is benefit.

Thanks!

+4  A: 

Well, to find the highest 5 values in a Map, you can do that in O(n) time where any sort is slower than that.

The easiest way is to simply do a for loop through the entry set of the Map.

for (Entry<String, Integer> entry: map.entrySet()) {
    if (entry.getValue() > smallestMaxSoFar) 
        updateListOfMaximums();
}
jjnguy
+1  A: 

I think this approach is acceptable single threaded, but if I had multiple threads all triggering the transpose and sort frequently it doesn't seem very efficient. The alternative seems to be to maintain a separate list of the highest 5 entries and keep it updated when relevant operations on the map take place.

There is an approach in between that you can take as well. When a thread requests a "sorted view" of the map, create a copy of the map and then handle the sorting on that.

public List<Integer> getMaxFive() {
    Map<String, Integer> copy = null;
    synchronized(lockObject) {
        copy = new HashMap<String, Integer>(originalMap);
    }

    //sort the copy as usual
    return list;
}

Ideally if you have some state (such as this map) accessed by multiple threads, you are encapsulating the state behind some other class so that each thread is not updating the map directly.

matt b
A: 

Please try another data structure. Suppose there's a class named MyClass which its attributes are key (String) and value (int). MyClass, of course, needs to implement Comparable interface. Another approach is to create a class named MyClassComparator which extends Comparator.

The compareTo (no matter where it is) method should be defined like this: compareTo(parameters){ return value2 - value1; // descending }

The rest is easy. Using List and invoking Collections.sort(parameters) method will do the sorting part.

I don't know what sorting algorithm Collections.sort(parameters) uses. But if you feel that some data may come over time, you will need an insertion sort. Since it's good for a data that nearly sorted and it's online.

jancrot
Another function of the API has the need to retrieve a key quickly so swapping for a Collection instead of a Map would hurt that performance unacceptably as list is large. However, your idea is sound - there is no reason why I couldn't map(key -> composite(key, value)) where the composite implements comparable. I could then just say Collections.sort(map.values()). Unfortunately this still has the performance impact when you introduce multiple threads as each thread would could a merge sort (O(n log n)).
Scruffers
+3  A: 

You could use two Maps:

// Map name to value
Map<String, Integer> byName

// Maps value to names
NavigableMap<Integer, Collection<String>> byValue

and make sure to always keep them in sync (possibly wrap both in another class which is responsible for put, get, etc). For the highest values use byValue.navigableKeySet().descendingIterator().

Steve Kuo
I really like this, but from memory wouldn't doing this require that all values a unique. This is unlikely to be the case in my domain, so the byValue map would probably corrupt.
Scruffers
Good point, I modified `byValue` to hold all names for a given value.
Steve Kuo
A: 

If modifications are rare, I'd implement some SortedByValHashMap<K,V> extends HashMap <K,V>, similar to LinkedHashMap) that keeps the entries ordered by value.

leonbloy
A: 

I would create a method like:

private static int[] getMaxFromMap(Map<String, Integer> map, int qty) {
    int[] max = new int[qty];
    for (int a=0; a<qty; a++) {
        max[a] = Collections.max(map.values());
        map.values().removeAll(Collections.singleton(max[a]));
        if (map.size() == 0)
            break;
    }
    return max;
}

Taking advantage of Collections.max() and Collections.singleton()

Garis Suero
This is O(n), but in practice performs quite slowly compared to other methods.
Chris
+1  A: 

There are two ways of doing that easily:

  1. Put the map into a heap structure and retrive the n elements you want from it.
  2. Iterate through the map and update a list of n highest values using each entry.

If you want to retrive an unknown or a large number of highest values the first method is the way to go. If you have a fixed small amount of values to retrieve, the second might be easier to understand for some programmers. Personally, I prefer the first method.

Chris