Hi folks,

I am calculating a large number of possible resulting combinations of an algorithm. To sort these combinations I rate them with a double value and store them in a PriorityQueue. Currently, there are about 200k items in that queue, which is pretty memory intensive. Actually, I only need, let's say, the best 1000 or 100 of all items in the list. So I started to ask myself whether there is a way to have a priority queue with a fixed size in Java. It should behave like this: Is the item better than one of the already stored ones? If yes, insert it at the appropriate position and throw away the element with the lowest rating.

Does anyone have an idea? Thanks very much again!

Marco

A: 

A better approach would be to more tightly moderate what goes on the queue, removing and appending to it as the program runs. It sounds like there would be some room to exclude some of the items before you add them to the queue. It would be simpler than reinventing the wheel, so to speak.

Gordon
+1  A: 

It seems natural to just keep the top 1000 each time you add an item, but the PriorityQueue doesn't offer anything to achieve that gracefully. Maybe you can, instead of using a PriorityQueue, do something like this in a method:

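List<Double> list = new ArrayList<Double>();
...
list.add(newOutput);
// Sort best-first, assuming a higher rating is better
Collections.sort(list, Collections.reverseOrder());
if (list.size() > 1000) {
    // Copy the top 1000 so the discarded tail can be garbage collected
    list = new ArrayList<Double>(list.subList(0, 1000));
}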
Wesho
also using a TreeMap, you have the highest value readily available and you can avoid insertions altogether if the current result is greater than that, removing the last key and inserting the new value otherwise
Lorenzo Boccaccia
@Lorenzo, `Map` isn't good as it will not allow two combinations having the same rating.
gustafc
This approach does not have the performance benefits of a red-black tree implementation and is a performance killer.
nimcap
A: 

Use SortedSet:

SortedSet<Item> items = new TreeSet<Item>(new Comparator<Item>(...));
...
void addItem(Item newItem) {
    if (items.size() >= 100) {
        Item lowest = items.first();
        if (!newItem.greaterThan(lowest)) {
            return; // not better than the worst stored item, so skip it
        }
        items.remove(lowest); // make room for the better item
    }

    items.add(newItem);
}
Victor Sorokin
A set will not allow several `Item`s to have the same rating.
gustafc
Depends on how you define Comparator for Set -- it can consider not only rating, but some unique field of Item, like id.
Victor Sorokin
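The tie-breaking comparator mentioned in the comment above could be sketched roughly as follows. This is only an illustration: the `RatedItem` class, its `id` and `rating` fields, and the `TopRatedSet` wrapper are hypothetical names, not part of the original answer.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical item type: 'id' and 'rating' are illustrative field names.
final class RatedItem {
    final long id;
    final double rating;

    RatedItem(long id, double rating) {
        this.id = id;
        this.rating = rating;
    }
}

// Hypothetical wrapper that keeps at most 'capacity' of the highest-rated items.
final class TopRatedSet {
    // Order by rating first, then by id, so items with equal ratings
    // are not treated as duplicates by the TreeSet.
    static final Comparator<RatedItem> BY_RATING_THEN_ID = new Comparator<RatedItem>() {
        @Override
        public int compare(RatedItem a, RatedItem b) {
            int byRating = Double.compare(a.rating, b.rating);
            return byRating != 0 ? byRating : Long.compare(a.id, b.id);
        }
    };

    private final int capacity;
    private final TreeSet<RatedItem> items = new TreeSet<RatedItem>(BY_RATING_THEN_ID);

    TopRatedSet(int capacity) {
        this.capacity = capacity;
    }

    void add(RatedItem item) {
        items.add(item);
        if (items.size() > capacity) {
            items.pollFirst(); // drop the lowest-rated item
        }
    }

    int size() {
        return items.size();
    }

    RatedItem lowest() {
        return items.first();
    }
}
```

With this ordering, two items rated 5.0 are both kept as long as their ids differ.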
A: 

Just poll() the queue if its least element is less than (in your case, has worse rating than) the current element.

static <V extends Comparable<? super V>> 
PriorityQueue<V> nbest(int n, Iterable<V> valueGenerator) {
    PriorityQueue<V> values = new PriorityQueue<V>();
    for (V value : valueGenerator) {
        if (values.size() == n && value.compareTo(values.peek()) > 0)
            values.poll(); // remove least element, current is better
        if (values.size() < n) // we removed one or haven't filled up, so add
            values.add(value);
    }
    return values;
}

This assumes that you have some sort of combination class that implements Comparable that compares combinations on their rating.

Edit: Just to clarify, the Iterable in my example doesn't need to be pre-populated. For example, here's an Iterable<Integer> that will give you all natural numbers an int can represent:

Iterable<Integer> naturals = new Iterable<Integer>() {
 public Iterator<Integer> iterator() {
  return new Iterator<Integer>() {
   int current = 0;
   @Override
   public boolean hasNext() {
    return current >= 0;
   }
   @Override
   public Integer next() {
    return current++;
   }
   @Override
   public void remove() {
    throw new UnsupportedOperationException();
   }
  };
 }
};

Memory consumption is very modest, as you can see - for over 2 billion values, you need two objects (the Iterable and the Iterator) plus one int.

You can of course rather easily adapt my code so it doesn't use an Iterable - I just used it because it's an elegant way to represent a sequence (also, I've been doing too much Python and C# ☺).
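One possible adaptation without an Iterable is sketched below, as a small class that accepts values one at a time and takes a custom Comparator. The class name `BoundedBest` and its methods are hypothetical, and the sketch assumes the comparator treats "greater" as "better".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; the name and methods are illustrative only.
final class BoundedBest<T> {
    private final int capacity;
    private final Comparator<? super T> order; // assumption: "greater" means "better"
    private final PriorityQueue<T> heap;       // head is the worst retained element

    BoundedBest(int capacity, Comparator<? super T> order) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
        this.order = order;
        this.heap = new PriorityQueue<T>(capacity, order);
    }

    // Offers one value; returns true if it was retained.
    boolean offer(T value) {
        if (heap.size() < capacity) {
            return heap.add(value);
        }
        if (order.compare(value, heap.peek()) > 0) {
            heap.poll(); // remove the worst element, the new one is better
            return heap.add(value);
        }
        return false; // not better than anything retained
    }

    // Returns a copy of the retained values, best first.
    List<T> bestFirst() {
        List<T> result = new ArrayList<T>(heap);
        Collections.sort(result, order);
        Collections.reverse(result);
        return result;
    }
}
```

Each call to `offer` costs at most one `poll` and one `add`, so memory stays bounded by the capacity no matter how many values are generated.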

gustafc
Does this assume you have all the items in `valueGenerator` already?
Wesho
I think one of the goals of the OP is to avoid accumulating so many items in an `Iterable` in the first place. Furthermore, if the higher the ranking the better the algorithm, then `peek` is not what you want.
Wesho
No, you don't need to have them all available. An iterator can generate values on the fly in its `next()` method.
gustafc
And why wouldn't `peek()` do the trick? It returns the least element, and if the current element is better than the least element, I throw the least element away and add the current. I tested the code, it works.
gustafc
Whether `peek` will work or not depends on the `Comparator`. But if the higher rankings are the ones to keep, then your "keep" list would be something like this: [10.0, 9.9, 9.5, ...], assuming you don't add a `Comparator` which does the reverse of the default `Double` `Comparable` behavior, `peek` will not work.
Wesho
If in doubt, just try the code - it works. Quoting the JavaDocs: "The head of this queue is the *least* element with respect to the specified ordering. [...] The queue retrieval operations `poll`, `remove`, `peek`, and `element` access the element at the head of the queue." As I say in the post: I assume that whatever is used to represent a combination implements `Comparable` in a way that considers a lower-rated combination as "less than" a better-rated one. If it doesn't and can't, I leave it as an exercise for the reader to modify my example so that it uses a custom comparator.
gustafc
Yep u're right, the head is indeed the least element. For some reason I thought it was the other way around.
Wesho
A: 
que.add(d);
if (que.size() > YOUR_LIMIT)
     que.poll();

or did I misunderstand your question?

edit: forgot to mention that for this to work you probably have to invert your compareTo function, since it will throw away the one with the highest priority each cycle (if a is "better" than b, compare(a, b) should return a positive number).

example to keep the biggest numbers use something like this:

public int compare(Double first, Double second) {
    // keep the biggest values: larger values compare as greater, equal values as 0
    return Double.compare(first, second);
}
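
A minimal runnable sketch of the add-then-poll idea, assuming that higher ratings are better and that the queue uses the natural ordering of Double (so the head is the smallest value). The class name `KeepLargestDemo`, the method `keepLargest` and the parameter `limit` are hypothetical.

```java
import java.util.PriorityQueue;

final class KeepLargestDemo {
    // Keeps the 'limit' largest values seen so far ('limit' is an illustrative parameter).
    static PriorityQueue<Double> keepLargest(double[] ratings, int limit) {
        // Natural ordering: the head of the queue is the smallest value.
        PriorityQueue<Double> que = new PriorityQueue<Double>();
        for (double d : ratings) {
            que.add(d);
            if (que.size() > limit) {
                que.poll(); // evicts the current smallest value
            }
        }
        return que;
    }

    public static void main(String[] args) {
        PriorityQueue<Double> que = keepLargest(new double[] {3.0, 9.5, 1.2, 7.8, 4.4}, 2);
        System.out.println(que.peek() + " is the smallest retained value");
    }
}
```

Compared with checking `peek()` first, this variant always pays for one insertion, but the queue never grows beyond `limit + 1` elements.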
getekha
A: 

There is a fixed size priority queue in Apache Lucene: http://lucene.apache.org/java/2%5F4%5F1/api/org/apache/lucene/util/PriorityQueue.html

It has excellent performance based on my tests.

Vladimir Giverts