views:

144

answers:

4

I am looking for an efficient data structure to represent a priority list. Specifically I need to assign a priority to a set of items and return only the top scoring items. I have looked into priority queues which operate on heaps, but they don't seem to really suit my needs. They will reorganize the heap structure as soon as I will poll the top rating item from the queue.

The simplest solution would of course be a linked list, which in the worst case would take quite long for the insertion operation.

Does anyone have a better solution?

+3  A: 

Hmm. Skip lists? They should have O(log n) insertion (as heap-based queue) but getting top element should be O(1) [including removing it]. They could be even implemented using lock-free algorithm.

Maciej Piechotka
Heaps are better than skip lists if you use them correctly. Use a min-heap of x elements, when you need the top x. You don't have to construct a tree/heap of all the n. Just x.
Moron
Sorry - my fault I misread the text (I understood he wants fast poll even at cost of slow add).
Maciej Piechotka
+1  A: 

The JDK has a built-in pqueue class (java.util.PriorityQueue) which is based on a heap algorithm.

Sorry, I only just saw the bit about heaps not fitting your needs. Can you explain why? You can write a custom comparator (or make your items comparable) and the PriorityQueue will order your items appropriately.

dty
As far as I understand him he finds getNext in O(log n) not acceptable.
Maciej Piechotka
The problem seems to be that ladi wants to be able to get the x first items without removing any of them. That's not typically an operation supported by priority lists.
Michael Borgwardt
I would like rate some items and only get the top n scoring items. So I was wandering if there are any datastructures that only hold the top scoring items but offer a list interface. That means I could go through the list of the top scoring items sequentially.I could of course use a priority queue based on a heap algorithm which has O(log n) insertion and O(n) retrival, get the top scoring elements and add them to a list. I was just curious if there exists something better than that.
ladi
@ladi: Not sure what you mean by O(n) retrieval -- extracting the top item from a heap is O(log n). Only if you need to find a specific (non-minimal) element is retrieval O(n). If all you can do is compare 2 items and determine which is bigger, then nothing is asymptotically faster than a heap for the problem you're looking at.
j_random_hacker
+4  A: 

If you need only the k top items and you never need to look a the others, you can use a simple linked list or array storing only the current top k items, plus a number (the worst score of the elements in the list).

In the Add() operation you simply compare the item with the worst value in the list and, if better, you swap the current worst with the added item. This takes O(k) time in the worst case for insertion because you need to find the element that has currently the worst score. The the average case, however, is O(1), since, as you add better elements to the list, the probability of having to do a swap tends to 0 (that is, you're not actually adding any items).

So if you generate elements at random, your performance is likely to be very good. Even if you generate ordered items (worst case), it might be fast enough for your value of k.

Mau
nice idea......
Dave
Instead of a list, if you use min-heap (see my answer), the worst case time is O(logK). The rest is similar. In fact using min-heaps like is quite a standard method for this problem! (When x is small compared to n).
Moron
+2  A: 

Heaps seem very suitable, and it seems like you are going about it wrongly.

Say you wanted the top x elements (how does this x compare to n, btw?)

What you are doing is putting all into a max-heap and getting the top x.

I suggest instead, you use a min-heap of exactly x elements.

First x elements you insert into heap.

Next incoming element, you compare against the min which can be done very quickly (O(1) time) in the heap. If smaller, you just ignore the incoming element.

If incoming element is larger than min, then you increase the min to the incoming element and sift it down in the heap. This should be logx time at worst.

Once done (in nlogx time), you can retrieve the elements from the heap in sorted order in O(xlogx) time.

Depending on how your data is (and how small x is), using this min-heap solution can be really fast.


If you really really want the inserts to be super-fast and don't care much about the retrieval then you can also do the following.

Insert the elements into a vector (array with amortized O(1) insert time) in the order they come.

The use the Selection algorithm to find the xth largest element (in O(n) time, but the constants might be big). Say that number is S.

Now walk the array comparing each element with S and select the ones as large as S.

If x is reasonably sized and comparable to n (like n/2 or something) this might work out fine, but if x is small compared to n, I would suggest go with the min-heap.

Moron
I didn't think of it this way. This looks very promising.
ladi