views:

145

answers:

5

I use PriorityQueue for partial sorting of some data. In particular, this is the code:

Collection<Data> data = ...;
PriorityQueue<Data> queue = new PriorityQueue<Data>(data.size(), dataComparator);
queue.addAll(data);
// iterate over queue with remove() until we have as much data as we need or until queue is empty

Unfortunately, when data collection is empty, the code fails, because PriorityQueue cannot be passed zero as initialCapacity. What are reasons behind this design decision? Why can't there be an 0-sized PriorityQueue?

UPD: I know how to work around this. I'd like to know why doesn't PriorityQueue include this max(1, n) code inside it - are there any reasons or is it just a bad API design?

+1  A: 

If you think want capacity means, it means preparing the queue to be able to store at least X items without requiring additional internal memory allocations. So if you expected a queue to contain a maximum of 100 items you might set capacity to 100 in the constructor to prepare for that.

What is the point of telling a queue to expect NO items? It doesn't make sense to allow 0 in the first place so the minimum value is 1 and the constructor throws an exception if you pass 0 (or less).

locka
Your example sounds beautiful but how should the developer know that it will be 100 items when he writes the code? He normally thinks of "n", and n being = 0 is quite a normal situation.
chiccodoro
@chiccodoro: Often you can derive some approximation of n from other data. And if you can't, it doesn't matter. This is just a hint. If it is too small, the queue will resize itself.
musiKk
@chiccodoro, 100 items is mereley the capacity, not the number of items that will be stored. Don't apply your experience to everyone else. Many problems have a finite domain of inputs, so in many cases a developer has a very good idea of what kind of capacity they might need. Also, n=0 might be a normal situation, but its is a situation where you wouldn't be using the container at all so why make a big fuss about it?
mikerobi
@mikerobi: You're right in saying you wouldn't be using the container at all. The thing is, if your n can be anything from 0...xyz, it's more comfortable to write "n" instead of things like "Math.min(n, 1)".
chiccodoro
@chiccodoro, the capacity is just a hint - "i expect this many items tops". If that value is exceeded the queue will still allocate extra space. If you can't estimate the upper bound with accuracy, make a rough guess. If you can't do that, don't bother setting a hint and use the default. Chances are it doesn't make a huge difference in performance but it does save allocating and copying data from one array to another every so often.
locka
@locka, in fact, I know exactly how many items the queue will store - as many as the collection I'm going to sort. But it just happens that sometimes this collection is empty.
Shooshpanchick
A: 

Why would you want to create a Collection but then specify that it should be empty?

You can just instantiate the PriorityQueue with Math.max(1, data.size())

Jon Freedman
I do several transformations on initial collection and I don't want to write separate code for the case when it is empty. The less code, the better.
Shooshpanchick
+1  A: 

There's another constructor you can use to avoid the IllegalArgumentException.

public PriorityQueue(Collection<? extends E> c)

The priority queue has an initial capacity of 110% of the size of the specified collection or 1 if the collection is empty.

If you need the custom comparator, you can make the call as follows (from Jon's answer):

PriorityQueue<Data> queue = new PriorityQueue<Data>(Math.max(data.size(), 1), dataComparator);

As others have said, it doesn't make much sense to have a queue with no capacity. Since you have to set the minimum capacity somewhere (you certainly don't want to allow negative values), setting it at 1 seems reasonable.

Bill the Lizard
I assume he uses the other constructor because that is the only way to pass a special Comparator.
Péter Török
@Péter Török: Ah, that's right. That's a comparator, not a Collection. I misread. I edited my answer to reflect that.
Bill the Lizard
+4  A: 

From the source code for PriorityQueue:

/**
* Priority queue represented as a balanced binary heap: the two children
* of queue[n] are queue[2*n] and queue[2*n + 1]. The priority queue is
* ordered by comparator, or by the elements' natural ordering, if
* comparator is null: For each node n in the heap and each descendant d
* of n, n <= d.
*
* The element with the lowest value is in queue[1], assuming the queue is
* nonempty. (A one-based array is used in preference to the traditional
* zero-based array to simplify parent and child calculations.)
*
* queue.length must be >= 2, even if size == 0.
*/

Read more: http://kickjava.com/src/java/util/PriorityQueue.java.htm#ixzz0yBp7ocHR

stark
It describes how the queue works internally, and I'm asking about its API.
Shooshpanchick
-1 for a blind comment dump
Erick Robertson
The API is an implementation of the PriorityQueue data structure, so in essence, @stark is correct.
The Elite Gentleman
This still doesn't answer the question.
Erick Robertson
The Elite Gentleman: regardless of what the implementation is, constructor could allow values 0..n just for the sake of homogenity. Actually, the ideal fix for this would be to allow creating a PriorityQueue(collection, constructor).
Shooshpanchick
I agree this doesn't actually answer the question, and it is a comment dump. In my defence it does actually give some details of the underlying data structure (an array modelling a balanced binary heap), and hints that the reason for this restriction is to simplify the implementation.
stark
if it doesn't answer the question, why is this the most voted up answer? whyyyy?
SB
+6  A: 

Why do you want to use a queue? A queue is a data structure made for the case where you have a "producer" which enqueues items, and a "consumer" which dequeues them. A priority queue orders the enqueued items using a tree structure. A buffer is needed for a producer being able to enqueue, so initialCapacity = 0 makes no sense.

In your case you never enqueue anything, you just process data from a collection you already have. Why create a new data structure for it? You could just use

for (Data item : Collections.sort(data, dataComparator)) {
    // ...
}

or, following Daniel's comment, use the Selection Algorithm so you can profit from your situation that you actually only need a subset of your items.

chiccodoro
I use it for /partial/ sort. I may have a lot of data elements, but I need to select only top 10 or so, no need to sort the rest.
Shooshpanchick
You have to sort all of them in order to get the top 10. (+1 for understanding the problem)
Erick Robertson
Getting top k elements of a set can easily be done in O(n) time, while sorting, in general, requires (On log n). CLRS commnets on it, and wikipedia gives extremely good reference on the subject: http://en.wikipedia.org/wiki/Selection_algorithm. For paralell/distributed processing, there are Map/Reduce techniques: http://pl.postech.ac.kr/~myson/pastwork/dr.html
Daniel Ribeiro
@Daniel: This is a very good hint! However If Shooshpanchick actually doesn't need exactly 10 elements but rather after each item decide on whether he needs another one, I fear that this won't work in O(n) anymore, unfortunately. Am I right?
chiccodoro
The `PriorityQueue` javadoc explicitly says that `remove()` is `O(log(n))`. This means that regardless of whether I know how many elements I need, retrieving first k will be O(k*log(n)) which is better than O(n*log(n)) for sorting.
Shooshpanchick
How much time does the insert take? Because you still need to sort them all on insert. So inserting all elements will be O(n*log(n)) and then removing the first ten will be O(k*log(n)). O((n+k)*log(n)) is slower than O(n*log(n)).
Erick Robertson
@chiccodoro: Yes. If you use constant time selection algorithms for such scenario, you can end up having a O(n^2) algorithm. It is in fact the very idea behind selection sort. The question is pragmatic: how does your k compare with your n. If k = O(lg n), sorting it all will be just as good. In practice, most systems that use top k algorithms actually have k up to 2000 or 10000, and such numbers are much lower than n, which usually is in the millions or billions scale.The only best practice is using to use your intelligence.
Daniel Ribeiro
@Daniel: Thanks! I have incorporated your hint in my answer.
chiccodoro