What's faster: inserting into a priority queue, or sorting retrospectively?

I am generating some items that I need sorted at the end. Which is faster in terms of complexity: inserting them directly into a priority_queue (or a similar data structure) as they are generated, or running a sort algorithm at the end?

+1  A: 

I think that insertion is more efficient in almost all cases where you are generating the data (i.e. you don't already have it in a list).

A priority queue is not your only option for insertion as you go. As mentioned in other answers, a binary tree (or the related red-black tree) is equally efficient.

I would also check how the priority queue is implemented: many are based on binary heaps, but a few implementations are not very good at extracting elements (they essentially scan the entire queue for the highest-priority item).

Elemental
+9  A: 

Inserting n items into a priority queue has asymptotic complexity O(n log n), so in terms of complexity it is no more efficient than calling sort once, at the end.

Whether it’s more efficient in practice really depends. You need to test. In fact, in practice, even continued insertion into a linear array (as in insertion sort, without building a heap) may be the most efficient, even though asymptotically it has worse runtime.

Konrad Rudolph
+1  A: 

A priority queue is usually implemented as a heap. Sorting via a heap is on average slower than quicksort, although quicksort has worse worst-case performance. Heaps also carry relatively more per-operation overhead.

I'd recommend sorting at the end.

jdv
Relatively heavy? No, a heap is a simple array, and the sift-down and bubble-up operations are likewise simple. Quicksort is faster on average mainly because heapsort has to relocate each element at least twice (it works in two passes). However, that isn't really the case here, since we are sorting online, so the relative runtimes of heapsort and quicksort in this context have to be reevaluated carefully.
Konrad Rudolph
+3  A: 

Depends on the data, but I generally find InsertSort to be faster.

I had a related question, and I found in the end that the bottleneck was that I was doing a deferred sort (only when I ended up needing it), and on a large number of items I usually hit the worst-case scenario for my quicksort (input already in order), so I used an insertion sort instead.

http://stackoverflow.com/questions/2952407/sorting-1000-2000-elements-with-many-cache-misses

So analyze your data!

Soylent Graham
+1  A: 

Why not use a binary search tree? Then the elements are sorted at all times, and the insertion cost is comparable to that of a priority queue. Read about red-black balanced trees here

midtiby
I think priority queues will trivially be more efficient than self-balancing binary trees, since the latter don't offer the same cache-friendly behaviour and rely on per-node heap memory allocation.
Konrad Rudolph
@Konrad: that appears to be the result of my simplistic test. I was actually expecting the multiset to be awful, precisely because of the memory allocation, but it's not *that* bad, only five times slower than `std::sort`.
Steve Jessop
+2  A: 

To your first question (which is faster): it depends. Just test it. Assuming you want the final result in a vector, the alternatives might look something like this:

#include <iostream>
#include <vector>
#include <queue>
#include <cstdlib>
#include <functional>
#include <algorithm>
#include <iterator>

#ifndef NUM
    #define NUM 10
#endif

int main() {
    std::srand(1038749);
    std::vector<int> res;

    #ifdef USE_VECTOR
        for (int i = 0; i < NUM; ++i) {
            res.push_back(std::rand());
        }
        std::sort(res.begin(), res.end(), std::greater<int>());
    #else
        std::priority_queue<int> q;
        for (int i = 0; i < NUM; ++i) {
            q.push(std::rand());
        }
        res.resize(q.size());
        for (int i = 0; i < NUM; ++i) {
            res[i] = q.top();
            q.pop();
        }
    #endif
    #if NUM <= 10
        std::copy(res.begin(), res.end(), std::ostream_iterator<int>(std::cout,"\n"));
    #endif
}

$ g++     sortspeed.cpp   -o sortspeed -DNUM=10000000 && time ./sortspeed

real    0m20.719s
user    0m20.561s
sys     0m0.077s

$ g++     sortspeed.cpp   -o sortspeed -DUSE_VECTOR -DNUM=10000000 && time ./sortspeed

real    0m5.828s
user    0m5.733s
sys     0m0.108s

So, std::sort beats std::priority_queue, in this case. But maybe you have a better or worse std::sort, and maybe you have a better or worse implementation of a heap. Or, if not better or worse, just more or less suited to your exact usage, which is different from my invented usage: "create a sorted vector containing the values".

I can say with a lot of confidence that random data won't hit the worst case of std::sort, so in a sense this test might flatter it. But for a good implementation of std::sort, its worst case will be very difficult to construct, and might not actually be all that bad anyway.

Edit: I added use of a multiset, since some people have suggested a tree:

    #elif defined(USE_SET)
        std::multiset<int,std::greater<int> > s;
        for (int i = 0; i < NUM; ++i) {
            s.insert(std::rand());
        }
        res.resize(s.size());
        int j = 0;
        for (std::multiset<int>::iterator i = s.begin(); i != s.end(); ++i, ++j) {
            res[j] = *i;
        }
    #else

$ g++     sortspeed.cpp   -o sortspeed -DUSE_SET -DNUM=10000000 && time ./sortspeed

real    0m26.656s
user    0m26.530s
sys     0m0.062s

To your second question (complexity): they're all O(n log n), ignoring fiddly implementation details like whether memory allocation is O(1) or not (vector::push_back and other forms of insert at the end are amortized O(1)) and assuming that by "sort" you mean a comparison sort. Other kinds of sort can have lower complexity.

Steve Jessop
Why put the elements of the queue in a vector?
static_rtti
@static_rtti: just because I don't know what you do want to do with them, so I'm doing something. It's necessary to do all the pops in order to evaluate the speed of the priority queue, but I suppose I didn't have to use the values. I doubt that adding them to the vector takes much extra time compared with the `pop` itself, but you should run your own test that's as close as possible to your real intended use.
Steve Jessop
Thanks for the tests!
static_rtti
+1  A: 

As far as I understand, your problem does not require a priority queue, since your task sounds like "make many insertions, then sort everything". That's like shooting birds with a laser: not the appropriate tool. Use standard sorting techniques instead.

You would need a priority queue if your task were to simulate a sequence of operations, where each operation can be either "add an element to the set" or "remove the smallest/greatest element from the set". This comes up, for example, when finding a shortest path in a graph (Dijkstra's algorithm). There you cannot just use standard sorting techniques.

SPIRiT_1984