tags:

views:

2778

answers:

5

Hello,

I need to use a priority queue in my Python code. Looking around for something efficient, I came upon heapq. It looks good, but seems to be specified only for integers. I suppose it works with any objects that have comparison operators, but it doesn't specify what comparison operators it needs.

Besides, heapq seems to be implemented in Python, so it's not fast.

Are you aware of any fast implementations for priority queues in Python ? Optimally, I'd like the queue to be generic (i.e. work well for any object with a specified comparison operator).

Thanks in advance

Update:

Re comparison in heapq, I can either use a (priority, object) as Charlie Martin suggests, or just implement __cmp__ for my object.

I'm still looking for something faster than heapq.

+12  A: 

Um, Queue.PriorityQueue ? Recall that Python isn't strongly typed, so you can save anything you like: just make a tuple of (priority,thing) and you're set.

Charlie Martin
PriorityQueue is great, works as intended but beware, it's 2.6 only. If your code will work on a platform older than 2.6.0 then you've to carry it around like a backport.
Berk D. Demir
Doesn't have a peek function :-(
Casebash
How is this better than heapq? It uses heapq...
Chris S
It solves original questioner's actual problem, which is to manage a queue of non-integers.
Charlie Martin
+4  A: 

Did you look at the "Show Source" link on the heapq page? There's an example a little less than halfway down of using a heap with a list of (int, char) tuples as a priority queue.

Harper Shelby
heapq is still slow...
Eli Bendersky
I stand corrected (by Benjamin Peterson). heapq uses a C implementation, which is fast.
Eli Bendersky
+4  A: 

I've not used it, but you could try PyHeap. It's written in C so hopefully it is fast enough for you.

Are you positive heapq/PriorityQueue won't be fast enough? It might be worth going with one of them to start, and then profiling to see if it really is your performance bottlneck.

Kiv
+2  A: 

How do you know it's too slow? Have you profiled it yet? Also since 2.4, there's a C implementation of heapq in the standard library which is used.

Benjamin Peterson
Are you sure that since 2.5 heapq is implemented in C ? How do you know that ? In my Python 2.5 installation (ActiveState 2.5.2), heapq only has a Python module in 'lib', no .pyd files
Eli Bendersky
Good point. I missed it on my first look through heapq.py, but you're completely right, it does load the C implementation.
Kiv
Line 305 of heapq.py:# If available, use C implementationtry: from _heapq import heappush, heappop, heapify, heapreplace, nlargest, nsmallest
Kiv
Right! Thanks a lot !
Eli Bendersky
To answer your other question, if you do:import _heapqhelp(_heapq)under the file heading it says "built-in", which I think means you won't find it as a separate file.
Kiv
(Actually you can see if in the Modules directory if you have the source; see this post:http://stackoverflow.com/questions/269795/how-do-i-find-the-location-of-python-module-sources
Kiv
Yep, you can also see it in the list of built-in modules in sys:
Eli Bendersky
That is: _heapq is in sys.builtin_module_names
Eli Bendersky
+2  A: 

I ended up implementing a wrapper for heapq, adding a dict for maintaining the queue's elements unique. The result should be quite efficient for all operators:

class PriorityQueueSet(object):
    """ Combined priority queue and set data structure. Acts like
        a priority queue, except that its items are guaranteed to
        be unique.

        Provides O(1) membership test, O(log N) insertion and 
        O(log N) removal of the smallest item.

        Important: the items of this data structure must be both
        comparable and hashable (i.e. must implement __cmp__ and
        __hash__). This is true of Python's built-in objects, but
        you should implement those methods if you want to use
        the data structure for custom objects.
    """
    def __init__(self, items=[]):
        """ Create a new PriorityQueueSet.

            items:
                An initial item list - it can be unsorted and 
                non-unique. The data structure will be created in
                O(N).
        """
        self.set = dict((item, True) for item in items)
        self.heap = self.set.keys()
        heapq.heapify(self.heap)

    def has_item(self, item):
        """ Check if *item* exists in the queue
        """
        return item in self.set

    def pop_smallest(self):
        """ Remove and return the smallest item from the queue
        """
        smallest = heapq.heappop(self.heap)
        del self.set[smallest]
        return smallest

    def add(self, item):
        """ Add *item* to the queue. The item will be added only
            if it doesn't already exist in the queue.
        """
        if not (item in self.set):
            self.set[item] = True
            heapq.heappush(self.heap, item)
Eli Bendersky
Looks good, but you should use "item in set" rather than "set.has_key(item)". It's faster (less method call overhead), and the second one has been removed in Python 3.0.
Kiv