Does anybody know if Python has an equivalent to Java's SortedSet interface?

Here's what I'm looking for: let's say I have an object of type foo, and I know how to compare two objects of type foo to see whether foo1 is "greater than" or "less than" foo2. I want a way of storing many objects of type foo in a list L, so that whenever I traverse the list L, I get the objects in order, according to the comparison method I define.

Edit:

I guess I can use a dictionary or a list and sort() it every time I modify it, but is this the best way?

+3  A: 

If you only need the keys, and no associated value, Python offers sets:

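a_list = [3, 1, 2, 3]  # example input; any list of hashable, comparable items

s = set(a_list)  # duplicates are dropped

for k in sorted(s):  # sorted() builds a new sorted list on every call
    print(k)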

However, you'll be sorting the set each time you do this. If that is too much overhead, you may want to look at heap queues (the heapq module). They may not be as elegant or "Pythonic", but they may suit your needs.
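A minimal sketch of the heap queue idea, assuming plain comparable items (the sample values are made up). Note that a heap only guarantees that the smallest item sits at index 0; to read everything in order you have to pop repeatedly, which empties the heap.

```python
import heapq

heap = []
for value in [7, 1, 9, 5]:       # hypothetical sample data
    heapq.heappush(heap, value)  # O(log n) per insertion

smallest = heap[0]               # peek at the minimum without removing it

# Popping repeatedly yields the items in ascending order,
# but it consumes the heap as it goes.
in_order = [heapq.heappop(heap) for _ in range(len(heap))]
print(in_order)
```

So this fits best when you mostly need "give me the next smallest item" rather than repeated full traversals.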

Ber
+5  A: 

You can use insort from the bisect module to insert new elements efficiently in an already sorted list:

from bisect import insort

items = [1,5,7,9]
insort(items, 3)
insort(items, 10)

print(items)  # -> [1, 3, 5, 7, 9, 10]

Note that this does not directly correspond to SortedSet, because it uses a list. If you insert the same item more than once you will have duplicates in the list.
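If set-like behaviour is wanted, one possible workaround is to binary-search for the item first and insert only when it is missing. This is just a sketch; `insort_unique` is a made-up helper name, not part of the bisect module.

```python
from bisect import bisect_left

def insort_unique(items, x):
    """Insert x into the sorted list `items` unless it is already present.

    Hypothetical helper; returns True if x was inserted.
    """
    i = bisect_left(items, x)            # O(log n) binary search
    if i < len(items) and items[i] == x:
        return False                     # already there, leave the list alone
    items.insert(i, x)                   # O(n): later elements are shifted right
    return True

items = [1, 5, 7, 9]
insort_unique(items, 3)
insort_unique(items, 5)   # duplicate, ignored
print(items)  # -> [1, 3, 5, 7, 9]
```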

sth
Finally somebody suggests an O(log n) implementation. ;)
Nikhil Chelliah
Combine it with an O(log n) binary search (http://stackoverflow.com/questions/212358/binary-search-in-python) to make sure there are no duplicates.
Nikhil Chelliah
Actually, it's O(n) for insertion - insertion in a vector requires shifting the elements after the insert point. It is still O(log(n)) for lookup however, and sorted iteration is just O(n) (vs O(n*log(n)) for dict).
Brian
I did forget about that, but still: Python lists are amortized, so the average insert is O(n)/n, which approaches O(1).
Nikhil Chelliah
No - that only covers appending to the end. Overallocation won't help with inserting in the middle: you'll still have to copy an average of N/2 items one space to the right, regardless of whether space has to be allocated for the new element.
Brian
+2  A: 

If you're looking for an efficient container type for Python implemented with something like a balanced search tree (a red-black tree, for example), then it's not part of the standard library.

I was able to find this, though:

http://www.brpreiss.com/books/opus7/

The source code is available here:

http://www.brpreiss.com/books/opus7/public/Opus7-1.0.tar.gz

I don't know how the source code is licensed, and I haven't used it myself, but it would be a good place to start looking if you're not interested in rolling your own container classes.

There's PyAVL which is a C module implementing an AVL tree.

Also, this thread might be useful to you. It contains a lot of suggestions on how to use the bisect module to enhance the existing Python dictionary to do what you're asking.

Of course, using insort() that way would be pretty expensive for insertion and deletion, so consider it carefully for your application. Implementing an appropriate data structure would probably be a better approach.

In any case, to understand whether you should keep the data structure sorted or sort it when you iterate over it you'll have to know whether you intend to insert a lot or iterate a lot. Keeping the data structure sorted makes sense if you modify its content relatively infrequently but iterate over it a lot. Conversely, if you insert and delete members all the time but iterate over the collection relatively infrequently, sorting the collection of keys before iterating will be faster. There is no one correct approach.
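As a rough illustration of the "sort it when you iterate" side of that trade-off, here is a sketch of a container that appends cheaply and re-sorts only when it is iterated after a change. The class name and its methods are invented for this example.

```python
class LazySortedCollection(object):
    """Hypothetical helper: sorts only when iterated after a modification."""

    def __init__(self, iterable=()):
        self._items = list(iterable)
        self._dirty = True             # assume the initial input is unsorted

    def add(self, item):
        self._items.append(item)       # amortized O(1)
        self._dirty = True

    def remove(self, item):
        self._items.remove(item)       # O(n) linear scan
        # Removing an item never breaks existing order, so the flag is left alone.

    def __iter__(self):
        if self._dirty:
            self._items.sort()         # O(n log n), paid only after changes
            self._dirty = False
        return iter(self._items)

c = LazySortedCollection([5, 1])
c.add(3)
print(list(c))  # -> [1, 3, 5]
```

With many insertions between traversals, the sort cost is paid once per traversal instead of once per insertion; with frequent traversals and rare changes, keeping the list sorted with insort is likely the better fit.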

Ori Pessach
+3  A: 

Take a look at BTrees. It looks like you need one of them. As far as I understand, you need a structure that supports relatively cheap insertion of elements and cheap sorting (or no sorting at all). BTrees offer that.

I have experience with ZODB.BTrees, and they scale from thousands to millions of elements.

myroslav
This is the proper implementation.
Nikhil Chelliah