views:

955

answers:

8

I have a bunch of sorted lists of objects, and a comparison function

class Obj :
    def __init__(p) :
        self.points = p
def cmp(a, b) :
    return a.points < b.points

a = [Obj(1), Obj(3), Obj(8), ...]
b = [Obj(1), Obj(2), Obj(3), ...]
c = [Obj(100), Obj(300), Obj(800), ...]

result = magic(a, b, c)
assert result == [Obj(1), Obj(1), Obj(2), Obj(3), Obj(3), Obj(8), ...]

what does magic look like? My current implementation is

def magic(*args) :
    r = []
    for a in args : r += a
    return sorted(r, cmp)

but that is quite inefficient. Better answers?

A: 

I don't know whether it would be any quicker, but you could simplify it with:

def GetObjKey(a):
    return a.points

return sorted(a + b + c, key=GetObjKey)

You could also, of course, use cmp rather than key if you prefer.

Al
+1  A: 

Use the bisect module. From the documentation: "This module provides support for maintaining a list in sorted order without having to sort the list after each insertion."

import bisect

def magic(*args):
    r = []
    for a in args:
        for i in a:
            bisect.insort(r, i)
    return r
codeape
+6  A: 

Python standard library offers a method for it: heapq.merge.
As the documentation says, it is very similar to using itertools (but with more limitations); if you cannot live with those limitations (or if you do not use Python 2.6) you can do something like this:

sorted(itertools.chain(args), cmp)

However, I think it has the same complexity as your own solution, although using iterators should give some quite good optimization and speed increase.

Roberto Liffredo
Using key instead of cmp should be prefered (and shoudl be faster). Python3 does not have cmp parameter anyway.
Jiri
Actually, I was just using the same format as OP, but you are absolutely right and *key* should be preferred over *cmp*.
Roberto Liffredo
Well, and the OP's cmp function is wrong and doesn't work. If you're using heapq, you'll have to provide __lt__ etc. methods on your class or use a tuple of (sorting key, object) in your heap instead.
Aaron Gallagher
I think you mean itertools.chain(*args). What you wrote is equivalent to sorted(iter(args), cmp), which makes little sense.
Marius Gedminas
+2  A: 

Instead of using a list, you can use a heap.

The insertion is O(log(n)), so merging a, b and c will be O(n log(n))

In Python, you can use the heapq module.

ThibThib
+1: Sorting a list in inherently inefficient: prevent the sort by using a smarter structure.
S.Lott
A: 

One line solution using sorted:

def magic(*args):
  return sorted(sum(args,[]), key: lambda x: x.points)

IMO this solution is very readable.

Using heapq module, it could be more efficient, but I have not tested it. You cannot specify cmp/key function in heapq, so you have to implement Obj to be implicitly sorted.

import heapq
def magic(*args):
  h = []
  for a in args:
    heapq.heappush(h,a)
  return [i for i in heapq.heappop(h)
Jiri
Your heapq method is a mess. You're pushing entire lists instead of their items, and you're ignoring the key. The one liner is cool, though.
itsadok
Yeah you are right, I have used heapq just few times and I did not paste that to console to test it. My fault, sorry. Although now I see that Obj object must be defined "sortable" for heapq to work, because you cannot specify cmp/key function in heapq.
Jiri
This code is all around a mess. Both snippets have syntax errors, and using sum for concatenating lists is very inefficient. Not to mention that there's operator.attrgetter to replace the lambda.
Aaron Gallagher
A: 

Here you go: a fully functioning merge sort for lists (adapted from my sort here):

def merge(*args):
    import copy
    def merge_lists(left, right):
        result = []
        while (len(left) and len(right)):
            which_list = (left if left[0] <= right[0] else right)
            result.append(which_list.pop(0))
        return result + left + right
    lists = [arg for arg in args]
    while len(lists) > 1:
        left, right = copy.copy(lists.pop(0)), copy.copy(lists.pop(0))
        result = merge_lists(left, right)
        lists.append(result)
    return lists.pop(0)

Call it like this:

merged_list = merge(a, b, c)
for item in merged_list:
    print item

For good measure, I'll throw in a couple of changes to your Obj class:

class Obj(object):
    def __init__(self, p) :
        self.points = p
    def __cmp__(self, b) :
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points
  • Derive from object
  • Pass self to __init__()
  • Make __cmp__ a member function
  • Add a str() member function to present Obj as string
hughdbrown
A: 

I like Roberto Liffredo's answer. I didn't know about heapq.merge(). Hmmmph.

Here's what the complete solution looks like using Roberto's lead:

class Obj(object):
    def __init__(self, p) :
        self.points = p
    def __cmp__(self, b) :
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points

a = [Obj(1), Obj(3), Obj(8)]
b = [Obj(1), Obj(2), Obj(3)]
c = [Obj(100), Obj(300), Obj(800)]

import heapq

sorted = [item for item in heapq.merge(a,b,c)]
for item in sorted:
    print item

Or:

for item in heapq.merge(a,b,c):
    print item
hughdbrown
A: 

I asked a similar question and got some excellent answers:

The best solutions from that question are variants of the merge algorithm, which you can read about here:

David Wilson