tags:

views:

107

answers:

4

Hello,

I am getting unexpected(?) results from this mergesort implementation. It's extremely slow compared to my three-way quicksort(also written in python).

My quicksort finishes with 10000 elements after about 0.005s while mergesort needs 1.6s! Including the source code for both implementations.

Mergesort:

#Merges two sorted lists into one sorted list. recursively
def merge(left, right):
    if len(left) == 0 and len(right) == 0:
        return []
    elif len(left) == 0:
        return right
    elif len(right) == 0:
        return left
    else:
        if left[0] <= right[0]:
            return left[:1] + merge(left[1:],right)
        else:
            return right[:1] + merge(left,right[1:])

#Splits a list in half and returns the halves
def halve(list):
    return list[:len(list)//2],list[len(list)//2:]

#Mergesort
def mergesort(list):
    if len(list) <= 1:
        return list
    left,right = halve(list)
    left,right = mergesort(left),mergesort(right)
    return merge(left,right)

Quicksort:

#Three-way QuickSort in Python
def quicksort(a):
    if len(a) == 0:
        return []
    p = a[(len(a)-1)//2]
    less = quicksort([x for x in a if x < p])
    greater = quicksort([x for x in a if x > p])
    equal = [x for x in a if x == p]
    return less + equal + greater

Can someone come up with an explanation or maybe even a fix?

+1  A: 

Explanation:

Typically, quicksort is significantly faster in practice than other Θ(nlogn) algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data, it is possible to make design choices which minimize the probability of requiring quadratic time.

leoluk
That may be true in general, but not this time.
Mark Ransom
+5  A: 

Guesses about performance are usually wrong, but i'll go with this once since i do have some experience with this. Profile if you really want to know:

You are adding lists, ie left[:1] + merge(left[1:],right), this is one of the slower operations in Python. It creates a new list from both lists, so your mergesort creates like N**2 intermediate lists. The quicksort on the other hand uses very fast LCs instead and creates less lists (I think like 2N or so).

Try using extend instead of the +, maybe that helps.

THC4k
+1 since I suspect you're on the right track. However, his quicksort implementation also adds lists.
Daniel Stutzbach
+2  A: 

A recursive mergesort is not really the best way to do things. You should get better performance with a straight iterative approach. I'm not very conversant with Python, so I'll give you the C-like pseudocode.

ileft = 0 // index into left array
iright = 0 // index into right array
iresult = 0 // index into result array
while (ileft < left.length && iright < right.length)
{
    if (left[ileft] <= right[iright])
        result[iresult++] = left[ileft++]
    else
        result[iresult++] = right[iright++]
}

// now clean up the remaining list
while (ileft < left.length)
    result[iresult++] = left[ileft++]

while (iright < right.length)
    result[iresult++] = right[iright++]
Jim Mischel
Ok, I will try translating your code in hope of getting better results :) Thank you
vichle
This worked very nicely.
vichle
A: 

Just out of curiosity, I wrote a quick implementation using generators (could be cleaner). How does this compare with those in the original method?

def merge(listA,listB):
    iterA, iterB = iter(listA), iter(listB)
    valA, valB = iterA.next(), iterB.next()
    while True:
        if valA <= valB:
            yield valA
            try:
                valA = iterA.next()
            except StopIteration:
                yield valB
                try:
                    while True:
                        yield iterB.next()
                except StopIteration:
                    return
        else:
            yield valB
            try:
                valB = iterB.next()
            except StopIteration:
                yield valA
                try:
                    while True:
                        yield iterA.next()
                except StopIteration:
                    return
MattH
That's significantly faster. Faster than my quicksort actually. Thank you!
vichle
I don't really get what it does though :p Python isn't my native language.
vichle
When a function contains the keyword "yield", it is a generator function. When a generator function is called, it returns and iterator object. When the `next` method of the iterator object is called, the generator function runs up to and including the first `yield` statement, `next` returns the yielded value. When `next` is called again the function resumes where it was. When the generator function exits the `next` method raises a `StopIteration` exception. This generator expression iterates the two input lists. So there are no lists being created or concatenated in this implementation.
MattH