ansaurus

Question

Answer 1

+6 A:

Since you know the range of numbers, you can use Counting Sort which will be linear in time.

codaddict 2010-10-04 13:16:47

@Downvoter: Care to explain ?

codaddict 2010-10-04 13:26:06

@codaddict: (I did not downvote). Note that this is not a good algorithm if the array of integers is significantly smaller than 100000, as it will waste memory (and thus time) to construct the 100000 element list.

Brian 2010-10-04 13:59:31

Answer 2

A:

The built in functions are best, but since you can't use them have a look at this:

http://en.wikipedia.org/wiki/Quicksort

Hannesh 2010-10-04 13:19:02

Answer 3

+5 A:

If you are interested in asymptotic time, then counting sort or radix sort provide good performance.

However, if you are interested in wall clock time you will need to compare performance between different algorithms using your particular data sets, as different algorithms perform differently with different datasets. In that case, its always worth trying quicksort:

def qsort(inlist):
    if inlist == []: 
        return []
    else:
        pivot = inlist[0]
        lesser = qsort([x for x in inlist[1:] if x < pivot])
        greater = qsort([x for x in inlist[1:] if x >= pivot])
        return lesser + [pivot] + greater

fmark 2010-10-04 13:23:29

Good advice, except choice of variable list, which can cause nice errors. I post other, faster version.

Tony Veijalainen 2010-10-04 13:57:35

Running the list comprehension twice over the same set of variables is probably also less than optimal.

Paul McMillan 2010-10-04 23:16:41

@Tony Veijalainen “There are only two hard things in Computer Science: cache invalidation and naming things” -- I've changed the variable name

fmark 2010-10-05 00:42:59

Answer 4

+2 A:

Early versions of Python used a hybrid of samplesort (a variant of quicksort with large sample size) and binary insertion sort as the built-in sorting algorithm. This proved to be somewhat unstable. S0, from python 2.3 onward uses adaptive mergesort algorithm.

Order of mergesort (average) = O(nlogn). Order of mergesort (worst) = O(nlogn). But Order of quick sort (worst) = n*2

if you uses list=[ .............. ]

list.sort() uses mergesort algorithm.

For comparison between sorting algorithm you can read wiki

For detail comparison comp

Tauquir 2010-10-04 13:27:29

it's timsort which is more adaptive than mergesort

aaronasterling 2010-10-04 13:29:23

Timsort is an adaptive, stable, natural mergesort.

Tauquir 2010-10-04 13:39:50

Answer 5

+1 A:

We can use count sort using a dictionary to minimize the additional space usage, and keep the running time low as well. The count sort is much slower for small sizes of the input array because of the python vs C implementation overhead. The count sort starts to overtake the regular sort when the size of the array (COUNT) is about 1 million.

If you really want huge speedups for smaller size inputs, implement the count sort in C and call it from Python.

(Fixed a bug which Aaron (+1) helped catch ...) The python only implementation below compares the 2 approaches...

import random
import time

COUNT = 3000000

array = [random.randint(1,100000) for i in range(COUNT)]
random.shuffle(array)

array1 = array[:]

start = time.time()
array1.sort()
end = time.time()
time1 = (end-start)
print 'Time to sort = ', time1*1000, 'ms'

array2 = array[:]

start = time.time()
ardict = {}
for a in array2:
    try:
        ardict[a] += 1
    except:
        ardict[a] = 1

indx = 0
for a in sorted(ardict.keys()):
    b = ardict[a]
    array2[indx:indx+b] = [a for i in xrange(b)]
    indx += b

end = time.time()
time2 = (end-start)
print 'Time to count sort = ', time2*1000, 'ms'

print 'Ratio =', time2/time1

Rajan 2010-10-04 20:26:23

+1 `Ratio = 1.16710428623` on my machine. clever use of a dict. It's worth noting though that changing the dict construction phase from `try: ardict[a] += 1; except: ardict[a] = 1` to `if a in ardict: ardict[a] += 1; else: ardict[a] = 1` drops the ratio to `Ratio = 0.696179723863` Sometimes (often) it is better to look before you leap. I knew to do this because `try` is only cheaper than `if` if the exception rarely occurs. An actual exception is still very expensive.

aaronasterling 2010-10-04 21:16:44

unfortunately this algorithm is wrong. Try `array = [1,10, 100, 1000, 10000, 100000, 1000000]`. The dangers of skating on undocumented implementation details strike again.

aaronasterling 2010-10-04 21:53:28

Thanks aaron - fixed the bug of not sorting the dict keys. That should slow it down a bit. However, it will preserve its almost O(n) nature if number of distinct elements compared to the array size is low.I would love to see a 3D plot of distinct elements, array length as x and y dimensions and ratio of running time and the 3rd dimension. Maybe I will do it in a day or 2.

Rajan 2010-10-04 22:29:11

@Aaron: On my comp, the (try, except) works better. I coded it the (if then else) way first and switched to the (try, except) to speed up things. Also, the version with the bug removed runs faster than the previous one - because of the underlying C sort being used in sorting keys. Thats free beer right there. :-)

Rajan 2010-10-04 22:34:31

For me the except solution was also faster, little faster generation was by using tuple with generator if list is not necessary to produce (as it is mutable): `array2=tuple(a for a in sorted(ardict) for i in xrange(ardict[a]))`

Tony Veijalainen 2010-10-05 10:29:54

Also doing `import blistarray = blist.blist(random.randint(1,100000) for i in range(COUNT))` has interesting effects especially for built in sort.

Tony Veijalainen 2010-10-05 10:51:05

Answer 6

+2 A:

Radix sort theoretically runs in linear time (sort time grows roughly in direct proportion to array size ), but in practice Quicksort is probably more suited, unless you're sorting absolutely massive arrays.

If you want to make quicksort a bit faster, you can use insertion sort] when the array size becomes small.

It would probably be helpful to understand the concepts of algorithmic complexity and Big-O notation too.

jaffachief 2010-10-04 22:51:49

When you say the array size becomes small do you mean less than 64 in size?

Anders 2010-10-05 00:23:15

I'd say more about less than 10, but there's no right answer; the best idea is to experiment with different values and see which ends up faster.

jaffachief 2010-10-05 16:27:54

ansaurus

tags:

views:

answers:

Fastest way to sort in Python

related questions