views:

437

answers:

6

What is the fastest way to sort an array of whole integers bigger than 0 and less than 100000 in Python? But not using the built in functions like sort.

Im looking at the possibility to combine 2 sport functions depending on input size.

+6  A: 

Since you know the range of numbers, you can use Counting Sort which will be linear in time.

codaddict
@Downvoter: Care to explain ?
codaddict
@codaddict: (I did not downvote). Note that this is not a good algorithm if the array of integers is significantly smaller than 100000, as it will waste memory (and thus time) to construct the 100000 element list.
Brian
A: 

The built in functions are best, but since you can't use them have a look at this:

http://en.wikipedia.org/wiki/Quicksort

Hannesh
+5  A: 

If you are interested in asymptotic time, then counting sort or radix sort provide good performance.

However, if you are interested in wall clock time you will need to compare performance between different algorithms using your particular data sets, as different algorithms perform differently with different datasets. In that case, its always worth trying quicksort:

def qsort(inlist):
    if inlist == []: 
        return []
    else:
        pivot = inlist[0]
        lesser = qsort([x for x in inlist[1:] if x < pivot])
        greater = qsort([x for x in inlist[1:] if x >= pivot])
        return lesser + [pivot] + greater
fmark
Good advice, except choice of variable list, which can cause nice errors. I post other, faster version.
Tony Veijalainen
Running the list comprehension twice over the same set of variables is probably also less than optimal.
Paul McMillan
@Tony Veijalainen “There are only two hard things in Computer Science: cache invalidation and naming things” -- I've changed the variable name
fmark
+2  A: 

Early versions of Python used a hybrid of samplesort (a variant of quicksort with large sample size) and binary insertion sort as the built-in sorting algorithm. This proved to be somewhat unstable. S0, from python 2.3 onward uses adaptive mergesort algorithm.

Order of mergesort (average) = O(nlogn). Order of mergesort (worst) = O(nlogn). But Order of quick sort (worst) = n*2

if you uses list=[ .............. ]

list.sort() uses mergesort algorithm.

For comparison between sorting algorithm you can read wiki

For detail comparison comp

Tauquir
it's timsort which is more adaptive than mergesort
aaronasterling
Timsort is an adaptive, stable, natural mergesort.
Tauquir
+1  A: 

We can use count sort using a dictionary to minimize the additional space usage, and keep the running time low as well. The count sort is much slower for small sizes of the input array because of the python vs C implementation overhead. The count sort starts to overtake the regular sort when the size of the array (COUNT) is about 1 million.

If you really want huge speedups for smaller size inputs, implement the count sort in C and call it from Python.

(Fixed a bug which Aaron (+1) helped catch ...) The python only implementation below compares the 2 approaches...

import random
import time

COUNT = 3000000

array = [random.randint(1,100000) for i in range(COUNT)]
random.shuffle(array)

array1 = array[:]

start = time.time()
array1.sort()
end = time.time()
time1 = (end-start)
print 'Time to sort = ', time1*1000, 'ms'

array2 = array[:]

start = time.time()
ardict = {}
for a in array2:
    try:
        ardict[a] += 1
    except:
        ardict[a] = 1

indx = 0
for a in sorted(ardict.keys()):
    b = ardict[a]
    array2[indx:indx+b] = [a for i in xrange(b)]
    indx += b

end = time.time()
time2 = (end-start)
print 'Time to count sort = ', time2*1000, 'ms'

print 'Ratio =', time2/time1
Rajan
+1 `Ratio = 1.16710428623` on my machine. clever use of a dict. It's worth noting though that changing the dict construction phase from `try: ardict[a] += 1; except: ardict[a] = 1` to `if a in ardict: ardict[a] += 1; else: ardict[a] = 1` drops the ratio to `Ratio = 0.696179723863` Sometimes (often) it is better to look before you leap. I knew to do this because `try` is only cheaper than `if` if the exception rarely occurs. An actual exception is still very expensive.
aaronasterling
unfortunately this algorithm is wrong. Try `array = [1,10, 100, 1000, 10000, 100000, 1000000]`. The dangers of skating on undocumented implementation details strike again.
aaronasterling
Thanks aaron - fixed the bug of not sorting the dict keys. That should slow it down a bit. However, it will preserve its almost O(n) nature if number of distinct elements compared to the array size is low.I would love to see a 3D plot of distinct elements, array length as x and y dimensions and ratio of running time and the 3rd dimension. Maybe I will do it in a day or 2.
Rajan
@Aaron: On my comp, the (try, except) works better. I coded it the (if then else) way first and switched to the (try, except) to speed up things. Also, the version with the bug removed runs faster than the previous one - because of the underlying C sort being used in sorting keys. Thats free beer right there. :-)
Rajan
For me the except solution was also faster, little faster generation was by using tuple with generator if list is not necessary to produce (as it is mutable): `array2=tuple(a for a in sorted(ardict) for i in xrange(ardict[a]))`
Tony Veijalainen
Also doing `import blistarray = blist.blist(random.randint(1,100000) for i in range(COUNT))` has interesting effects especially for built in sort.
Tony Veijalainen
+2  A: 

Radix sort theoretically runs in linear time (sort time grows roughly in direct proportion to array size ), but in practice Quicksort is probably more suited, unless you're sorting absolutely massive arrays.

If you want to make quicksort a bit faster, you can use insertion sort] when the array size becomes small.

It would probably be helpful to understand the concepts of algorithmic complexity and Big-O notation too.

jaffachief
When you say the array size becomes small do you mean less than 64 in size?
Anders
I'd say more about less than 10, but there's no right answer; the best idea is to experiment with different values and see which ends up faster.
jaffachief