views:

1547

answers:

18

I have the following code thats use bubble sort to invert a list and has a worst time performance:

for i in xrange(len(l)):
    for j in xrange(len(l)):
        if l[i]>l[j]:
            l[i], l[j] = l[j], l[i]

In some cases (when len(l) = 100000) the code spend more then 2h to complete execute, I think its so strange, please correct my code or give some suggestions. numpy and numarray solutions are welcome.

+19  A: 

Bubble sort is a horrible algorithm to sort with. That is quite possibly the reason. If speed is necessary, I would try another algorithm like quick sort or merge sort.

Kevin
+1: It's always slow.
S.Lott
It's not slow in the case of presorted data.
Nosredna
Nako's "naive" bubble sort is O(n^2) even with presorted data.
Miles
I think there should be a name for this kind of sort.
Nosredna
"Bubble sort with pointless extra iterations"?
Miles
"Naive bubble sort"?
las3rjock
How about "Bumble Sort?"
Nosredna
I wonder how this got voted so high--the guy said he was explicitly doing this to compare performance, optimizing the code wouldn't make any sense.
Bill K
+1  A: 

If you're interested in making your own sort, you can change a bubble sort to a comb sort with just a couple lines of code. Comb sort is nearly as good as the best sorts. Of course, making your own sort is best left as a learning exercise.

Comb sort improves on bubble sort, and rivals in speed more complex algorithms like Quicksort.

http://en.wikipedia.org/wiki/Comb_sort

Nosredna
+1  A: 

That doesn't look like bubble sort to me, and if it is, it's a very inefficient implementation of it.

Kai
I'm going to agree that that doesn't look like a bubble sort to me.
Nosredna
How is that not a bubble sort? http://www.codecodex.com/wiki/index.php?title=Bubble_sort#Python. The only potential issue he has is checking the same index against itself. Besides, any implementation of a bubble sort will, by definition, be inefficient. Bubble sort IS an inefficient algorithm: http://www.sorting-algorithms.com/bubble-sort.
bedwyr
It's not the classic bubble sort. Bubble sort has i going from 0 to N then j inside going from i+1 (not 0) to N. Huge difference. Lots of pointless looking of stuff that was already done.
Nosredna
Because he never breaks either of the loops, this "version" of bubble sort will always perform in worst-case time, even if the array is already sorted. That's why it's such a poor implementation.
Kai
A: 

Like the other posts say, bubble sort is horrible. It pretty much should be avoided at all costs due to the bad proformance, like you're experiencing.
Luckily for you there are lots of other sorting algorithms, http://en.wikipedia.org/wiki/Sorting_algorithm, for examples.

In my experience in school is that quicksort and mergesort are the other two basic sorting algorithms introduced with, or shortly after, bubble sort. So I would recommend you look into those for learning more effective ways to sort.

Jared
+10  A: 

That's not quite a bubble sort... unless I've made a trivial error, this would be closer to a python bubble sort:

swapped = True
while swapped:
  swapped = False
  for i in xrange(len(l)-1):
    if l[i] > l[i+1]:
      l[i],l[i+1] = l[i+1],l[i]
      swapped = True

Note that the whole idea is that the "bubble" moves along the array, swapping adjacent values until it moves through the list, with nothing swapped. There are a few optimizations that can be made (such as shrinking the size of the inner loop), but they are usually only worth bothering with when you are "homework oriented".

Edit: Fixed length() -> len()

jkerian
Its more close to what I want, but I think its have the same performance.I looking for a numpy solution.
Nako
What the others are saying is also quite true. Bubble sort is a terrible choice. If you're looking at simple sorts, even insertion is quite a bit better.Also note that feeding the easy to generate test_data = range(100000) test_data.reverse(), does pretty effectively generate the worst case data for the simpler sorting routines
jkerian
As measured by comparisons, the OP sort routine guarantees worst case performance with every sort. The odd logic surrounding "swapped" is just designed to short circuit the loops when the array is finally sorted.I don't understand what you mean by "numpy" solution... are you looking for http://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html ?
jkerian
Thanks for the answers, but the idea is to compare the Bubble Sort algorithm in different languages.I so surprise the performace of python
Nako
I try to code this example with numpy to get performance but I have no idea how to make it.
Nako
I suspect the numpy project would find the idea of a bubble sort implementation in their codebase abhorrent.
jkerian
Looking over the original code, at first I couldn't even see how it worked (I had to run it before I believed that it did). It does reverse sort the array by a rather curious method.The first iteration of the outer loop swaps the lowest value in the array into the first position. In ensuing outer loops, this value is then carried across the array as a psuedo pivot point, with everything to it's left repeatedly swapped until it is in strictly descending order in each iteration. In many ways, this is much closer to an insertion sort, than a bubble sort.
jkerian
Any computational benefits from using NumPy are likely to get wiped out by the decision to implement your own highly-suboptimal sorting algorithm.
las3rjock
+6  A: 

Bubble sort may be horrible and slow etc, but would you rather have an O(N^2) algorithm over 100 items, or O(1) one that required a dial-up connection?

And a list of 100 itmes shouldnt take 2 hours. I don't know python, but are you by any chance copying entire lists when you make those assignments?

Here's a bubble sort in Python from Google:

def bubbleSort(theList, max):
    for n in range(0,max): #upper limit varies based on size of the list
        temp = 0
        for i in range(1, max): #keep this for bounds purposes
            temp = theList[i]
            if theList[i] < theList[i-1]:
                theList[i] = theList[i-1]
                theList[i-1] = temp

and another, from wikipedia:

def bubblesort(l):
    "Sorts l in place and returns it."
    for passesLeft in range(len(l)-1, 0, -1):
        for index in range(passesLeft):
            if l[index] < l[index + 1]:
               l[index], l[index + 1] = l[index + 1], l[index]
    return l

The order of bubble sort is N(N-1). This is essentially N^2, because for every element you require to scan the list and compare every element.

By the way, you will probably find C++ to be the fastest, then Java, then Python.

jgubby
sorry, I wrote the wrong number, the real len(l) is 100000
Nako
Excellent points.
Beska
Ah, well that makes a difference. Thats then the order of 10,000,000,000 iterations for bubblesort! You definitely need another algorithm. http://en.literateprograms.org/Quicksort_(Python)
jgubby
its for study programing languages and their capabilities
Nako
No programming language can overcome the decision to use a bad algorithm.
las3rjock
+5  A: 

What do you mean by numpy solution ? Numpy has some sort facilities, which are instantenous for those reasonably small arrays:

import numpy as np
a = np.random.randn(100000)
# Take a few ms on a decent computer
np.sort(a)

There are 3 sorts of sort algorithms available, all are Nlog(N) on average.

David Cournapeau
I saw many examples about numpy and numarray. I can choose to use quick or merge to sort using the default numpy lib, these part I understand, but I try to implement a bubblesort using the numarray without success
Nako
I don't understand what you are trying to do: just as an exercise ? Why using numpy instead of lists ?
David Cournapeau
It was an exercise for my class of programming languages.I thought the cause of the python's low performance is because everything is objects and I think developing a solution using numpy (numarray) will increase performance
Nako
numpy will be even worse for this - the speed of numpy comes from so-called vectorization, mostly, where loops are implemented in C on native types (C int, float, etc...). But in your case, since you need to access each item, it will be very slow, as every item you will swap will be at the python level. Not only will you get the python object overhead, but also the conversion overhead !If that's an exercise on bubble sort, I don't see any point in being fast, BTW. That's like asking a fast Fourier transform without using FFT.
David Cournapeau
A: 

If you must code your own, use an insertion sort. Its about the same amount of code, but several times faster.

Matthias Wandel
The funny thing is that there is NO reason for him to code his own. QSort is builtin...
Shane C. Mason
+2  A: 

For one, you're doing too many loops. Your inner loop should proceed from i + 1 to the end of the list, not from 0. Secondly, as noted by others, bubble sort has a O(N^2) complexity so for 100000 elements, you are looping 10,000,000,000 times. This is compounded by the fact that looping is one of the areas where interpreted languages have the worst performance. It all adds up to incredibly poor performance. This is why any computations that require such tight looping are usually written in C/C++ and wrapped for use by languages like Python.

Eric
+4  A: 

I believe you mentioned that you were trying to use that as a benchmark to compare speeds.

I think generally Python is a bit faster than Ruby, but not really near Java/C/C++/C#. Java is within 2x of the C's, but all the interpreted languages were around 100x slower.

You might Google "Programming Language Game" for a LOT of comparisons of apps/languages/etc. Check out a Python JIT for possibly better performance.

You might also compare it to Ruby to see a more fair test.

Edit: Just for fun (nothing to do with the question) check this--

public class Test {
    public static void main(String[]s) {
        int size=Integer.valueOf(s[0]).intValue();
        Random r=new Random();
        int[] l=new int[size];
        for(int i=0;i<size;i++)
            l[i]=r.nextInt();
        long ms=(new Date()).getTime();
        System.out.println("built");
        if(fast) {
            Arrays.sort(l);
        else {
            int temp;
            for(int i=0;i<size;i++)
                for(int j=0;j<size;j++)
                    if(l[i]>l[j]) {                        
                        temp=l[i];
                        l[j]=l[i];
                        l[j]=temp;                        
                    }
            }
        ms=(new Date()).getTime()-ms;
        System.out.println("done in "+ms/1000);
    }
}

The fun thing about this: The Java run times are on the order of:

Array size  Slow Time   Fast time
 100k         2s          0s
  1M         23s          0s
 10M         39m          2s
100M         NO          23s

Not that this addition has anything to do with the question, but holy cow the built-in impelemntation is FAST. I think it took longer to generate than sort (Guess that makes sense with calls to Random and memory allocation.)

Had to go into the CLI and -Xmx1000M to get that last one to even run.

Bill K
+1  A: 

Because it is going execute the comparison and possibly the swap 100,000 x 100,000 times. If the computer is fast enough to execute the innermost statement 1,000,000 times per second, that still is 167 minutes which is slightly short of 3 hours.

On a side note, why are there so many of these inane questions on SO? Isn't being able to do simple algebra a prerequisite for programming? ;-)

Sinan Ünür
+2  A: 

Here some code I put together to compare a base bubble sort against a more streamlined version (base vs modified) - the modified is about 2-3 times faster, still a slow sort, but faster

from array import *
from random import *
from time import *

def randarray(typecode, numElements, minValue, maxValue):
    a = array(typecode)
    for i in xrange(0, numElements):
        a.append(randint(minValue, maxValue))
    return a

def basesort(l):
    for i in xrange(len(l)):
        for j in xrange(len(l)):
            if l[i]<l[j]:
                l[i], l[j] = l[j], l[i]
    return l

def modifiedsort(l):
    NotComplete = True
    i = 0
    arange = xrange(len(l))
    while NotComplete:
        NotComplete = False
        for j in xrange(len(l) - i):
            if l[i]<l[j]:
                l[i], l[j] = l[j], l[i]
                NotComplete = True
        i += 1

Num = 1000
b = randarray('i', Num, 1, 100000)
m = b[:]

print 'perform base bubble sort'
t = time()
basesort(b)
basetime =  time() - t
print basetime
#print a
print 'complete'

print 'perform modified bubble sort'
t = time()
modifiedsort(m)
modtime =  time() - t
print modtime
#print a
print 'complete'

print 'mod sort is ', basetime / modtime,' fast then base sort'
meade
I tried your code and difference between the versions is considerable.I put an array with 100,000 and both spend more than a hour.I'm finishing my results for a while.will be python more fast in the future?
Nako
if you use the internal sort - sorted(m) - you'll see a 100x faster performance - I'm assuming this is due to the internal sort being in c/c++
meade
+2  A: 

I think that you are basically wasting your time using bubble on such a large dataset. There are 3 reasons why it is slow:

1) Python is slow 2) Bubble sort is slow 3) The bubble sort listed is coded incorrectly/inefficiently.

Regardless of how it is coded, it will be O(N^2). Why not use a merge/tree sort ..or if you want to try quicksort (also worst case O(N^2)) it might be faster for your particular dataset. I believe quicksort is empirically faster if the data already has a lot of ordering in it.

Larry Watanabe
+2  A: 

Bubblesort in general does not scale well to most possible inputs as the number of elements in the input grows. (I.e., it's O(N^2).)

As N grows, given a random input array of size N, you are much less likely to get an array that sorts quickly with bubblesort (e.g., almost sorted arrays). You are far more likely to get an array that takes a long time to sort.

However, the real kicker here is that the code you posted is not a bubble sort. Traditionally, bubblesort will terminate early if no swaps were made as well as not attempt to swap values that are already sorted. (After P number of passes, the P last items will be in the correct order, so you don't need to process them.) The actual code posted will always examine every pair in the array, so it will always run the inner loop N^2 times. For 100000 elements, that's 10000000000 iterations.

MSN
+2  A: 

Bubble sort makes O(N2) compare operations (or iterations). For N = 100,000, that means that there will be 10,000,000,000 iterations. If that takes 2 hours (call it 10,000 seconds), then it means you get 1,000,000 iterations per second - or 1 microsecond per iteration. That's not great speed, but it isn't too bad. And I'm waving hands and ignoring constant multiplication factors.

If you used a quicksort, then you'd get Nlog(N) iterations, which would mean about 1,000,000 iterations, which would take 1 second in total. (log10(N) is 5; I rounded it up to 10 for simplicity.)

So, you have just amply demonstrated why bubble sort is inappropriate for large data sets, and 100,000 items is large enough to demonstrate that.

Jonathan Leffler
A: 

I forgot to add, if you have some idea of the size of the dataset and the distribution of keys then you can use a radix sort which would be O(N). To get the idea of radix sort, consider the case where you are sorting say numbers more or less distributed between 0, 100,000. Then you just create something similar to a hash table, say an array of 100,000 lists, and add each number to the bucket. Here's an implementation I wrote in a few minutes that generates some random data, sorts it, and prints out a random segment. The time is less than 1 sec to execute for an array of 100,000 integers.

Option Strict On Option Explicit On

Module Module1

Private Const MAX_SIZE As Integer = 100000
Private m_input(MAX_SIZE) As Integer
Private m_table(MAX_SIZE) As List(Of Integer)
Private m_randomGen As New Random()
Private m_operations As Integer = 0

Private Sub generateData()
    ' fill with random numbers between 0 and MAX_SIZE - 1
    For i = 0 To MAX_SIZE - 1
        m_input(i) = m_randomGen.Next(0, MAX_SIZE - 1)
    Next

End Sub

Private Sub sortData()
    For i As Integer = 0 To MAX_SIZE - 1
        Dim x = m_input(i)
        If m_table(x) Is Nothing Then
            m_table(x) = New List(Of Integer)
        End If
        m_table(x).Add(x)
        ' clearly this is simply going to be MAX_SIZE -1
        m_operations = m_operations + 1
    Next
End Sub

 Private Sub printData(ByVal start As Integer, ByVal finish As Integer)
    If start < 0 Or start > MAX_SIZE - 1 Then
        Throw New Exception("printData - start out of range")
    End If
    If finish < 0 Or finish > MAX_SIZE - 1 Then
        Throw New Exception("printData - finish out of range")
    End If
    For i As Integer = start To finish
        If m_table(i) IsNot Nothing Then
            For Each x In m_table(i)
                Console.WriteLine(x)
            Next
        End If
    Next
End Sub

' run the entire sort, but just print out the first 100 for verification purposes
Private Sub test()
    m_operations = 0
    generateData()
    Console.WriteLine("Time started = " & Now.ToString())
    sortData()
    Console.WriteLine("Time finished = " & Now.ToString & " Number of operations = " & m_operations.ToString())
    ' print out a random 100 segment from the sorted array
    Dim start As Integer = m_randomGen.Next(0, MAX_SIZE - 101)
    printData(start, start + 100)
End Sub

Sub Main()
    test()
    Console.ReadLine()
End Sub

End Module Time started = 6/15/2009 4:04:08 PM Time finished = 6/15/2009 4:04:08 PM Number of operations = 100000 21429 21430 21430 21431 21431 21432 21433 21435 21435 21435 21436 21437 21437 21439 21441 ...

Larry Watanabe
I don't think any other method will beat the time of < 1 sec for 100,000 elements sorted :)
Larry Watanabe
A: 

You can do

l.reverse()

Script ee.py:

l = []
for i in xrange(100000):
    l.append(i)

l.reverse()

lyrae@localhost:~/Desktop$ time python ee.py

real    0m0.047s
user    0m0.044s
sys    0m0.004s
lyrae
+1  A: 

First of all, for the purpose of this reply, I'm assuming - since you claim it yourself - that you're only doing this to benchmark different languages. So I won't go into "bubble sort is just slow" territory. The real question is why it's so much slower in Python.

The answer is that Python is inherently much slower than C++ or even Java. You don't see it in a typical event-driven or I/O-bound application, since there most time is spent either idling while waiting for input, or waiting for I/O calls to complete. In your case, however, the algorithm is entirely CPU bound, and thus you are directly measuring the performance of Python bytecode interpreter. Which, by some estimates, is 20-30x slower than executing the corresponding native code, which is what happens with both C++ and Java.

In general, any time you write a long-running CPU-bound loop in Python, you should expect this kind of performance. The only way to fix this is to move the entire loop into C. Moving just the body (e.g. using NumPy) won't help you much, since loop iteration itself will still be executed by Python intepreter.

Pavel Minaev