I think Java's Collections.sort uses merge sort, which is O(n log n).

However, the following output disagrees:

-1,0000000099000391,0000000099000427
1,0000000099000427,0000000099000346
5,0000000099000391,0000000099000346
1,0000000099000427,0000000099000345
5,0000000099000391,0000000099000345
1,0000000099000346,0000000099000345

I am sorting a node list of 4 nodes by sequence number, and the sort is doing 6 comparisons. I am puzzled because 6 > 4 log(4). Can someone explain this to me?
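For reference, a trace like the one above can be produced by wrapping the comparator so it prints each comparison before returning it. A minimal sketch of that idea (the class name, the use of Long values, and the unpadded output format are my own; the original code isn't shown):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class LoggingSortDemo {
        public static void main(String[] args) {
            // Sequence numbers taken from the output above.
            List<Long> nodes = new ArrayList<>(List.of(
                    99000391L, 99000427L, 99000346L, 99000345L));

            // Print every comparison the sort makes as "result,a,b".
            Comparator<Long> logging = (a, b) -> {
                int result = a.compareTo(b);
                System.out.println(result + "," + a + "," + b);
                return result;
            };

            Collections.sort(nodes, logging);
        }
    }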

P.S. It is mergesort, but I still don't understand my results.

Thanks for the answers everyone. Thank you Tom for correcting my math.

+15  A: 

O(n log n) doesn't mean that the number of comparisons will be equal to or less than n log n, just that the time taken will grow in proportion to n log n, up to a constant factor. Try running tests with 8 nodes, or 16 nodes, or 32 nodes, and check how the timing scales.
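One way to run that experiment is to count comparisons directly with a wrapping comparator and watch how the count grows as the input doubles. A sketch (the harness is mine, not from the answer):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicLong;

    public class GrowthDemo {
        public static void main(String[] args) {
            Random random = new Random(42);
            for (int n = 4; n <= 1024; n *= 2) {
                List<Integer> data = new ArrayList<>();
                for (int i = 0; i < n; i++) data.add(random.nextInt());

                // Count every comparison the sort performs.
                AtomicLong comparisons = new AtomicLong();
                Collections.sort(data, (a, b) -> {
                    comparisons.incrementAndGet();
                    return a.compareTo(b);
                });

                System.out.printf("n=%5d  comparisons=%7d  n*log2(n)=%9.1f%n",
                        n, comparisons.get(), n * Math.log(n) / Math.log(2));
            }
        }
    }

The ratio of comparisons to n log2(n) settles toward a constant as n grows, which is all that O(n log n) promises.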

Andy Mikula
Ah, I see. I was assuming that it was number of comparisons. My mistake. Thanks!
Kyle Jones
That's a pretty small number of comparisons; you could comfortably multiply those node counts by 1000 for better results.
Dana the Sane
True, but his example was with 4 initially :)
Andy Mikula
+9  A: 

You sorted four nodes, so you didn't get merge sort; the sort switched to insertion sort.

In Java, the Arrays.sort() methods use merge sort or a tuned quicksort depending on the datatypes, and for implementation efficiency *switch to insertion sort when fewer than seven array elements are being sorted*. (Wikipedia, emphasis added)

Arrays.sort is used indirectly by the Collections classes.
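To make the quoted behaviour concrete, here is a simplified sketch of that pattern: a merge sort that hands small ranges to insertion sort. Only the threshold of seven comes from the quote; the rest is illustrative and is not the JDK's actual source:

    public class CutoffMergeSort {
        private static final int INSERTIONSORT_THRESHOLD = 7;

        public static void sort(int[] a) {
            int[] aux = a.clone();          // workspace copy of the data
            mergeSort(aux, a, 0, a.length);
        }

        // Sorts dest[low..high); src is a workspace holding the same data.
        private static void mergeSort(int[] src, int[] dest, int low, int high) {
            if (high - low < INSERTIONSORT_THRESHOLD) {
                for (int i = low; i < high; i++)        // insertion sort
                    for (int j = i; j > low && dest[j - 1] > dest[j]; j--) {
                        int t = dest[j]; dest[j] = dest[j - 1]; dest[j - 1] = t;
                    }
                return;
            }
            int mid = (low + high) >>> 1;
            mergeSort(dest, src, low, mid);   // roles swap on each level
            mergeSort(dest, src, mid, high);
            // Merge the two sorted halves of src back into dest.
            for (int i = low, p = low, q = mid; i < high; i++) {
                if (q >= high || (p < mid && src[p] <= src[q]))
                    dest[i] = src[p++];
                else
                    dest[i] = src[q++];
            }
        }

        public static void main(String[] args) {
            int[] a = {99000391, 99000427, 99000346, 99000345};
            sort(a);
            System.out.println(java.util.Arrays.toString(a));
        }
    }

For four elements the insertion-sort branch does at most 3 + 2 + 1 = 6 comparisons, which is exactly the count in the question.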

A recently accepted bug report indicates that the Sun implementation of Java will use Python's timsort in the future: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6804124

(The timsort monograph, linked above, is well worth reading.)

tpdi
Also mentioned in the OpenJDK Forum: Core Libraries Round Table last Thursday. http://mediacast.sun.com/users/robilad/media/openjdk-core-libraries-roundup.mp3 [MP3!]
Tom Hawtin - tackline
Interesting read, thanks.
Dana the Sane
Quicksort beats merge sort, and Java uses quicksort as its sorting algorithm.
The constants on quicksort are really small because it is in-place. Timsort is a hybrid of merge sort and insertion sort that uses insertion sort at the merge stage, but it doesn't gain much from this; its constant is still too high compared to quicksort.
+3  A: 

An algorithm A(n) that processes an amount of data n is in O(f(n)), for some function f, if there exist two strictly positive constants C_inf and C_sup such that:

C_inf · f(n) < ExpectedValue(OperationCount(A(n))) < C_sup · f(n)

Two things to note:

  • The actual constants C could be anything, and depend on the relative costs of operations (depending on the language, the VM, the architecture, or your actual definition of an operation). On some platforms, for instance, + and * have the same cost; on others, the latter is an order of magnitude slower.

  • The quantity ascribed as "in O(f(n))" is an expected operation count, based on some probably arbitrary model of the data you are dealing with. For instance, if your data is almost completely sorted, a merge-sort algorithm is going to be mostly O(n), not O(n log n) (see the sketch after this list).
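The second point is easy to observe with the same counting-comparator trick used earlier in the thread: the stock Java sort makes far fewer comparisons on already-sorted input than on random input. A sketch (the harness and the input sizes are mine):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicLong;

    public class DataModelDemo {
        static long countComparisons(List<Integer> data) {
            AtomicLong count = new AtomicLong();
            Collections.sort(data, (a, b) -> {
                count.incrementAndGet();
                return a.compareTo(b);
            });
            return count.get();
        }

        public static void main(String[] args) {
            int n = 10_000;
            Random random = new Random(1);

            List<Integer> sorted = new ArrayList<>();
            List<Integer> shuffled = new ArrayList<>();
            for (int i = 0; i < n; i++) sorted.add(i);
            for (int i = 0; i < n; i++) shuffled.add(random.nextInt());

            // Already-sorted input needs far fewer comparisons than
            // random input, even though both sorts are "O(n log n)".
            System.out.println("sorted input: " + countComparisons(sorted));
            System.out.println("random input: " + countComparisons(shuffled));
        }
    }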

Varkhan
For an arbitrary-precision n-bit integer, + will typically be O(n) and * IIRC O(n log n). There are "fun" things you can do to algorithms to just use huge integers and relatively few (big-O) operations.
Tom Hawtin - tackline
Quicksort is the canonical example of an algorithm which is mostly O(n log n) but can go O(n^2) (for instance, on sorted input with the first element as pivot).
Tom Hawtin - tackline
Yes... for all those complexity estimation tasks, the devil is in the model. For it to be any use, you generally also have to take into account the worst-case complexity, not just the expected cost. And it's often a b**ch to find the worst-case...
Varkhan
A: 

I've written some stuff about the Java sort algorithm that you may be interested in, and taken some performance measurements of Collections.sort(). The algorithm at present is a merge sort with an insertion sort once you get down to a certain size of sublists (N.B. this algorithm is very probably going to change in Java 7).

You should really take the Big O notation as an indication of how the algorithm will scale overall; for a particular sort, the precise time will deviate from the time predicted by this calculation (as you'll see on my graph, the two sort algorithms that are combined each have different performance characteristics, and so the overall time for a sort is a bit more complex).

That said, as a rough guide, for every time you double the number of elements, if you multiply the expected time by 2.2, you won't be far out. (It doesn't make much sense really to do this for very small lists of a few elements, though.)
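For what it's worth, the 2.2 figure falls out of the arithmetic: doubling n multiplies n·log2(n) by 2·(log2(n) + 1)/log2(n). A quick check of that ratio (the loop bounds are my own choice):

    public class DoublingFactor {
        public static void main(String[] args) {
            for (int n = 256; n <= (1 << 20); n *= 4) {
                double log2n = Math.log(n) / Math.log(2);
                // Ratio of (2n)·log2(2n) to n·log2(n).
                double factor = 2 * (log2n + 1) / log2n;
                System.out.printf("n=%8d  doubling factor=%.2f%n", n, factor);
            }
        }
    }

The factor drifts slowly from about 2.25 at a few hundred elements toward 2.0 as n grows, so 2.2 is a sensible rule of thumb for mid-sized lists.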

Neil Coffey