views:

827

answers:

3
+8  Q: 

Grokking Timsort

There's a (relatively) new sort on the block called Timsort. It's been used as Python's list.sort, and is now going to be the new Array.sort in Java 7).

There's some documentation and a tiny Wikipedia article describing the high-level properties of the sort and some low-level performance evaluations, but I was curious if anybody can provide some pseudocode to illustrate what Timsort is doing, exactly, and what are the key things that make it zippy. (Esp. with regard to the cited paper, "Optimistic Sorting and Information Theoretic Complexity.")

(See also related StackOverflow post.)

+2  A: 

There is more documentation on the Python site.

John Paulett
+3  A: 

This change went through the core-libs mailing list when it went in so there is some discussion and useful links there. Here's the web rev with code review changes and also the original patch.

The comments in the code say:

Implementation note: This implementation is a stable, adaptive,
iterative mergesort that requires far fewer than n lg(n) comparisons
when the input array is partially sorted, while offering the
performance of a traditional mergesort when the input array is
randomly ordered. If the input array is nearly sorted, the
implementation requires approximately n comparisons.
Temporary storage requirements vary from a small constant for nearly sorted
input arrays to n/2 object references for randomly ordered input
arrays.

The implementation takes equal advantage of ascending and
descending order in its input array, and can take advantage of
ascending and descending order in different parts of the the same
input array. It is well-suited to merging two or more sorted arrays:
simply concatenate the arrays and sort the resulting array.
The implementation was adapted from Tim Peters's list sort for Python
TimSort. It uses techiques from Peter McIlroy's "Optimistic
Sorting and Information Theoretic Complexity", in Proceedings of the
Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474,
January 1993.

Buried in there is the very useful link to the Python implementation details, and I think that's a great place to start, followed by the code. To be incredibly high level about it, timsort improves performance by noticing runs of sorted data and taking advantage of that structure during the sort.

Alex Miller
I actually meant to link to that in my "some documentation" link. Fixed. My question was specifically a response to that document - I didn't find it to be at all useful in understanding Timsort from a pseudocode-ish level.
Yang
+2  A: 

You should look at this blog post: Visualising Sorting Algorithms: Python's timsort

alt text

kaizer.se
Thanks. This is probably pretty as close as I'll get to what I asked for. My take-away is that it does prepares blocks ('minruns') of 32 elts with insertion sort and reverse-in-place.
Yang