views:

278

answers:

9

I've read that qsort is just a generic sort, with no promises about implementation. I don't know how libraries vary from platform to platform, but assuming the Mac OS X and Linux implementations are broadly similar, are the qsort implementations recursive, and/or do they require a lot of stack?

I have a large array (hundreds of thousands of elements) and I want to sort it without blowing my stack to oblivion. Alternatively, any suggestions for an equivalent for large arrays?

+9  A: 

Yes, it's recursive. No, it probably will not use large amounts of stack. Why not simply try it? Recursion is not some kind of bogey - it's the solution of choice for very many problems.

anon
excellent answer, Butterworth :)
abelenky
Trust me, I use recursion all the time! Just never before for depths like this, hence my asking first.
Joe
@Joe Depths like what? The recursion in quicksort pushes stack frames (i.e. local variables and return addresses) to the stack, not copies of the thing being sorted. This is very little data.
anon
I'd only care if it were a PIC. +1 ;-)
jweyrich
@Neil: unless you screw up your implementation, and call-recurse on the "big half". Which obviously those sound implementations don't, but it takes a little thought to realise *why* the bound on stack usage is so small even in the pathological worst run-time of quicksort. Everyone probably uses at least Introsort for `qsort` now anyway...
Steve Jessop
I realise the nature of the data on the stack. It was more the number of frames (which, now I actually look at the numbers, isn't that great). I still defend asking the question of whether it's suitable for inputs of a given size, though. The stdlib manpage on my system doesn't mention the kinds of data over which it is suitable.
Joe
@Joe qsort would not be the sort of choice if it didn't handle very large datasets well. Nothing wrong with the question though, except I do find the reluctance of many people here to actually try things out a bit off-pissing.
anon
Entirely offtopic: [Neither is the Pope catholic, nor do bears mostly shit in the woods](http://www.guardian.co.uk/theguardian/2005/feb/08/features11.g2)
ShreevatsaR
-1: Quicksort has worst case space complexity O(n) which means that sorting a large array *can* blow the stack. If stack space is not abundant (like in a thread or coroutine), then this is something to consider.
Luther Blissett
@Luther Blissett I am talking about qsort, not Quicksort. And why not take this up with (or downvote) AndreyT's answer - I made no mention of the complexity of the sort.
anon
Sigh; the quip attracted quite a slew of "offensive", so edited out.
Marc Gravell
@Luther: as noted in response to another comment, you're just plain wrong on this one. The worst case is logarithmic, **not** linear, as long as the Quicksort is implemented correctly.
Jerry Coffin
@Jerry: Quicksort's worst case is O(n^2) [time complexity], but that case is rare if it is implemented properly; on average it is O(n log n).
0A0D
@0A0D, the discussion was about space complexity, not time complexity.
Lars Wirzenius
@Lars: In which case, the worst case space complexity is `O(n)`
0A0D
@0A0D: "worst-case" means, "the worst possible input", not "the worst possible implementation". Of course it is possible to implement Quicksort so that it has O(n) worst-case space complexity. It's also possible to implement Quicksort so that it has O(n^2) space complexity (for instance in a language which lets you pass and assign arrays by value as copies). Either would be a bug, although of course the latter is a worse one.
Steve Jessop
+9  A: 

You know, the recursive part is log n deep. With 64 levels of recursion (which is ~64*4 = ~256 bytes of stack total) you can sort an array of size ~2^64, i.e. an array as large as you can address on a 64-bit CPU, which is 147573952589676412928 bytes for 64-bit integers. You can't even hold it in memory!

Worry about stuff that matters imo.
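To make the arithmetic concrete, here is a throwaway helper (not from any real qsort; the name `depth_bound` is made up for illustration) computing the ~log2(n) bound a sanely written quicksort obeys:

```c
/* Illustrative only: the worst-case call depth of a quicksort that
 * recurses into the smaller partition first is about floor(log2(n)),
 * since each level of recursion handles at most half the elements. */
int depth_bound(unsigned long long n)
{
    int depth = 0;
    while (n > 1) {
        n >>= 1;   /* halving mirrors the smaller-partition recursion */
        depth++;
    }
    return depth;
}
```

For a hundred thousand elements this gives 16; for the full 2^64 address space, 64 - either way a trivial number of stack frames.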

Blindy
+1. It may be a few more bytes than 256 depending on how much is pushed on the stack for each level, but it's still a small constant.
ShreevatsaR
-1: This is wrong. Quicksort has worst case space complexity O(n), not O(log n). A large array *can* blow the stack.
Luther Blissett
@Luther: when properly implemented (when recursing, sort the smaller partition first), stack usage is limited to approximately logarithmic growth. To be exact, Knuth gives it as [lg (N+1)/(M+2)] (with "[]" signifying "floor"), where N=number of elements being sorted and M=size of partition where you stop recursing (presuming an "improved" Quicksort that switches to insertion sort when the whole thing is nearly sorted).
Jerry Coffin
Luther, qsort() isn't "Quicksort"— or rather the actual algorithm is implementation defined. Glibc's qsort() for example, switches to insertion sort in order to avoid the worst case space complexity problem.
Gmaxwell
@Jerry: Worst case space complexity is O(n). This can be confirmed at a number of websites, searchable via Google, from many reputable institutions of higher education around the world. (i.e. http://ugweb.cs.ualberta.ca/~c115/F06/schedule/lectures/lecture12.pdf)
0A0D
http://mars.netanya.ac.il/~brdaniel/algorithm/Qsort.html
0A0D
@0A0D: that Alberta slideshow is not useful. Possibly a fine simplification for teaching purposes, but nobody actually implements the partition step by allocating two new arrays, one for each side of the pivot, and copying the elements into them. So, the analysis is not relevant to any implementation of Quicksort written by someone who knows what they're doing - part of the benefit of Quicksort is that it's an (almost) in-place algorithm.
Steve Jessop
@Steve: Even Wikipedia lists O(n) [though that doesn't say much...]
0A0D
@0A0D: http://en.wikipedia.org/wiki/Quicksort says "quicksort is an in-place sort and uses no temporary memory". Which isn't really true, but the "space complexity" section explains why it's really O(log N). One possible criterion could be, "how many times has your source implemented quicksort in a widely-distributed software system, and what was the space bound for their implementation?". In my case the answer is "once", and "proportional to log N". To teach analysis of space bounds, you don't *have* to use a decent implementation. To ship code, you do.
Steve Jessop
.. which is not to say that the academics at Alberta don't know how to write a quicksort. I'm sure they do. They just aren't teaching students "Sedgewick's improved version" (as Wikipedia calls it) in that particular lecture, so it's wrong to take their specimen implementation as typical, or even as plausible.
Steve Jessop
@0A0D: If you're going to try to appeal to authority instead of using a logic-based argument, you need to consider the source. When it comes to analysis of algorithms, who do you trust more: Duane Szafron or Donald Knuth? [As an off-topic aside, "i.e." is an abbreviation for "id est" which translates to "that is"; you seem to have wanted "e.g.", an abbreviation for "exempli gratia", which translates to "for example".]
Jerry Coffin
Why not just forget about whether "quick" sort could blow the stack and use the One True Sort, heap sort? No temp space and real `O(n log n)` time. :-)
R..
@Jerry: I really don't care whether it's i.e. or e.g.
0A0D
+1  A: 

With quicksort, the stack will grow logarithmically. You will need a lot of elements to blow up your stack.
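For illustration, a minimal sketch of why that holds, assuming a plain int array rather than qsort's void*/comparator interface: recurse only into the smaller partition and loop on the larger, so the call depth never exceeds about log2(n).

```c
#include <stddef.h>

static void swap_ints(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sketch only: sorts a[lo..hi] inclusive.  The call depth stays
 * logarithmic because each recursive call handles the smaller half;
 * the larger half is processed by the surrounding loop instead. */
void quicksort(int *a, size_t lo, size_t hi)
{
    while (lo < hi) {
        /* Lomuto partition around the last element. */
        int pivot = a[hi];
        size_t i = lo;
        for (size_t j = lo; j < hi; j++)
            if (a[j] < pivot)
                swap_ints(&a[i++], &a[j]);
        swap_ints(&a[i], &a[hi]);

        if (i - lo < hi - i) {            /* left side is smaller */
            if (i > lo) quicksort(a, lo, i - 1);
            lo = i + 1;                   /* loop on the larger right side */
        } else {                          /* right side is smaller (or equal) */
            quicksort(a, i + 1, hi);
            if (i == lo) break;           /* avoid size_t underflow */
            hi = i - 1;                   /* loop on the larger left side */
        }
    }
}
```

Call it as `quicksort(arr, 0, n - 1)` for n >= 1.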

Daniel Egeberg
@msw: Seeing as you insist on being pedantic, you forgot to define *N* as the size of the array. As far as I'm concerned, the term "logarithmic growth" is generally understood to mean O(lg(*n*)) when talking about algorithms.
Daniel Egeberg
A: 

A qsort implementation which can fail on large arrays is extremely broken. If you're really worried I'd go RTFS, but I suspect any half-decent implementation will either use an in-place sorting algorithm or use malloc for temporary space and fall back to an in-place algorithm if malloc fails.

R..
+10  A: 

Here's a version from BSD, copyright Apple, presumably used in OS X at some time or another:

http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/bsd/kern/qsort.c

It is call-recursive, although the upper bound on the depth of recursion is small, as Blindy explains.

Here's a version from glibc, presumably used in Linux systems at some time or another:

http://www.umcs.maine.edu/~chaw/200801/capstone/n/qsort.c

It's not call-recursive. For exactly the same reason that the limit on call-recursion is small, it can use a small fixed amount of stack to manage its loop-recursion.
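The loop-recursive approach can be sketched roughly like this (plain ints instead of qsort's byte-swapping and comparator, so this is a toy, not glibc's actual code): pending subranges live in a small fixed array, and because the larger half is always pushed while the loop continues with the smaller half, ~log2(n) slots suffice.

```c
#include <stddef.h>

struct range { size_t lo, hi; };

/* Illustrative non-call-recursive quicksort: an explicit stack of
 * subranges replaces call frames.  64 slots cover any 64-bit n. */
void quicksort_iter(int *a, size_t n)
{
    if (n < 2) return;
    struct range stack[64];
    size_t top = 0;
    stack[top++] = (struct range){ 0, n - 1 };

    while (top > 0) {
        size_t lo = stack[top - 1].lo, hi = stack[top - 1].hi;
        top--;
        while (lo < hi) {
            /* Lomuto partition around the last element. */
            int pivot = a[hi];
            size_t i = lo;
            for (size_t j = lo; j < hi; j++)
                if (a[j] < pivot) {
                    int t = a[i]; a[i] = a[j]; a[j] = t; i++;
                }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;

            if (i - lo < hi - i) {        /* left smaller: push larger right */
                if (i < hi) stack[top++] = (struct range){ i + 1, hi };
                if (i == lo) break;       /* avoid size_t underflow */
                hi = i - 1;               /* continue with smaller left */
            } else {                      /* right smaller: push larger left */
                if (i > lo) stack[top++] = (struct range){ lo, i - 1 };
                lo = i + 1;               /* continue with smaller right */
            }
        }
    }
}
```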

Can I be bothered to look up the latest versions? Nope ;-)

For a few hundred thousand array elements, even the call-recursive implementation won't call more than 20 levels deep. In the grand scheme of things that is not deep, except on very limited embedded devices, which wouldn't have enough memory for you to have an array that big to sort in the first place. When N is bounded above, O(log N) obviously is a constant, but more than that it's normally quite a manageable constant. Usually 32 or 64 times "small" is "reasonable".

Steve Jessop
+1 for actually looking at the source code. It's interesting to note that glibc uses a quicksort/insertion sort hybrid in their qsort()
nos
@nos: IIRC that's what Knuth tells you to do, so interesting but hopefully not surprising ;-)
Steve Jessop
+1  A: 

I'd guess that most modern implementations of qsort actually use the Introsort algorithm. A reasonably written Quicksort won't blow the stack anyway (it'll sort the smaller partition first, which limits stack depth to logarithmic growth).

Introsort goes a step further though -- to limit the worst case complexity, if it sees that Quicksort isn't working well (too much recursion, so it could have O(N^2) complexity), it'll switch to a Heapsort which guarantees O(N log2 N) complexity and limits stack usage as well. Therefore, even if the Quicksort it uses is sloppily written, the switch to Heapsort will limit stack usage anyway.
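A condensed sketch of that strategy, for plain ints with a depth budget of ~2*log2(N) (real introsorts also switch to insertion sort for tiny ranges, which is omitted here, and the names below are made up for illustration):

```c
#include <stddef.h>

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Restore the max-heap property for the subtree rooted at `start`,
 * within h[0..end] inclusive. */
static void sift_down(int *h, size_t start, size_t end)
{
    size_t root = start;
    while (2 * root + 1 <= end) {
        size_t child = 2 * root + 1;
        if (child + 1 <= end && h[child] < h[child + 1])
            child++;
        if (h[root] >= h[child]) return;
        swap_int(&h[root], &h[child]);
        root = child;
    }
}

/* O(n log n) worst case, O(1) extra space: the fallback sort. */
static void heapsort_range(int *a, size_t lo, size_t hi)
{
    size_t n = hi - lo + 1;
    int *h = a + lo;
    for (size_t i = n / 2; i-- > 0; )
        sift_down(h, i, n - 1);
    for (size_t end = n - 1; end > 0; end--) {
        swap_int(&h[0], &h[end]);
        sift_down(h, 0, end - 1);
    }
}

static void intro_rec(int *a, size_t lo, size_t hi, int depth)
{
    if (lo >= hi) return;
    if (depth == 0) {                 /* partitioning going badly: bail out */
        heapsort_range(a, lo, hi);
        return;
    }
    /* Lomuto partition around the last element. */
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    if (i > lo) intro_rec(a, lo, i - 1, depth - 1);
    intro_rec(a, i + 1, hi, depth - 1);
}

void introsort(int *a, size_t n)
{
    int depth = 0;
    for (size_t m = n; m > 1; m >>= 1)
        depth += 2;                   /* ~2*log2(n) recursion budget */
    if (n > 1) intro_rec(a, 0, n - 1, depth);
}
```

Even with the deliberately sloppy last-element pivot (which degenerates on already-sorted input), the depth budget forces the switch to Heapsort before the stack gets deep.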

Jerry Coffin
+1  A: 

I remember reading in this book: C Programming: A Modern Approach that the ANSI C specification doesn't define how qsort must be implemented.

The book also notes that qsort could in reality be another kind of sort entirely: merge sort, insertion sort, or why not bubble sort :P

So, the qsort implementation might not be recursive.

Good standards don't describe how to implement anything - though for things like sorts they do specify minimum complexity guarantees, which may limit the choice of algorithm for implementations.
anon
@Neil: regardless of what good standards do, as it happens the C standard doesn't specify the complexities of `qsort` and `bsearch`. Fortunately the question is about two implementations in particular, so the standard is pretty much irrelevant. If Apple is going to perversely switch OS X to Bogosort in the next release, then whether they get away with that will not depend on whether it breaks the C standard...
Steve Jessop
+4  A: 

A properly implemented qsort does not require more than log2(N) levels of recursion (i.e. depth of stack), where N is the largest array size on the given platform. Note that this limit applies regardless of how good or bad the partitioning happens to be, i.e. it is the worst case depth of recursion. For example, on a 32-bit platform, the depth of recursion will never exceed 32 in the worst possible case, given a sane implementation of qsort.

In other words, if you are concerned about the stack usage specifically, you have nothing to worry about, unless you are dealing with some strange low-quality implementation.

AndreyT
A: 

The worst-case space complexity of a naive quicksort implementation (which is still a popular option for qsort) is O(N). If the implementation is modified to sort the smaller array first, and either tail-recursion optimisation or an explicit stack with iteration is used, then the worst-case space can be brought down to O(log N), as most answers here have already noted. So you will not blow up your stack if the implementation of quicksort is not broken and the library was not broken by improper compiler flags. But, for example, most compilers which support tail-recursion elimination won't perform that optimization in unoptimized debug builds. A library built with the wrong flags (e.g. not enough optimization, for example in the embedded domain where you sometimes build your own debug libc) might then crash the stack.

For most developers this will never be an issue (they have vendor-tested libcs with O(log N) space complexity), but I'd say it is a good idea to keep an eye on potential library issues from time to time.

UPDATE: Here's an example of what I mean: a bug in glibc (from 2000) where qsort would start thrashing virtual memory, because the implementation would switch internally to merge sort, thinking there was enough memory to hold a temporary array.

http://sources.redhat.com/ml/libc-alpha/2000-03/msg00139.html

Luther Blissett
Questioner is asking about particular systems, which have reasonable quality of implementation. "naive quicksort implementation is still a popular option" is simply false. It is not popular with people who write C libraries, which is what the question concerns.
Steve Jessop
Questioner asked about "Linux". Linux has no implementation of qsort, because it's a kernel. qsort is a function the C-runtime library for which there are several options (glibc,uclibc,newlib,dietlibc..and then there's this thing they've put into Android). Also: see my update.
Luther Blissett
-1 from me: a hypothetical badly implemented qsort is pretty irrelevant. The glibc qsort is implemented quite well, and I assume the OS X one is as well. A bad implementation of qsort is a bug, which needs to be fixed.
Lars Wirzenius
@Lars: I just gave an example of how glibc's qsort *was* implemented in a way you would deem hypothetical, and it gave someone concrete headaches. It was of course fixed.
Luther Blissett
+1 This is a good answer. In fact, it is along the same lines as AndreyT except Luther doesn't have over 30K rep.
0A0D
@Luther, the bug you link to is not because of a naive implementation, but a smart implementation with a bad corner case. I see no point in assuming the stdlib implementors are incompetent.
Lars Wirzenius
@Lars: The bad corner case is "The machine has virtual memory".
Luther Blissett
@Luther: thanks for that party political broadcast from GNU. It may interest you to learn that in the English language, which is what Joe and I are both speaking, the word "Linux" is used as a generic term for GNU/linux-based OS distributions ;-p
Steve Jessop
And I further note that, while the GPL of course requires proper copyright attribution, it does not give RMS the right to name derivative works (and neither does he claim that right, just acts a bit grumpy when people don't do what he says). So if for example Red Hat calls "their" OS "Linux", then that is its name, regardless of whether it "should" be called GNU/Linux...
Steve Jessop
Not if Richard Stallman has his way
0A0D
@0A0D: of course, but having worked so hard for Red Hat's right not to have to do what he (the author of the software they use) wants, RMS must grudgingly accept that they sometimes won't do what he wants :-)
Steve Jessop