views:

1804

answers:

14

If I need to sort 150,000 entries what would be the fastest way for a beginner in C++ to sort that data? All I have learned so far is bubble sort and I know that isn't that efficient.

thanks nmr

+4  A: 

The quicksort algorithm is much more efficient than the bubble sort.

Robert Harvey
" ... for randomly ordered data sets."
paxdiablo
Invert the set. Try quicksort then.
lb
I like bucket sort.
Martin York
A: 

I tempt to say quicksort, but what does the array contains?

Edit

If all you want is the median value, it will be faster iterating through the entire array: O(n)

Any sorting will be longer than that, quicksort for example os O(nlogn)

Am
the array would contain an integer and need to be sorted so I could output the median of the array
nmr
err the array would contain integers
nmr
+2  A: 
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
  using namespace std;

  vector<int> data;
  data.push_back(5);
  data.push_back(3);
  data.push_back(42);

  sort(data.begin(), data.end());

  for (vector<int>::size_type i = 0; i < data.size(); ++i) {
    cout << data[i] << ' ';
  }
  cout << '\n';

  // iterators are more common, and you'll love them! (eventually)
  for (vector<int>::const_iterator i = data.begin(); i != data.end(); ++i) {
    cout << *i << ' ';
  }
  cout << '\n';

  return 0;
}

There are many references online.

Roger Pate
+1 for suggesting use of STL.
Peter
+5  A: 

use std::sort()

You can refer Comparison of algorithms, to know different sorting algorithms and their complexity.

I would tend to use quicksort. For my production code, I use std::sort ( at least in VC6 STL, std::sort uses quicksort internally)

aJ
+4  A: 

The fastest, both in terms of performance and programming time, would be to use the library's sort function.

If your goal is to implement the sort yourself, you could go with a Shell sort, which is said to be quick and pretty efficient. If the keys all have similar lengths, maybe a radix sort.

More info on both can be found by Googling, probably the most useful references are in Wikipedia.

Carl Smotricz
+1 for shell sort
stimms
+29  A: 

There is no such thing as a "fastest" sorting algorithm. Like most things in computer-science, when you pick an algorithm you accept certain trade-offs.

That being said, for general purposes, quicksort is likely to be the fastest sorting algorithm for in-memory sorts in most cases, although in certain cases it is slower than other sorts, especially if your data-set is mostly sorted already.

Again, there are always trade-offs depending on your requirements and data set. For example, merge-sort is usually a bit slower than quicksort, but unlike quicksort it is stable (i.e. same-value keys retain their original order.) Insertion sort is likely to be faster than quicksort for very small data sets. Radix sort may also be faster than quicksort if you are sorting integers. And for extremely large data sets where the data is so large that it doesn't fit in memory, external merge sort would be faster than quicksort.

Again, it's all about tradeeoffs. But the best answer for general purposes is probably quicksort. For that reason, most implementations of the C++ standard library std::sort function use quicksort (or some quicksort hybrid algorithm, like introsort) internally.

Charles Salvia
Actually, for quicksort the worst case sequence depends on how a pivot is chosen; with median of 3 (which I thought is what stl/std usually use), a mostly sorted sequence should be equivelant to most random sequences.
CoderTao
But if your data is mostly sorted, mergesort is likely to beat quicksort.
Charles Salvia
That depends on implementation. Mergesort has the disadvantage that it can't be done inplace; that whole O(n) memory overhead thing. But it does have the advantage that it is gauranteed to run in O(n*log(n))
CoderTao
FYI : (same-value keys retain their original order) is usually called a "Natural" sort.
dar7yl
No, it's called a stable sort.
Charles Salvia
You are probably correct in this age, but in the 70's before it was co-opted by DOS, we called it Natural Sort.BTW, back then, the sort-de-jour was poly-phase tape sort.
dar7yl
+2  A: 

In general, quicksort will be the fastest, but it really depends on the data set. For small data sets, quicksort can have more overhead than other algorithms.

Bernard Chen
Also in some cases quicksort can be quite slow (with O(n^2) complexity). It's just quick most of the time. :)
joemoe
+8  A: 

As a beginner, I'd try them all! Truly.

This will give you both practice with C/C++ but also give you a feel for the fact that

each sort algorithm has its benefits and drawbacks...


...including the Bozo sort! , although that one's advantages is mostly to be fun and of occasional use in the context of unrelated stochastic algorithms.

Without going to the extreme and reading TAOCP Vol #3 (Donald Knuth), maybe start with this Comparative list of sort algorithms.
(BTW, if you get serious about programming, reading TAOCP, should be part of your long term plan; one of these things to do!)

The list includes

Bubble sort     
Cocktail sort   
Comb sort   
Gnome sort  
Selection sort  
Insertion sort  
Shell sort  
Binary tree sort
Library sort    
Merge sort  
In-place merge sort
Heapsort    
Smoothsort  
Quicksort   
Introsort   
Patience sorting
Strand sort     
Tournament sort
mjv
that is a very good point of view and I will definitely try to tinkerthanks
nmr
But you still should read TAOCP at some point!
Roger Pate
@R. Pate: Yes, you bet! I'll edit with this great suggestion.
mjv
Cocktail sort! I must have missed that one in my Knuth book.
Martin York
@Martin Y, actually, Knuth's got it too; it is not overly famous because this is one of many "twists" on bubble sort. This said we should have it more often, and maybe after a few of these, we can be ready for some bozo sort (http://www.itl.nist.gov/div897/sqg/dads/HTML/bozoSort.html)
mjv
A: 

Actually, a modified (bi-directional) bubble can be extremely efficient, such as in the case where the data set is already sorted before adding one element at the end. I've actually used it in one application where a built-in sort wasn't available and that sort was the quickest to implement. Bubble sort will, in that case, execute as if it were an O(n) algorithm, based on n being the number of items added.

That's because the efficiency of an algorithm depends entirely on the properties of the data being sorted. The extreme case can be seen if the data is already sorted - bubble sort makes one pass of the data with zero swaps.

So asking which is the fastest is not really valid. You should ask which will have the best behavior for random unsorted data sets.

Keep in mind that sort performance doesn't really matter that much if you're only sorting 100 items. The difference between 0.1 seconds and 1 second is not that relevant (unless you're sorting multiple times per second in which case you'd be better off finding a better way to manage your data). The performance only really becomes an issues once the data sets get large.

I would say that, if you have a built-in sort in whatever environment you're working in, just use that. If you have to implement your own, figure out first the likely disposition of your data sets then choose based on that.

paxdiablo
A: 

If you 1) can't make the assumption that the data is partially sorted, 2) can't use std::sort, or 3) want to understand the algorithm as opposed to just using a boxed one, here is an excellent performing sort for indexable collections (random access iterator in STL terms).

People like to criticize the quicksort for poor performance with certain inputs, especially when the user has control of the input. The following approach yields performance matching midpoint selection but expected complexity exponentially approaches O(n log n) as the list grows in size.

  • Initialize the quicksort with a random positive integer I. That value will be used throughout the sorting process (don't have to generate multiple numbers).
  • Pivot is selected as I mod SectionSize.

For additional performance, you should always switch your quicksort to shell sort for "small" list segments - I've seen lengths from 15-100 chosen as the cutoff.

280Z28
A: 

I am not terribly familiar with sorting algorithms, but since no one else has posted this site, I figured I would.

http://www.sorting-algorithms.com/

It has an incredible amount of detail, yet is still accessible to someone that may not be too familiar with all of the concepts. Not to mention that the animations are great for seeing the algorithm in action - something that you normally do not get to see. I hope this helps!

Joey Tiell
A: 

If it's 150,000 integers, and they are in the range of say, 0 to 10, then tallying them in an array could be pretty quick.

wisty
A: 

It has limitations, but I believe radix sort has the best worst case time. Most other sorts are comparison sorts, and as such cannot do better than O(n log n).

redmike
A: 
DigitalRoss