If I need to sort 150,000 entries what would be the fastest way for a beginner in C++ to sort that data? All I have learned so far is bubble sort and I know that isn't that efficient.
thanks nmr
The quicksort algorithm is much more efficient than bubble sort.
    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        using namespace std;

        vector<int> data;
        data.push_back(5);
        data.push_back(3);
        data.push_back(42);

        sort(data.begin(), data.end());

        for (vector<int>::size_type i = 0; i < data.size(); ++i) {
            cout << data[i] << ' ';
        }
        cout << '\n';

        // iterators are more common, and you'll love them! (eventually)
        for (vector<int>::const_iterator i = data.begin(); i != data.end(); ++i) {
            cout << *i << ' ';
        }
        cout << '\n';

        return 0;
    }
use std::sort()
You can refer to Comparison of algorithms to learn about the different sorting algorithms and their complexity.
I would tend to use quicksort. For my production code, I use std::sort (at least in the VC6 STL, std::sort uses quicksort internally).
The fastest, both in terms of performance and programming time, would be to use the library's sort function.
If your goal is to implement the sort yourself, you could go with a Shell sort, which is said to be quick and pretty efficient. If the keys all have similar lengths, maybe a radix sort.
More info on both can be found by Googling; probably the most useful references are on Wikipedia.
There is no such thing as a "fastest" sorting algorithm. Like most things in computer-science, when you pick an algorithm you accept certain trade-offs.
That being said, for general purposes, quicksort is likely to be the fastest sorting algorithm for in-memory sorts in most cases, although in certain cases it is slower than other sorts, especially if your data-set is mostly sorted already.
Again, there are always trade-offs depending on your requirements and data set. For example, merge-sort is usually a bit slower than quicksort, but unlike quicksort it is stable (i.e. same-value keys retain their original order.) Insertion sort is likely to be faster than quicksort for very small data sets. Radix sort may also be faster than quicksort if you are sorting integers. And for extremely large data sets where the data is so large that it doesn't fit in memory, external merge sort would be faster than quicksort.
Again, it's all about trade-offs. But the best answer for general purposes is probably quicksort. For that reason, most implementations of the C++ standard library's std::sort function use quicksort (or some quicksort hybrid, like introsort) internally.
In general, quicksort will be the fastest, but it really depends on the data set. For small data sets, quicksort can have more overhead than other algorithms.
As a beginner, I'd try them all! Truly.
This will give you practice with C/C++, but it will also give you a feel for the fact that no single algorithm is best for every data set.
...including the Bozo sort! Although that one's advantage is mostly that it's fun, and of occasional use in the context of unrelated stochastic algorithms.
Without going to the extreme of reading TAOCP Vol. 3 (Donald Knuth), maybe start with this Comparative list of sort algorithms.
(BTW, if you get serious about programming, reading TAOCP should be part of your long-term plan; one of those things to do!)
The list includes
Bubble sort, Cocktail sort, Comb sort, Gnome sort, Selection sort, Insertion sort, Shell sort, Binary tree sort, Library sort, Merge sort, In-place merge sort, Heapsort, Smoothsort, Quicksort, Introsort, Patience sorting, Strand sort, Tournament sort.
Actually, a modified (bi-directional) bubble can be extremely efficient, such as in the case where the data set is already sorted before adding one element at the end. I've actually used it in one application where a built-in sort wasn't available and that sort was the quickest to implement. Bubble sort will, in that case, execute as if it were an O(n) algorithm, based on n being the number of items added.
That's because the efficiency of an algorithm depends entirely on the properties of the data being sorted. The extreme case can be seen if the data is already sorted - bubble sort makes one pass of the data with zero swaps.
So asking which is the fastest is not really valid. You should ask which will have the best behavior for random unsorted data sets.
Keep in mind that sort performance doesn't really matter that much if you're only sorting 100 items. The difference between 0.1 seconds and 1 second is not that relevant (unless you're sorting multiple times per second, in which case you'd be better off finding a better way to manage your data). The performance only really becomes an issue once the data sets get large.
I would say that, if you have a built-in sort in whatever environment you're working in, just use that. If you have to implement your own, figure out first the likely disposition of your data sets then choose based on that.
If you 1) can't assume that the data is partially sorted, 2) can't use std::sort, or 3) want to understand the algorithm as opposed to just using a boxed one, here is an excellent-performing sort for indexable collections (random access iterators, in STL terms).
People like to criticize quicksort for poor performance on certain inputs, especially when the user controls the input. The following approach yields performance matching midpoint selection, and the probability of degenerate behavior shrinks as the list grows, so the expected complexity approaches O(n log n).
Generate a single random number I up front. That value will be used throughout the sorting process (you don't have to generate multiple random numbers): for each segment, pick the pivot at index I mod SectionSize.
For additional performance, you should always switch your quicksort to shell sort for "small" list segments - I've seen lengths from 15-100 chosen as the cutoff.
I am not terribly familiar with sorting algorithms, but since no one else has posted this site, I figured I would.
http://www.sorting-algorithms.com/
It has an incredible amount of detail, yet is still accessible to someone who may not be too familiar with all of the concepts. Not to mention that the animations are great for seeing the algorithms in action - something you normally do not get to see. I hope this helps!
If it's 150,000 integers, and they are in the range of say, 0 to 10, then tallying them in an array could be pretty quick.
It has limitations, but I believe radix sort has the best worst case time. Most other sorts are comparison sorts, and as such cannot do better than O(n log n).