In reference to the question "Fastest sort of fixed length 6 int array", I do not fully understand how the sorting network there beats an algorithm like insertion sort.
From that question, here is a comparison of the number of CPU cycles each routine took to complete the sort:
Linux 32-bit, gcc 4.4.1, Intel Core 2 Quad Q8300, -O2
- Insertion Sort (Daniel Stutzbach): 1425
- Sorting Networks (Daniel Stutzbach): 1080
The code used is as follows:
Insertion Sort (Daniel Stutzbach)
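```c
static inline void sort6_insertion_sort_v2(int *d){
    int i, j;
    for (i = 1; i < 6; i++) {
        int tmp = d[i];                     /* element to insert */
        /* shift larger elements of the sorted prefix d[0..i-1] right */
        for (j = i; j >= 1 && tmp < d[j-1]; j--)
            d[j] = d[j-1];
        d[j] = tmp;                         /* drop it into the gap */
    }
}
```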
Sorting Networks (Daniel Stutzbach)
```c
static inline void sort6_sorting_network_v1(int * d){
/* compare-exchange: afterwards d[x] <= d[y] */
#define SWAP(x,y) if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    /* sort d[0..2] */
    SWAP(1, 2);
    SWAP(0, 2);
    SWAP(0, 1);
    /* sort d[3..5] */
    SWAP(4, 5);
    SWAP(3, 5);
    SWAP(3, 4);
    /* merge the two sorted triples */
    SWAP(0, 3);
    SWAP(1, 4);
    SWAP(2, 5);
    SWAP(2, 4);
    SWAP(1, 3);
    SWAP(2, 3);
#undef SWAP
}
```
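Because the network's comparator sequence is fixed, each compare-exchange can also be written without a data-dependent `if`. The sketch below is one such variant, assuming a min/max formulation with ternaries; `sort6_sorting_network_minmax` is a hypothetical name, not code from the original question. Whether gcc actually turns the ternaries into conditional moves rather than jumps depends on the compiler version and flags, so the generated assembly would need checking.

```c
/* Hypothetical branch-free variant of sort6_sorting_network_v1.
 * Each compare-exchange writes min(d[x], d[y]) to d[x] and the max to d[y],
 * exactly like the original SWAP, but expressed with ternaries that a
 * compiler may (not must) lower to conditional moves. */
static void sort6_sorting_network_minmax(int *d)
{
#define SWAP(x, y) do {                          \
        int a_ = d[x], b_ = d[y];                \
        d[x] = (a_ < b_) ? a_ : b_;  /* min */   \
        d[y] = (a_ < b_) ? b_ : a_;  /* max */   \
    } while (0)
    SWAP(1, 2);
    SWAP(0, 2);
    SWAP(0, 1);
    SWAP(4, 5);
    SWAP(3, 5);
    SWAP(3, 4);
    SWAP(0, 3);
    SWAP(1, 4);
    SWAP(2, 5);
    SWAP(2, 4);
    SWAP(1, 3);
    SWAP(2, 3);
#undef SWAP
}
```

The `do { ... } while (0)` wrapper makes each `SWAP(...)` behave as a single statement, which the bare `if` in the original macro only does by accident.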
I understand that sorting networks are well suited to parallel sorting, because some of the steps are independent of the others. But here the code runs sequentially, so that parallelism is not being exploited.
I would expect the sorting network to be faster, as it has the advantage of knowing the exact number of elements beforehand. But where, and why exactly, does insertion sort make unnecessary comparisons?
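To make "independent steps" concrete, the sketch below groups the network's 12 comparators into layers, where comparators in the same layer touch disjoint array slots and so do not depend on each other. `NET6` and `network_layers` are hypothetical names, and the greedy "earliest free layer" assignment is an assumed way of measuring this, not something from the question. For this network it gives a depth of 6; for example `SWAP(1, 2)` and `SWAP(4, 5)` share the first layer. Even on a single core, an out-of-order CPU may be able to overlap such independent compare-exchanges, though whether it does depends on the compiled code.

```c
#include <stddef.h>

/* The 12 comparators of sort6_sorting_network_v1, in program order. */
static const int NET6[12][2] = {
    {1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4},
    {0, 3}, {1, 4}, {2, 5}, {2, 4}, {1, 3}, {2, 3},
};

/* Assign each comparator to the earliest layer in which both of its wires
 * (array slots) are free. Writes the 1-based layer of comparator k to
 * layer[k] and returns the number of layers (the network's depth).
 * Assumes every wire index is below 16. */
static int network_layers(const int (*net)[2], size_t n, int *layer)
{
    int ready[16] = {0};  /* last layer that used each wire */
    int depth = 0;
    for (size_t k = 0; k < n; k++) {
        int a = net[k][0], b = net[k][1];
        int l = (ready[a] > ready[b] ? ready[a] : ready[b]) + 1;
        ready[a] = ready[b] = l;
        layer[k] = l;
        if (l > depth)
            depth = l;
    }
    return depth;
}
```

Calling `network_layers(NET6, 12, layer)` returns 6 with layer sizes 2, 2, 3, 2, 2, 1, so at most three comparators are independent at any step.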
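One way to check where insertion sort spends comparisons is to count them. The sketch below is an instrumented copy of `sort6_insertion_sort_v2` with identical logic; `sort6_insertion_count` is a hypothetical name. It counts only evaluations of `tmp < d[j-1]`, not the `j >= 1` loop tests. The network, by contrast, always performs exactly 12 comparisons because its comparator sequence does not depend on the data.

```c
/* Instrumented copy of sort6_insertion_sort_v2: same shifting logic, but it
 * returns how many times the element comparison tmp < d[j-1] is evaluated. */
static int sort6_insertion_count(int *d)
{
    int i, j, comparisons = 0;
    for (i = 1; i < 6; i++) {
        int tmp = d[i];
        for (j = i; j >= 1; j--) {
            comparisons++;        /* one evaluation of tmp < d[j-1] */
            if (!(tmp < d[j-1]))
                break;            /* found the slot: stop shifting */
            d[j] = d[j-1];        /* shift the larger element right */
        }
        d[j] = tmp;
    }
    return comparisons;
}
```

On the six inputs listed under EDIT1 this returns 5, 15, 10, 9, 8 and 8 (55 in total), against 6 × 12 = 72 for the network. So on this data the network does not win by performing fewer element comparisons; the count says nothing about loop overhead or how predictable each branch is.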
EDIT1:
This is the input set the two routines are compared on:
```c
int d[6][6] = {
    {1, 2, 3, 4, 5, 6},
    {6, 5, 4, 3, 2, 1},
    {100, 2, 300, 4, 500, 6},
    {100, 2, 3, 4, 500, 6},
    {1, 200, 3, 4, 5, 600},
    {1, 1, 2, 1, 2, 1}
};
```
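A minimal harness for reproducing the comparison on this input set might look like the sketch below. It is not the original benchmark, which the question does not show: it uses the standard `clock()` function, so it reports CPU seconds rather than the cycle counts quoted above, and it calls each routine through a function pointer, which may add overhead the original setup did not have. `INPUTS`, `sorts_agree`, `time_sort6`, `sort6_fn` and `bench_sink` are all hypothetical names.

```c
#include <string.h>
#include <time.h>

/* The two routines from the question, unchanged apart from layout. */
static void sort6_insertion_sort_v2(int *d)
{
    int i, j;
    for (i = 1; i < 6; i++) {
        int tmp = d[i];
        for (j = i; j >= 1 && tmp < d[j-1]; j--)
            d[j] = d[j-1];
        d[j] = tmp;
    }
}

static void sort6_sorting_network_v1(int *d)
{
#define SWAP(x,y) if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    SWAP(1, 2); SWAP(0, 2); SWAP(0, 1);
    SWAP(4, 5); SWAP(3, 5); SWAP(3, 4);
    SWAP(0, 3); SWAP(1, 4); SWAP(2, 5);
    SWAP(2, 4); SWAP(1, 3); SWAP(2, 3);
#undef SWAP
}

/* The input set from EDIT1. */
static const int INPUTS[6][6] = {
    {1, 2, 3, 4, 5, 6},
    {6, 5, 4, 3, 2, 1},
    {100, 2, 300, 4, 500, 6},
    {100, 2, 3, 4, 500, 6},
    {1, 200, 3, 4, 5, 600},
    {1, 1, 2, 1, 2, 1},
};

typedef void (*sort6_fn)(int *);

/* Returns 1 if both routines produce identical output on every input. */
static int sorts_agree(void)
{
    for (int k = 0; k < 6; k++) {
        int a[6], b[6];
        memcpy(a, INPUTS[k], sizeof a);
        memcpy(b, INPUTS[k], sizeof b);
        sort6_insertion_sort_v2(a);
        sort6_sorting_network_v1(b);
        if (memcmp(a, b, sizeof a) != 0)
            return 0;
    }
    return 1;
}

volatile unsigned bench_sink;  /* keeps the optimizer from discarding the sorts */

/* Sorts a fresh copy of each input `reps` times and returns the CPU time
 * used, in seconds, as measured by clock(). */
static double time_sort6(sort6_fn sort, long reps)
{
    int buf[6];
    unsigned sum = 0;
    clock_t start = clock();
    for (long r = 0; r < reps; r++) {
        for (int k = 0; k < 6; k++) {
            memcpy(buf, INPUTS[k], sizeof buf);
            sort(buf);
            sum += (unsigned)buf[0];
        }
    }
    clock_t stop = clock();
    bench_sink = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A typical use is to check `sorts_agree()` first, then compare `time_sort6(sort6_insertion_sort_v2, reps)` with `time_sort6(sort6_sorting_network_v1, reps)` for a large `reps`; the results will vary with machine, compiler and flags.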