answers:

6

I have to sort a number of integers with values between 30,000,000 and 350,000,000. There will be between 0 and 65,535 integers, with the average count being around 20,000. RAM usage is irrelevant; only speed matters.

Later on I will also have to split them into groups, with a divide placed wherever the gap between two consecutive values is >65,535, which is what I need the algorithm for.

If it makes any difference, the algorithm will be used in a Perl script.

Edit: After thinking it over and reading the answers, I've come to realize something: I don't actually care about the data itself. Since I really only want to find the start and end values of groups with small gaps, the sorting only needs to create buckets and can discard the data.

Edit2: After some testing and trying out the answers provided, the fastest way I found was this:

use feature 'say';          # for say()

my $gap = 65_535;           # split wherever the gap exceeds this

my @sort = sort { $a <=> $b } @item_offsets;
my @buckets;
my $start = shift @sort;
push @buckets, [ $start, $start ];
for my $item (@sort) {
    if ( $item < $buckets[-1][1] + $gap ) {
        $buckets[-1][1] = $item;            # extend the current bucket
    }
    else {
        push @buckets, [ $item, $item ];    # start a new bucket
    }
}
say $#buckets;              # last bucket index, i.e. bucket count - 1
+12  A: 

I'd use a radix sort, since you need to group the output.
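A minimal LSD (least-significant-byte) radix sort sketch in Perl, assuming plain 32-bit non-negative integers (the function name is mine, not from any answer here):

```perl
use strict;
use warnings;

# Sketch: LSD radix sort, one byte per pass, four passes for 32-bit values.
sub radix_sort {
    my @nums = @_;
    for my $shift (0, 8, 16, 24) {
        my @buckets = map { [] } 0 .. 255;
        push @{ $buckets[ ($_ >> $shift) & 0xFF ] }, $_ for @nums;
        @nums = map { @$_ } @buckets;    # concatenate buckets in order
    }
    return @nums;
}
```

Four passes over at most 65,535 integers, regardless of how wide the value range is.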

FlySwat
A radix sort module can be found on CPAN @ http://search.cpan.org/dist/Sort-Radix/
draegtun
+5  A: 

I was just going to say radix sort (http://en.wikipedia.org/wiki/Radix_sort), but that could be a bit above what you were looking to implement. Introsort (http://en.wikipedia.org/wiki/Introsort) is generally the accepted general-purpose sorting solution; it's a variation of quicksort that switches to heapsort when it reaches smaller sets, as heapsort is faster than quicksort on smaller sets.

Chris Marisic
A: 

If you use the number as an index to an array, and then increment the count of that position, you've "grouped" them, and done it in one pass.

in pseudocode:

while (more numbers)
  sorted[unsorted[number]]++
  number++

If the range is known ahead of time, you can offset the values to shrink the index range (for example, subtract 30,000,000 from each value to bring it down to a zero-based range).
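The idea above, sketched in Perl with the question's minimum of 30,000,000 as the offset (the input values are made up):

```perl
use strict;
use warnings;

# Sketch: use (value - minimum) as an array index and count occurrences.
# Note the counting array spans the whole value range, which is where the
# memory objection in the comments comes from.
my $min   = 30_000_000;
my @input = (30_000_005, 30_000_001, 30_000_005);

my @count;
$count[ $_ - $min ]++ for @input;

# Walking the counts in index order yields the values in sorted order.
my @sorted;
for my $i (0 .. $#count) {
    push @sorted, ($i + $min) x ($count[$i] // 0);
}
```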

warren
Bad idea, since the range is much greater than the number of integers (320 million possible values vs 65 thousand integers), so this "one pass" will be very slow.
Rafał Dowgird
You can't get better than one pass, as you must hit each element in the unsorted list at least once in any sorting algorithm that exists. Perl code would look more like: my @sorted_values; foreach my $element (@unsorted_values) { $sorted_values[$element]++; }
Max Lybbert
Aargh! I put in line breaks to avoid getting Perl code looking that bad as a one-liner!
Max Lybbert
I tried doing something like this at first, but I can't really afford 300 MB of RAM use; that would be unacceptable. However, I realized that I can make a number of buckets like that and thus keep the RAM use sensible.
Mithaldu
"You can't get better than one pass" - one pass over a 50 million element array is much worse than 4 passes over 65536 integers. And you have to pass over the counting array at least once.
Rafał Dowgird
@Rafal - regardless of how many items you have, you must make at least one pass. You never "pass" over the counting array. I would love to see a way to sort a group without hitting every item at least once - if you have one, please share it. Until then, you have to hit each item at least once.
warren
@warren: I agree that you have to hit every item in the input at least once. But how can you do a countsort without passing over the counting array? How do you initialize the array? How do you output the sorted integers (or detect the gaps, in this particular case)?
Rafał Dowgird
> How do you initialize the array?

The array is initialized automatically by Perl. The original question asked how to sort the array, not how to output it. The original question also said "RAM usage is irrelevant and speed only is important," but apparently there was a limit to RAM as well.
Max Lybbert
> The original question asked how to sort the array, not how to output it.

Sort it for the particular purpose of finding gaps. You have to iterate over the whole range to find gaps. Also, zeroing 200 MB isn't free, even if done automatically by Perl.
Rafał Dowgird
@Rafal - this is still an O(n) sort - since the list of items will be longer than the counted list; technically it's O(n+m), but that still reduces to O(n)
warren
The only solution would be to test this.
Max Lybbert
+16  A: 

It's unlikely that you'll be able to write a sort algorithm in Perl that will perform better than Perl's builtin sort function:

@numbers = sort {$a <=> $b} @numbers;

You can experiment with the sort pragma to see if a particular algorithm is better:

use sort '_quicksort';
use sort '_mergesort';

Since your cutpoints will vary depending on the data distribution, I think you need to sort the entire list first then loop over it to do the cutting.

my $prev  = shift @numbers;  # already sorted
my @group = [$prev];
my $i     = 0;

foreach my $n (@numbers) {
    $i++ if ($n - $prev > 65535);
    push @{$group[$i]}, $n;
    $prev = $n;
}
Michael Carman
Thanks for reminding me of the Perl sort function. I'd completely forgotten about it.
Mithaldu
+1  A: 

I would try this:

my @sorted = map { unpack "N", $_ } sort map { pack "N", $_ } @unsorted;
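Spelled out as a complete, runnable sketch (the sample values are mine): this is the Guttman-Rosler Transform - pack each number into a fixed-width big-endian string, let sort use its fast default string comparison, then unpack:

```perl
use strict;
use warnings;

my @unsorted = (350_000_000, 30_000_000, 65_535_000);

# pack "N" produces a 4-byte big-endian string, so lexicographic string
# order equals numeric order for non-negative 32-bit integers.
my @sorted = map  { unpack "N", $_ }
             sort
             map  { pack "N", $_ } @unsorted;
```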
Leon Timmermans
I'm afraid map is a bit of black magic to me. What does that piece of code do? oO
Mithaldu
I assume the map{} is to eliminate the need for a sortsub to get a numeric comparison. The {$a <=> $b} case has been optimized since 5.6.1 so the trickery shouldn't be necessary.
Michael Carman
You have to read this from right-to-left. The map { pack "N" } @unsorted applies pack "N" to each element--turning each element into a big-endian binary number--the output gets passed to sort which has O(n log n) behavior, and each element of the sorted result gets unpacked and assigned to @sorted.
Max Lybbert
@michael: didn't know that. Interesting.
Leon Timmermans
+13  A: 

I'd just make an array of buckets before running the algorithm, one for each group of 65536 consecutive values. The buckets will contain a min and max value of their contents, but won't store the contents themselves. After running the algorithm, do a single pass over the buckets. If there are two consecutive non-empty buckets with min(bucket2)-max(bucket1) < 65536, combine them. Combining will not happen until the algorithm finishes running. Discard any empty buckets. This algorithm is linear time.

Take note of Bucket Sort.
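A sketch of that scheme in Perl (the names and the exact merge condition are my interpretation of the description above):

```perl
use strict;
use warnings;

# Each bucket covers 65,536 consecutive values and stores only the min
# and max seen; bucket index = int(value / 65536).
sub gap_groups {
    my ($gap, @values) = @_;
    my %bucket;                                  # index => [min, max]
    for my $v (@values) {
        my $b = $bucket{ int($v / 65536) } //= [$v, $v];
        $b->[0] = $v if $v < $b->[0];
        $b->[1] = $v if $v > $b->[1];
    }

    # Single pass over the non-empty buckets in order, merging neighbours
    # whose gap is small enough; empty buckets are simply never created.
    my @groups;
    for my $i (sort { $a <=> $b } keys %bucket) {
        my ($min, $max) = @{ $bucket{$i} };
        if (@groups && $min - $groups[-1][1] < $gap) {
            $groups[-1][1] = $max;               # merge into previous group
        }
        else {
            push @groups, [$min, $max];          # start a new group
        }
    }
    return @groups;
}
```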

Brian
You managed to summarize the problems really well. I've actually, while reading the replies here, pondered doing something like that, but wasn't too sure on it yet. Thanks. :)
Mithaldu
I just edited my answer and scrapped a bit of unrelated text, based on your edits. The resulting answer should be much faster, though both were linear-time algorithms.
Brian