ansaurus

Question

sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

Answer 1

+5 A:

You can use a modified quick sort. Let me explain on an example when you comparison function returns [-2, -1, 0, 1, 2]. Say, you have an array A to sort.

Create 5 empty arrays - Aminus2, Aminus1, A0, Aplus1, Aplus2.

Pick an arbitrary element of A, X.

For each element of the array, compare it with X.

Depending on the result, place the element in one of the Aminus2, Aminus1, A0, Aplus1, Aplus2 arrays.

Apply the same sort recursively to Aminus2, Aminus1, Aplus1, Aplus2 (note: you don't need to sort A0, as all he elements there are equal X).

Concatenate the arrays to get the final result: A = Aminus2 + Aminus1 + A0 + Aplus1 + Aplus2.

Igor Krivokon 2009-05-27 02:18:17

So in a lovely, equal problem spread world (equal hits to -2..+2 buckets) this would be a n log^4 n solution to sorting rather than a n log^2 n solution

Tom Leys 2009-05-27 03:26:22

@Tom, thats the same complexity, the log base is just like a constant multiplier.

wowest 2009-05-27 13:17:46

Also, you mean log_4 n (log to base 4), not log^4 n (which means log-n to the fourth power).

ShreevatsaR 2009-05-27 16:22:45

Answer 2

+1 A:

It seems like using raindog's modified quicksort would let you stream out results sooner and perhaps page into them faster.

Maybe those features are already available from a carefully-controlled qsort operation? I haven't thought much about it.

This also sounds kind of like radix sort except instead of looking at each digit (or other kind of bucket rule), you're making up buckets from the rich comparisons. I have a hard time thinking of a case where rich comparisons are available but digits (or something like them) aren't.

2009-05-27 02:32:54

the particular application I have in mind is where humans are actually (subjectively) providing the pair-wise comparison

James Tauber 2009-05-27 02:40:45

An interesting application. So in theory you are trying to reduce the number of comparisons to the minimum possible.

Tom Leys 2009-05-27 03:27:58

Tom, yes, reduce the number of comparisons at the expense of being only a near sort

James Tauber 2009-05-27 04:15:41

Answer 3

+1 A:

I can't think of any situation in which this would be really useful. Even if I could, I suspect the added CPU cycles needed to sort fuzzy values would be more than those "extra comparisons" you allude to. But I'll still offer a suggestion.

Consider this possibility (all strings use the 27 characters a-z and _):

            11111111112
   12345678901234567890
1/ now_is_the_time
2/ now_is_never
3/ now_we_have_to_go
4/ aaa
5/ ___

Obviously strings 1 and 2 are more similar that 1 and 3 and much more similar than 1 and 4.

One approach is to scale the difference value for each identical character position and use the first different character to set the last position.

Putting aside signs for the moment, comparing string 1 with 2, the differ in position 8 by 'n' - 't'. That's a difference of 6. In order to turn that into a single digit 1-9, we use the formula:

digit = ceiling(9 * abs(diff) / 27)

since the maximum difference is 26. The minimum difference of 1 becomes the digit 1. The maximum difference of 26 becomes the digit 9. Our difference of 6 becomes 3.

And because the difference is in position 8, out comparison function will return 3x10^-8 (actually it will return the negative of that since string 1 comes after string 2.

Using a similar process for strings 1 and 4, the comparison function returns -5x10^-1. The highest possible return (strings 4 and 5) has a difference in position 1 of '-' - 'a' (26) which generates the digit 9 and hence gives us 9x10^-1.

Take these suggestions and use them as you see fit. I'd be interested in knowing how your fuzzy comparison code ends up working out.

paxdiablo 2009-05-27 02:43:37

Answer 4

+1 A:

Considering you are looking to order a number of items based on human comparison you might want to approach this problem like a sports tournament. You might allow each human vote to increase the score of the winner by 3 and decrease the looser by 3, +2 and -2, +1 and -1 or just 0 0 for a draw.

Then you just do a regular sort based on the scores.

Another alternative would be a single or double elimination tournament structure.

Tom Leys 2009-05-27 03:31:43

I've considered doing a near-sorting first as a way of seeding a tournament structure

James Tauber 2009-05-27 04:05:47

Answer 5

A:

You can use two comparisons, to achieve this. Multiply the more important comparison by 2, and add them together.

Here is a example of what I mean in Perl. It compares two array references by the first element, then by the second element.

use strict;
use warnings;
use 5.010;

my @array = (
  [a => 2],
  [b => 1],
  [a => 1],
  [c => 0]
);

say "$_->[0] => $_->[1]" for sort {
  ($a->[0] cmp $b->[0]) * 2 +
  ($a->[1] <=> $b->[1]);
} @array;

a => 1
a => 2
b => 1
c => 0

You could extend this to any number of comparisons very easily.

Brad Gilbert 2009-05-27 15:45:11

Answer 6

A:

Perhaps there's a good reason to do this but I don't think it beats the alternatives for any given situation and certainly isn't good for general cases. The reason? Unless you know something about the domain of the input data and about the distribution of values you can't really improve over, say, quicksort. And if you do know those things, there are often ways that would be much more effective.

Anti-example: suppose your comparison returns a value of "huge difference" for numbers differing by more than 1000, and that the input is {0, 10000, 20000, 30000, ...}

Anti-example: same as above but with input {0, 10000, 10001, 10002, 20000, 20001, ...}

But, you say, I know my inputs don't look like that! Well, in that case tell us what your inputs really look like, in detail. Then someone might be able to really help.

For instance, once I needed to sort historical data. The data was kept sorted. When new data were added it was appended, then the list was run again. I did not have the information of where the new data was appended. I designed a hybrid sort for this situation that handily beat qsort and others by picking a sort that was quick on already sorted data and tweaking it to be fast (essentially switching to qsort) when it encountered unsorted data.

The only way you're going to improve over the general purpose sorts is to know your data. And if you want answers you're going to have to communicate that here very well.

dwc 2009-05-27 16:21:05

the task is a human subjectively expressing their preference for items in a collection in a pairwise fashion in order to be able to near-sort that collection by the person's preference

James Tauber 2009-05-28 08:57:06

ansaurus

tags:

views:

answers:

sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

related questions