
Update: Please file this under bad ideas. You don't get anything for free in life, and here is proof. A simple idea gone bad. It is definitely something to learn from, however.

Lazy programming challenge: if I pass a function that returns true or false 50-50 as qsort's comparison function, I think I can effectively unsort an array of structures by writing 3 lines of code.

#include <stdlib.h>   /* srand, rand, qsort, RAND_MAX */
#include <time.h>     /* time */

int main( int argc, char **argv )
{
    srand( time(NULL) );  /* 1 */
    ...
    /* qsort(....) */     /* 2 */
}

...

#define GREATER_THAN  1
#define LESS_THAN    -1

/* qsort comparators take const void * arguments; both are ignored
   here, since the result is random rather than based on the values. */
int comp_nums(const void *num1, const void *num2)
{
    float frand =
          (float) rand() / ((float) (RAND_MAX + 1.0));  /* 3 */

    if (frand >= 0.5f)
         return GREATER_THAN;
    return LESS_THAN;
}

Any pitfalls I need to look out for? Is it possible in fewer lines through swapping, or is this the cleanest I can get in 3 non-trivial lines?

A: 

rand() isn't the most random thing out there... if you want to shuffle cards or something, this isn't the best approach. Also, a Knuth shuffle would be quicker, but your solution is OK if it doesn't loop forever.

Robert Gould
+2  A: 

No, this won't properly shuffle the array; it will barely move elements away from their original locations, with an exponential fall-off in displacement.

Juliano
please elaborate...
ojblass
The proof is too long to post here. By "shuffle" we understand that every possible permutation occurs with equal probability. Since your compare function returns "greater" with 50% probability, the number of comparisons needed to get that result is geometrically distributed (the discrete analogue of the exponential) with mean 1/p = 2.0. So, since on average you get such a result within two comparisons, pivoting stops quickly, near the place where the value originally was.
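As a quick check on that figure: if each comparison independently comes up "greater" with probability p = 0.5, the number of comparisons N until the first such result satisfies

P(N = k) = (1 - p)^{k-1} p,    E[N] = 1/p = 1/0.5 = 2

so on average the comparison outcome flips within two tries, and a pivoted element settles close to where it started.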
Juliano
+10  A: 

Bad idea. I mean really bad.

Your solution gives an unpredictable result, not a random result, and there is a big difference. You have no real idea what a qsort with a random comparison will do, or whether all combinations are equally likely. This is the most important criterion for a shuffle: all permutations must be equally likely. Biased results mean big trouble. There's no way to prove that property holds in your example.

You should implement the Fisher-Yates shuffle (otherwise known as the Knuth shuffle).
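For reference, a minimal Fisher-Yates sketch in C, assuming an int array and the question's rand()/srand() setup (the name fisher_yates is illustrative, and rand() % (i + 1) carries a slight modulo bias that a production version would avoid):

#include <stdlib.h>

/* Fisher-Yates: walk the array backwards, swapping each slot with a
 * uniformly chosen slot at or below it, so (modulo the bias noted
 * above) every permutation is equally likely. */
void fisher_yates(int *a, size_t n)
{
    if (n < 2)
        return;                              /* nothing to shuffle */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t) rand() % (i + 1);    /* 0 <= j <= i */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

Call srand() once up front, as in the question's main(), not once per shuffle.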

cletus
qsort is really taking much longer than I expected... I think it expects the elements to be consistently ordered by the comparison function...
ojblass
bah... who downvoted this...
ojblass
+1  A: 

The comparison function isn't supposed to return a boolean type; it's supposed to return a negative number, a positive number, or zero, which qsort() uses to determine which argument is greater than the other.
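For instance, a conventional ascending comparator for qsort() looks something like this sketch (comp_ints is an illustrative name):

/* Conventional qsort() comparator: negative if *a < *b, zero if
 * equal, positive if *a > *b. */
int comp_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* usage: qsort(arr, n, sizeof arr[0], comp_ints); */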

yjerem
Thanks, changed up the question a bit...
ojblass
Ah, but a [[ToNumber]] conversion is performed on the result, turning the boolean into either 0 or 1. It's just less efficient :D
olliej
Wow, I suck. Despite having read everything, I am so used to encountering this in JS that I just assumed the code must be in JS. (JavaScript engines actually need to perform reasonably in this scenario. It's stupid.)
olliej
Actually, in C there is no boolean type; we just define TRUE as the number 1 and FALSE as 0. But a comparison function sometimes has to return a negative value, so you can't use TRUE and FALSE.
yjerem
+5  A: 

In addition to the problems in the other answers, this is worse than a simple Fisher-Yates shuffle because it is too slow: the qsort algorithm is O(n log n), while Fisher-Yates is O(n).

Some more detail is available on Wikipedia about why this kind of "shuffle" does not generally work as well as the Fisher-Yates method:

Comparison with other shuffling algorithms

The Fisher-Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal. Combined with a high-quality unbiased random number source, it is also guaranteed to produce unbiased results. Compared to some other solutions, it also has the advantage that, if only part of the resulting permutation is needed, it can be stopped halfway through, or even stopped and restarted repeatedly, generating the permutation incrementally as needed.

In high-level programming languages with a fast built-in sorting algorithm, an alternative method, where each element of the set to be shuffled is assigned a random number and the set is then sorted according to these numbers, may be faster in practice[citation needed], despite having worse asymptotic time complexity (O(n log n) vs. O(n)). Like the Fisher-Yates shuffle, this method will also produce unbiased results if correctly implemented, and may be more tolerant of certain kinds of bias in the random numbers. However, care must be taken to ensure that the assigned random numbers are never duplicated, since sorting algorithms in general won't order elements randomly in case of a tie.

A variant of the above method that has seen some use in languages that support sorting with user-specified comparison functions is to shuffle a list by sorting it with a comparison function that returns random values. However, this does not always work: with a number of commonly used sorting algorithms, the results end up biased due to internal asymmetries in the sorting implementation.[7]
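For concreteness, here is a sketch in C of that sort-by-random-key alternative (shuffle_by_sort, struct keyed, and by_key are illustrative names; the duplicate-key caveat from the quote is noted in a comment):

#include <stdlib.h>

/* Tag each value with a random key, sort by key, copy the values
 * back out. Duplicate keys are unlikely but possible, and ties fall
 * back to the sort's arbitrary ordering, which is the caveat above. */
struct keyed { int key; int value; };

static int by_key(const void *a, const void *b)
{
    const struct keyed *ka = a, *kb = b;
    return (ka->key > kb->key) - (ka->key < kb->key);
}

int shuffle_by_sort(int *vals, size_t n)
{
    struct keyed *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;                      /* allocation failed */
    for (size_t i = 0; i < n; i++) {
        tmp[i].key = rand();
        tmp[i].value = vals[i];
    }
    qsort(tmp, n, sizeof *tmp, by_key);
    for (size_t i = 0; i < n; i++)
        vals[i] = tmp[i].value;
    free(tmp);
    return 0;
}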

This links to here:

Just one more thing: while writing this article I experimented with various versions of the methods and discovered one more flaw in the original version (renamed by me to shuffle_sort). I was wrong when I said "it returns a nicely shuffled array every time it is called."

The results are not nicely shuffled at all. They are biased. Badly. That means that some permutations (i.e. orderings) of elements are more likely than others. Here’s another snippet of code to prove it, again borrowed from the newsgroup discussion:

N = 100000
A = %w(a b c)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
  sorted = A.shuffle  
  Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
  puts "#{key}: #{Score[key]}"
end

This code shuffles an array of three elements (a, b, c) 100,000 times and records how many times each possible result is achieved. In this case there are only six possible orderings, and we should get each one about 16,666.66 times. If we try an unbiased version of shuffle (shuffle or shuffle_sort_by), the results are as expected:

 abc: 16517
 acb: 16893
 bac: 16584
 bca: 16568
 cab: 16476
 cba: 16962

Of course there are some deviations, but they shouldn't exceed a few percent of the expected value, and they should be different each time we run this code. We can say that the distribution is even.

OK, what happens if we use the shuffle_sort method?

 abc: 44278 
 acb: 7462
 bac: 7538
 bca: 3710
 cab: 3698
 cba: 33314

This is not an even distribution at all, is it?

The post shows how the sort method is biased and goes into detail about why this is so. Finally, he links to Coding Horror:

Let's take a look at the correct Knuth-Fisher-Yates shuffle algorithm.

for (int i = cards.Length - 1; i > 0; i--)
{
  int n = rand.Next(i + 1);
  Swap(ref cards[i], ref cards[n]);
}

Do you see the difference? I missed it the first time. Compare the swaps for a 3 card deck:

Naïve shuffle    Knuth-Fisher-Yates shuffle
rand.Next(3);    rand.Next(3);
rand.Next(3);    rand.Next(2);
rand.Next(3);

The naive shuffle results in 3^3 (27) possible deck combinations. That's odd, because the mathematics tells us that there are really only 3!, or 6, possible orderings of a 3 card deck. In the KFY shuffle, we start with an initial order, swap from the third position with any of the three cards, then swap again from the second position with the remaining two cards.
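To spell out the counting step this argument leans on:

3^3 = 27 equally likely swap sequences,   3! = 6 permutations,   27 / 6 = 4.5

Since 27 is not divisible by 6, the 27 sequences cannot map evenly onto the 6 permutations, so some orderings necessarily occur more often than others. The KFY shuffle instead produces 3 * 2 = 6 equally likely sequences, exactly one per permutation.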

1800 INFORMATION
Not looking for speed, just the fewest lines of code...
ojblass
+1  A: 

The Old New Thing takes on this one.

I think the basic idea of randomly partitioning the set recursively on the way down and concatenating the results on the way up will work (it averages O(n log n) binary decisions, and that is darn close to log2(n!)), but qsort is not guaranteed to behave that way with a random predicate.
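A minimal sketch of that idea in C (rand_partition_shuffle is an illustrative name; the caller supplies an n-sized scratch buffer and seeds rand()). Note that when every coin lands the same way, the recursion retries the full range, so like the comparator trick it only terminates with probability 1:

#include <stdlib.h>

/* Randomly partition on the way down, concatenate on the way up:
 * flip a coin per element to send it left or right, then recursively
 * shuffle each side. By induction on the subproblems, every
 * permutation comes out equally likely. */
static void rand_partition_shuffle(int *a, size_t n, int *scratch)
{
    if (n < 2)
        return;
    size_t lo = 0, hi = n;
    for (size_t i = 0; i < n; i++) {
        if (rand() & 1)
            scratch[lo++] = a[i];       /* left part  */
        else
            scratch[--hi] = a[i];       /* right part */
    }
    for (size_t i = 0; i < n; i++)
        a[i] = scratch[i];
    rand_partition_shuffle(a, lo, scratch);              /* left  */
    rand_partition_shuffle(a + lo, n - lo, scratch);     /* right */
}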

BTW, I think the same argument and issues apply to any O(n log n) sort strategy.

BCS