Hello.

I have the following problem.

I have a set of elements that I can sort by a certain algorithm A. The sorting is good, but very expensive.

There is also an algorithm B that can approximate the result of A. It is much faster, but the ordering will not be exactly the same.

Taking the output of A as a 'gold standard', I need to get a meaningful estimate of the error resulting from the use of B on the same data.

Could anyone please suggest any resource I could look at to solve my problem? Thanks in advance!

EDIT:

As requested, here is an example to illustrate the case: if the data are the first 10 letters of the alphabet,

A outputs : a,b,c,d,e,f,g,h,i,j

B outputs : a,b,d,c,e,g,h,f,j,i

What are the possible measures of the resulting error that would allow me to tune the internal parameters of algorithm B to get results closer to the output of A?

+1  A: 

You could try something involving Hamming distance.
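
For illustration, a minimal sketch of a position-wise Hamming distance between the two outputs (hypothetical helper name; assumes both outputs are permutations of the same elements):

def hamming_distance(out_A, out_B):
    # Count positions whose element differs from the gold-standard ordering
    return sum(a != b for a, b in zip(out_A, out_B))

For the example in the question, hamming_distance("abcdefghij", "abdceghfji") returns 7, since seven positions hold a different element than in A's output.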

yx
I don't think Hamming distance is a good solution for this problem. It offers an element-by-element comparison, but the distance between two elements says nothing about sorting quality.
Ronald Wildenberg
You are right; I did not say to use only Hamming distance, just something that involves it. If he wants a more expensive estimate, he should use distance calculations.
yx
+3  A: 

Are you looking for an algorithm that calculates the difference, taking the array sorted with A and the array sorted with B as inputs? Or are you looking for a generic method of determining, on average, how far off an array would be when sorted with B?

If the first, then I suggest something as simple as the distance of each item from where it should be (an average would do better than a sum, to remove the length of the array as an issue); see the sketch below.

If the second, then I think I'd need to see more about these algorithms.
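
As a sketch of the first option (hypothetical helper name; assumes both outputs contain the same elements):

def mean_displacement(out_A, out_B):
    # Position of each element in the gold-standard ordering
    pos = {x: i for i, x in enumerate(out_A)}
    # Average absolute distance of each element from its gold position
    return sum(abs(pos[x] - i) for i, x in enumerate(out_B)) / len(out_B)

For the question's example, mean_displacement("abcdefghij", "abdceghfji") gives 0.8. Summing instead of averaging yields what is known as Spearman's footrule.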

Ed Marty
This isn't good enough: if, for example, the list is z, a, b, c, d…, the whole list is shifted by 1.
Georg
+2  A: 

Calculating the RMS error is one of many possible methods. Here is a small piece of Python code.

def calc_error(out_A, out_B):
    # out_A <= output of algorithm A (the gold standard)
    # out_B <= output of algorithm B (the approximation)
    rms_error = 0
    for i in range(len(out_A)):
        # Square the difference at each position and accumulate
        rms_error += (out_A[i] - out_B[i]) ** 2
    return rms_error ** 0.5  # Take the square root

>>> calc_error([1,2,3,4,5,6], [1,2,3,4,5,6])
0.0
>>> calc_error([1,2,3,4,5,6], [1,2,4,3,5,6])  # 4,3 swapped
1.4142135623730951
>>> calc_error([1,2,3,4,5,6], [1,2,4,6,3,5])  # 3,4,5,6 randomized
3.1622776601683795

NOTE: Taking the square root is not necessary, but squaring is, since the raw differences may sum to zero. I think the calc_error function gives an approximate count of wrongly placed pairs, but I don't have any programming tools handy to verify. :(

Take a look at this question.

TheMachineCharmer
I was thinking about RMSE too. But the original question says "sorting is expensive", so I have to assume that the error metric must be calculated without ever having a canonical sorting to compare against. And without the canonical order, you can't compute RMSE.
benjismith
No, the OP has access to the gold standard for training purposes. He wants an error function so he can optimize his approximate sorter before turning it loose.
John Fouhy
+4  A: 

I would determine the largest correctly ordered subset.

                               +-------------> I
                               |   +--------->
                               |   |
A -> B -> D ----->  E  -> G -> H --|--> J
     |             ^ |             |    ^
     |             | |             |    |
     +------> C ---+ +-----------> F ---+

In your example, the largest correctly ordered subset contains 7 out of 10 elements, so the algorithm scores 0.7. The other correctly ordered subsets have length 6. A correct ordering scores 1.0, a reverse ordering 1/n.

I assume this is related to the number of inversions. Below, x + y indicates x <= y (correct order) and x - y indicates x > y (wrong order) for adjacent elements of B's output.

A + B + D - C + E + G + H - F + J - I

We obtain almost the same result: 6 of 9 adjacent pairs are correct, scoring 0.667. Again, a correct ordering scores 1.0 and a reverse ordering 0.0, and this might be much easier to calculate.
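
A minimal sketch of both scores (hypothetical helper names; assumes the outputs are permutations of the same elements):

from bisect import bisect_left

def ordered_subset_score(out_A, out_B):
    # Rank of each element in the gold-standard ordering
    rank = {x: i for i, x in enumerate(out_A)}
    seq = [rank[x] for x in out_B]
    # Patience-sorting computation of the longest increasing subsequence
    tails = []
    for r in seq:
        i = bisect_left(tails, r)
        if i == len(tails):
            tails.append(r)
        else:
            tails[i] = r
    return len(tails) / len(seq)

def adjacent_pair_score(out_A, out_B):
    rank = {x: i for i, x in enumerate(out_A)}
    seq = [rank[x] for x in out_B]
    # Fraction of adjacent pairs that are in the correct order
    return sum(a < b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)

For the question's example, ordered_subset_score("abcdefghij", "abdceghfji") gives 0.7 and adjacent_pair_score gives about 0.667, matching the counts above.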

Daniel Brückner
+2  A: 
bubaker
Yes, this looks very much like the thing I was looking for. Thanks!
+2  A: 

It's tough to give a good generic answer, because the right solution for you will depend on your application.

One of my favorite options is just the number of in-order adjacent element pairs, divided by the total number of adjacent pairs. This is a nice, simple, easy-to-compute metric that just tells you how many mistakes there are. But it doesn't make any attempt to quantify the magnitude of those mistakes.

double sortQuality = 1;
if (array.length > 1) {
   int inOrderPairCount = 0;
   for (int i = 1; i < array.length; i++) {
      // Count adjacent pairs that are already in order
      if (array[i] >= array[i - 1]) ++inOrderPairCount;
   }
   // Fraction of adjacent pairs in order; 1.0 means fully sorted
   sortQuality = (double) inOrderPairCount / (array.length - 1);
}
benjismith