My current method allows me to determine the most accurate array but I cannot figure out a good way to display informative results.

Here’s my situation …

I compare some number X of integer arrays to a static integer array. For each position in an array I calculate an accuracy result by comparing its value to the value at the same position in the static array. Once the accuracy result for the array's last position has been determined, I store the sum of all accuracy results for that array so it can be compared later.

Once each array’s sum of all accuracy results has been saved they are compared to one another. The array with the lowest sum is deemed the most accurate.

Pseudo code …

foreach (ComparableArray as SingleArray) {
    for (i = 0; i < count(SingleArray); i++) {
        AccuracyResults[SingleArray] += |StaticArray[i] - SingleArray[i]| / CONSTANT;
    }   
}   
BestArray = AscendingSort(AccuracyResults)[0];

Accuracy is determined by taking the absolute value of the difference between the SingleArray value and the StaticArray value and dividing by some constant. If the accuracy result is <= 1, the value is deemed accurate. If the result is > 1, it is inaccurate, and a result of 0 is perfect.

Here's a scenario ... let's use two arrays for simplicity

S = [ 56, 53, 50, 64 ]

A = [ 56, 54, 52, 64 ]

B = [ 54, 52, 51, 63 ]

Looping through each array starting with A.

Compare position [1] of A(56) and S(56) for accuracy. Determine accuracy (I'll use two for my constant) |56-56|=0, 0 / 2 = 0; Perfect accuracy

Continue to compare each position and compute accuracy: |53-54| = 1, 1 / 2 = 0.5; Accurate because <= 1

|50-52|=2, 2 / 2 = 1; Accurate

|64-64| = 0; Perfect

Now compute the sum of all accuracy results for array A: 0 + 0.5 + 1 + 0 = 1.5

If we do the same operations for array B the final result will be 1 + 0.5 + 0.5 + 0.5 = 2.5

Now if we compare array A to B we can see that array A is more accurate than B because the sum is lower.
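For reference, the walkthrough above can be written as a minimal Python sketch. The function name `accuracy_sum` and the dictionary layout are just illustrative choices, and the constant is assumed to be 2 as in the example.

```python
def accuracy_sum(static, candidate, constant=2):
    """Sum of per-position |static - candidate| / constant, as in the pseudo code."""
    return sum(abs(s - c) / constant for s, c in zip(static, candidate))

S = [56, 53, 50, 64]
candidates = {"A": [56, 54, 52, 64], "B": [54, 52, 51, 63]}

# One accuracy sum per candidate array; the lowest sum is the most accurate.
scores = {name: accuracy_sum(S, values) for name, values in candidates.items()}
best = min(scores, key=scores.get)

print(scores)  # {'A': 1.5, 'B': 2.5}
print(best)    # A
```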

The problem is that 1.5 and 2.5 are not very meaningful when trying to display how much more accurate A is than B.

What would be the best method to display these results? I thought about displaying percentages … such as A is 17% better than B. Or the BestArray is 6% better than average.

How would I compute those results?

Do you see any logic problems in my way of computing accuracy or know of a better way?

Thanks for any insight you can provide!

+1  A: 

Relative percentages are a bad idea, because people are very bad at judging what they mean in practice. For more explanation, see the book Bad Science.

Just display the sums in order from most accurate to least and explain the rating system. I don't think turning them into any sort of percentage is helpful, but it would be a good idea to give some guide figures or banding (say by colouring the text or background) of what good, middling and poor accuracy would be.
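As a rough sketch of the banding idea in Python: the function `band` and its thresholds (0.5 and 1.0 on the per-position average) are assumptions made up for illustration, not values from the question, so pick cut-offs that suit your data.

```python
def band(score_sum, n_positions, good=0.5, middling=1.0):
    """Label a sum by its per-position average; thresholds are illustrative assumptions."""
    mean = score_sum / n_positions
    if mean <= good:
        return "good"
    if mean <= middling:
        return "middling"
    return "poor"

sums = {"A": 1.5, "B": 2.5}  # accuracy sums from the question's example

# List the arrays from most accurate (lowest sum) to least, with their band.
for name, total in sorted(sums.items(), key=lambda item: item[1]):
    print(f"{name}: {total} ({band(total, 4)})")
```

Dividing by the number of positions keeps the bands comparable when arrays of different lengths are rated.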

Finally, your question is very specific to your programming problem and is unlikely to be of use to many other people the way it is phrased. Here we prefer questions to be specific in technical topic but generally applicable to other problems, so if you phrase your problems more generally next time, it makes for a better resource.

Martin
@Martin Thanks for the insight. There is actually a color coding system implemented for the results, based on their accuracy. The problem with displaying the sums is that they can become large and wide in range, which is probably equivalent to displaying percentages. I would like to think there is a better way. Thanks for the info on the book; I'll browse the site too.
Cody N
+1  A: 

Hi

I tend to agree with @Martin that using numerical values to quantify the difference between qualitative measurements is a bit dodgy. However, people do it all the time, so if you want to carry on doing it go right ahead !

Now, what I really wanted to write is that your pseudo-code is not terribly pseudo- at all. Here's the pseudo-code that I would write:

ManhattanDistance[{56, 53, 50, 64},{56, 54, 52, 64}]

which specifies the same calculation as your version, apart from the division by your constant. Now, you may or may not recognise this to be a valid Mathematica statement, but that's beside the point. The point is that you have hit upon one of a myriad of functions for measuring the distance between two vectors. Other distance measures include the Euclidean distance and the Chessboard distance.

You could also use any one of a number of vector norms for measuring the distance between your vectors. For example, Mathematica gives the result sqrt(5) for the calculation:

Norm[S - A]

So, if you do want to indulge in some dodgy pseudo-statistics, Google around for some definitions of vector distances and norms. I guess you'll find code or at least imperative algorithms too.
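If you don't have Mathematica to hand, here is a minimal Python sketch of the three distances named above. The function names are my own, not from any library.

```python
import math

def manhattan(u, v):
    """Sum of absolute differences (what the question computes, before dividing)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Square root of the sum of squared differences (the 2-norm of u - v)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def chessboard(u, v):
    """Largest single absolute difference."""
    return max(abs(a - b) for a, b in zip(u, v))

S = [56, 53, 50, 64]
A = [56, 54, 52, 64]
print(manhattan(S, A))   # 3
print(euclidean(S, A))   # sqrt(5), roughly 2.236
print(chessboard(S, A))  # 2
```

Dividing the Manhattan result by the constant 2 gives the 1.5 from the question.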

Regards

Mark

PS Don't tell anyone I helped you with pseudo-science :-)

High Performance Mark
Wow, this is awesome input. This will lead to great learning experiences. I've decided to take y'all's advice and look for a more elegant solution, in this case via Google and research papers. All elements will deal with weather-related variables. The case above was temperatures.
Cody N
+1  A: 

Your "position accuracy" is just an error which, if normally distributed (as one would expect), can be modeled with a Gaussian distribution. If so, since sums of independent Gaussian random variables are themselves Gaussian, your "sum of all accuracy" number is also a Gaussian-distributed random variable. You can compute the mean and variance of these error sums to get a Gaussian PDF (probability density function) modeling your system, and use it to answer questions like "that last clunky vector should be bright red because it had an error sum larger than 95% of all such vectors". Or "wow, that last vector was A+ because its error sum was lower than 99% of all such vectors".
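A minimal Python sketch of that idea, using `statistics.NormalDist` from the standard library (Python 3.8+). The list `past_sums` is made-up illustration data, not real results, and the normal fit is only an approximation, since sums of absolute errors cannot be negative.

```python
from statistics import NormalDist

# Hypothetical error sums from earlier comparisons (made-up illustration data).
past_sums = [1.5, 2.5, 2.0, 3.0, 1.0, 2.5, 3.5, 2.0]

# Fit a normal distribution: estimates the mean and standard deviation.
model = NormalDist.from_samples(past_sums)

def percentile(error_sum):
    """Fraction of modelled error sums expected to fall below this one."""
    return model.cdf(error_sum)

print(f"{percentile(4.0):.0%} of sums are expected to be lower than 4.0")
```

A vector whose percentile is above 0.95 would be the "bright red" case, and one below 0.01 the "A+" case.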

This wiki post may help too.

Paul

Paul
A: 

Mean Squared Error is often used in engineering circles to quantify error between a solution and an estimate of the solution.

To avoid problems with a large variance in the values, consider using log(error). Of course, this has its own issues: log(0) is -infinity, and if 0 < error < 1 the log gives negative numbers.
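A small Python sketch of both ideas, applied to the arrays from the question. The `epsilon` floor in `log_error` is an assumption of mine to sidestep log(0), not a standard value.

```python
import math

def mean_squared_error(reference, estimate):
    """Average of the squared per-position differences."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

def log_error(error, epsilon=1e-9):
    """Natural log of the error; epsilon is an assumed floor to avoid log(0)."""
    return math.log(max(error, epsilon))

S = [56, 53, 50, 64]
A = [56, 54, 52, 64]
B = [54, 52, 51, 63]
print(mean_squared_error(S, A))  # 1.25
print(mean_squared_error(S, B))  # 1.75
```

Squaring penalises one large miss more heavily than several small ones, so the ranking can differ from the sum of absolute differences on other data.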

petantik