I'm trying to measure the performance of a computer vision program that tries to detect objects in video. I have 3 different versions of the program, each with different parameters. I've benchmarked each of these versions and got 3 pairs of (false positive percentage, false negative percentage).

Now I want to compare the versions with each other, and I wonder if it makes sense to combine false positives and false negatives into a single value and use that for the comparison. For example, take the ratio falsePositives/falseNegatives and see which is smaller.

+1  A: 

It depends on how much detail you want in the comparison.

Combining the two figures will give you an overall sense of error margin but no insight into what sort of error, so if you just want to know which version is "more correct" in an overall sense, then it's fine.

If, on the other hand, you actually want to use the results for a more in-depth determination of whether the process is suited to a particular problem, then I would imagine keeping them separate is a good idea. Sometimes false negatives are a very different problem from false positives in a real-world setting. Did the robot just avoid an object that wasn't there... or fail to notice it was heading off the side of a cliff?

In short, there's no hard and fast global rule for judging how effective a vision system is from one super calculation. What you're planning to do with the information is the important bit.

lzcd
+1  A: 

You need to factor in how "important" false positives are relative to false negatives.

For example, if your program is designed to recognise people's faces, then both false positives and false negatives are equally harmless and you can probably just combine them linearly.

But if your program was designed to detect bombs, then false positives aren't a huge deal (i.e. saying "this is a bomb" when it's actually not) but false negatives (that is, saying "this isn't a bomb" when it actually is) would be catastrophic.

Dean Harding
OK, so it does make sense. Is there any defined parameter to combine these two values?
dnul
+1  A: 

Well, one conventional way is to assign a weight to each of the two event types (e.g., some integer to indicate the relative significance of each to model validation). Then,

  • multiply each instance by the appropriate weighting factor;

  • then square them;

  • sum the terms;

  • take the square root.

This leaves you with a single number: something like a "total error".
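A minimal sketch of that recipe in Python (the weights and the per-version error rates below are made-up illustrations, not values from the question):

    import math

    def total_error(fp_rate, fn_rate, fp_weight=1.0, fn_weight=1.0):
        # Weight each error rate, square the terms, sum them, take the square root.
        return math.sqrt((fp_weight * fp_rate) ** 2 + (fn_weight * fn_rate) ** 2)

    # Example: three detector versions, with false negatives weighted 3x as heavily.
    versions = {"v1": (0.10, 0.05), "v2": (0.07, 0.09), "v3": (0.12, 0.02)}
    for name, (fp, fn) in versions.items():
        print(name, round(total_error(fp, fn, fp_weight=1.0, fn_weight=3.0), 4))

The version with the smallest total error "wins" under the chosen weighting; changing the weights changes the ranking, which is exactly the point of weighting.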

doug
Great! Can you point me to a paper where they do this?
dnul
+1  A: 

A couple of other possible solutions:

-Your false-positive rate (fp) and false-negative rate (fn) may depend on a threshold. If you plot the curve whose y-value is (1-fn) and whose x-value is fp, you'll be plotting the Receiver Operating Characteristic (ROC) curve. The Area Under the ROC Curve (AUC) is one popular measure of quality.

-AUC can be weighted if there are certain regions of interest.

-Report the Equal-Error Rate (EER): for some threshold, fp=fn; report that value. (A short sketch of the ROC/AUC and EER calculations follows below.)
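A minimal sketch of the ROC/AUC and EER ideas, assuming you can get a per-detection confidence score and a ground-truth label out of your benchmark (the arrays below are made-up example data, and scikit-learn is used just for convenience):

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    # Made-up ground truth (1 = object present) and detector confidence scores.
    y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3, 0.9, 0.5])

    # ROC curve: false-positive rate vs. true-positive rate over all thresholds.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    print("AUC:", auc(fpr, tpr))

    # Equal-Error Rate: the operating point where fp rate ~= fn rate (fn = 1 - tpr).
    fnr = 1 - tpr
    i = np.argmin(np.abs(fpr - fnr))
    print("EER ~", (fpr[i] + fnr[i]) / 2)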

+3  A: 

In addition to the popular Area Under the ROC Curve (AUC) measure mentioned by @alchemist-al, there's a score that combines precision and recall (which are defined in terms of TP/FP/FN) called the F-measure, which goes from 0 to 1 (0 being the worst, 1 the best):

F-measure = 2*precision*recall / (precision+recall)

where

precision = TP/(TP+FP)  ,  recall = TP/(TP+FN)
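A quick sketch of the F-measure computed from raw counts (the TP/FP/FN counts here are placeholders for whatever your benchmark produces):

    def f_measure(tp, fp, fn):
        # Precision: of everything flagged as an object, how much really was one.
        precision = tp / (tp + fp)
        # Recall: of all real objects, how many were flagged.
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    print(f_measure(tp=80, fp=10, fn=20))  # ~0.842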
Amro
+1  A: 

If you want to maximize both the true positives and the true negatives you can use the Diagnostic Efficiency:

Diagnostic Efficiency = Sensitivity * Specificity

Where...

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

(TP = number of true positives, FN = number of false negatives, TN = number of true negatives, FP = number of false positives)

This metric works well for datasets with an unbalanced class distribution (i.e. the dataset is skewed).
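A minimal sketch from raw counts (the counts are illustrative only, with deliberately many more negatives than positives to mimic a skewed dataset):

    def diagnostic_efficiency(tp, tn, fp, fn):
        sensitivity = tp / (tp + fn)   # true-positive rate
        specificity = tn / (tn + fp)   # true-negative rate
        return sensitivity * specificity

    print(diagnostic_efficiency(tp=80, tn=900, fp=10, fn=20))  # ~0.791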

Reason Enough