Imagine you have a set of five elements (A-E) with some numeric values of a measured property (several observations for each element):
A = {100, 110, 120, 130}
B = {110, 100, 110, 120, 90}
C = { 90, 110, 120, 100}
D = {120, 100, 120, 110, 110, 120}
E = {110, 120, 120, 110, 120}
First, I have to detect if there are significant differences on the average levels. So I run a one way ANOVA using the Statistical package provided by Apache Commons Math. No problems so far, I obtain a boolean that tells me whether differences are found or not.
Second, if differences are found, I need to know the element (or elements) that is different from the rest. I plan to use unpaired t-tests, comparing each pair of elements (A with B, A with C .... D with E), to know if an element is different than the other. So, at this point I have the information of the list of elements that present significant differences with others, for example:
C is different than B
C is different than D
But I need a generic algorithm to efficiently determine, with that information, what element is different than the others (C in the example, but could be more than one).
Leaving statistical issues aside, the question could be (in general terms): "Given the information about equality/inequality of each one of the pairs of elements in a collection, how could you determine the element/s that is/are different from the others?"
Seems to be a problem where graph theory could be applied. I am using Java language for the implementation, if that is useful.
Edit: Elements are people and measured values are times needed to complete a task. I need to detect who is taking too much or too few time to complete the task in some kind of fraud detection system.