



I'm not sure if this is quite the right place, but it seems like a decent place to ask.

My current job involves manual analysis of large data sets (at several levels, each more refined and done by increasingly experienced analysts). About a year ago, I started developing some utilities to track analyst performance by comparing results at earlier levels to final levels. At first, this worked quite well - we used it in-shop as a simple indicator to help focus training efforts and do a better job overall.

Recently though, the results have been taken out of context and used in a way I never intended. It seems management (one person in particular) has started using the results of these tools to directly affect EPRs (enlisted performance reports - it's an Air Force thing, but I assume something similar exists in other areas) and similar paperwork. The problem isn't who is using these results, but how. I've made it clear to everyone that the results are, quite simply, error-prone.

There are numerous unavoidable obstacles to generating this data, which I have worked to minimize with some nifty heuristics and such. Taken in the proper context, they're a useful tool. Out of context however, as they are now being used, they do more harm than good.

The manager(s) in question are taking the results as literal indicators of whether an analyst is performing well or poorly. The results are being averaged and individual scores are being ranked as above (good) or below (bad) average. This is being done with no regard for inherent margins of error and sample bias, with no regard for any sort of proper interpretation. I know of at least one person whose performance rating was marked down for an 'accuracy percentage' less than one percentage point below average (when the typical margin of error from the calculation method alone is around two to three percent).
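To make the arithmetic concrete, here is a minimal sketch of the check the ranking skips: a score only counts as different from the average when the gap is larger than the margin of error. The function name and the sample numbers are made up for illustration, and the 2.5-point margin is just a value picked from the "two to three percent" range mentioned above.

```python
def differs_from_average(score, average, margin_of_error):
    """Return True only if the gap to the average exceeds the margin of error."""
    return abs(score - average) > margin_of_error


# Hypothetical numbers: 0.8 points below average, margin of error of 2.5 points.
average = 91.0
score = 90.2
print(differs_from_average(score, average, 2.5))  # False
```

With these numbers the score cannot be told apart from the average, so ranking it as "below" carries no information.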

I'm in the process of writing a formal report on the errors present in the system ("Beginner's Guide to Meaningful Statistical Analysis" included), but all signs point to this having no effect.

Short of deliberately breaking the tools (a route I'd prefer avoiding but am strongly considering under the circumstances), I'm wondering if anyone here has effectively dealt with similar situations before? Any insight into how to approach this would be greatly appreciated.

Update: Thanks for the responses - plenty of good ideas all around.

If anyone is curious, I'm moving in the direction of 'refine, educate, and take control of interpretation'. I've started rebuilding my tools to negate or track error better and to automatically generate any numbers and graphs they could want, with documentation included throughout (while tucking the raw data they currently seem so eager to import into their 'magical' Excel sheets away behind obscure references).

In particular, I'm hopeful that visual representations of error and properly created ranking systems (taking into account error, standard deviations, etc.) will help the situation.

+11  A: 

Either modify the output to include error information (so if the error is +/- 5 %, don't output 22%, output 17% - 27%), or educate those whom this is being used against to the error so that they can defend themselves when it is used against them.
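As a rough sketch of the first suggestion, the reporting code could emit only a range and never the bare number. The function name is hypothetical, and clamping to 0-100 assumes the value is a percentage.

```python
def format_with_error(value, error):
    """Render a percentage as a low-high range instead of a single number."""
    low = max(0.0, value - error)     # a percentage cannot go below 0
    high = min(100.0, value + error)  # or above 100
    return f"{low:g}% - {high:g}%"


print(format_with_error(22, 5))  # 17% - 27%
```

Two overlapping ranges are much harder to sort into "above" and "below" than two single numbers.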

I do include errors that are able to be calculated, but some are literally immeasurable, and can only be subjectively applied. As an added 'bonus', the errors I do include are effectively stripped away anyway.
@pdehaan Then all you are left with is educating those on the receiving end of this abuse. Or as others have said, refuse to contribute to this project and/or leave the position.
+4  A: 

All you can do is to try and educate the managers as to why what they're doing is incorrect.

Beyond that, you can't stop idiots from being idiotic, and you'll just go mad trying.

I definitely wouldn't "break" code that people are relying on, even if it's not a specific deliverable. That will only cause them to complain about you, a move which may affect your own EPR :-)

+2  A: 

The problem is that the code is not yours, it belongs to your company. They really can do whatever they want with it.

I hate to say this, but if you have an issue with the ethics of your company you will have to leave that company.

Ever heard of an "open door" policy? While I've had varying success appealing to a manager's manager, it is probably worth a try.
Carter Galle
+4  A: 

I really think the key here is good communication with your managers.

Besides, I like PatrickV's idea. You could also try other ways to engineer your tool around the problem so that it seems silly, or is simply hard, to use as a performance measurement - change the name of the statistics to mean something other than "how good programmer X is", make it hard to get data per person, show error statistics.

You can also try to display the data in another way (this may actually make your managers think you are trying to help them). Show a graph - a difference of a few pixels in position may be harder to spot than a difference between numeric results (my guess - your managers are using Excel and coloring everything below average red). Draw the error margin so it doesn't make sense to obsess over fractions of a percent. Give the result as a range - low and high bounds that take your error information into account - since ranges are harder to compare.

Edit: Oh yeah, and read about "social interfaces". You can start with Spolsky's Not Just Usability and Building Communities with Software.

+3  A: 

I would echo @paxdiablo's advice, as a first step:

  1. Work on the report on the inherent errors. In fact, make it the introduction to every copy generated.
  2. When you refer to the measurement errors, indicate they are the lower limit of the errors (unless there actually aren't any).
  3. Try to educate the manager(s) in the error of his/her ways.
  4. If possible, discuss the issue with your manager, and perhaps with the offending managers' management. Depending on how familiar you are with them, you should probably limit it to expressing some concerns and giving a heads-up.
  5. Consult your HR department, or whoever is in charge of fairness in the performance reviews.

Good luck.

Carter Galle
+6  A: 

Well, you seem to have run afoul of the Law of Unintended Consequences in the context of human behavior.

Unfortunately, once the cat is out of the bag, it's pretty hard to put back in. You have a few options (which are not mutually exclusive, by the way) to consider, including:

  1. Alter the reports so that their data can no longer be abused in the way you describe.
  2. Work with management to help them understand why their use of your data is improper or misleading.
  3. Work with those whose performance is being measured to pressure management to rethink their policy on the matter.
  4. Work with management/analysts to come up with a viable means to measure performance in a way that is fair to everyone.
  5. Break the reports in a manner that makes them unusable for any purpose.

Clearly there is a desire on the part of management to get analytics on performance of analysts. Likely there is a real need for this ... and your reports happened to fill a void in the available information. The best option for everyone would be to find a way to effectively and fairly fill this need. There are many possible ways to achieve this - from dropping dense rankings in favor of performance tiers to using time-over-time variance to refine performance measurements.
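One way to picture "performance tiers" instead of a dense ranking is the sketch below. It is only an illustration under my own assumptions: the tier names, the analyst labels, the scores, and the idea of using a single margin of error as the tier width are all hypothetical, not something taken from the original reports.

```python
from statistics import mean


def assign_tiers(scores, margin_of_error):
    """Group scores into three coarse tiers around the group average.

    Anyone within the margin of error of the average lands in the same
    tier, so differences too small to measure cannot change a rating.
    """
    avg = mean(scores.values())
    tiers = {}
    for name, score in scores.items():
        if score > avg + margin_of_error:
            tiers[name] = "above range"
        elif score < avg - margin_of_error:
            tiers[name] = "below range"
        else:
            tiers[name] = "within range"
    return tiers


# Hypothetical analysts and accuracy percentages.
scores = {"A": 90.5, "B": 91.0, "C": 91.5, "D": 85.0}
print(assign_tiers(scores, margin_of_error=2.5))
```

Here A, B and C all land in the same tier even though a dense ranking would order them, and only D stands out.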

Now, it's entirely possible that the existing reports you've provided simply cannot be applied in a fair and accurate manner to address this problem. In which case, you should work with your management team to make sure they understand why this is the case - and either redefine the way performance is measured or take the time to develop an appropriate and fair methodology.

One of the strongest means to convince management that their (ab)use of the data in your report is unwise is to remind them of the concept of perverse incentives. It's entirely possible that over time, analysts will modify their behavior in a way that results in higher rankings in performance reports at the cost of real performance or quality of results that are not otherwise captured or expressed. You seem to have a good understanding of your domain - so I would hope that you could provide specific and dramatic examples of such consequences to help make your case.

+1 for perverse incentives
Peter G.
+2  A: 

One thing you could do is implement the comparison yourself. If he really wants to check if somebody is performing significantly less than the rest, it should be tested formally as well.

Choosing the right test is a bit tricky without knowing the data and its structure, so I can't really advise you on that one. Just keep in mind that if you do pairwise comparisons, or compare multiple scores against an average, you run into the multiple testing problem. A classic way of correcting for it is the Bonferroni correction. It is very conservative, so if you implement it, you can be sure that at a certain point no one will stand out any more. Another option is the Dunn-Sidak correction, which is supposed to be less conservative.
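For reference, both corrections just shrink the per-comparison significance level so that the overall false-alarm rate stays at the level you chose. A minimal sketch (the function names and the example of 20 comparisons at 0.05 are assumptions for illustration):

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / m


def sidak_alpha(alpha, m):
    """Per-comparison significance level under the Sidak correction."""
    return 1 - (1 - alpha) ** (1 / m)


# Hypothetical case: 20 analysts each compared against the average at alpha = 0.05.
m = 20
print(bonferroni_alpha(0.05, m))
print(sidak_alpha(0.05, m))
```

The Sidak threshold comes out slightly larger than the Bonferroni one, which is what "less conservative" means here. It is exact only when the comparisons are independent.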

The correct implementation would be an ANOVA (if the assumptions are met and the data are suitable, of course) with a post-hoc comparison like a Tukey Honest Significant Difference test. That way at least the uncertainty in the results is taken into account.

If you don't have a clue on which test to use, describe your data in detail on and ask for help on which test to use.


Joris Meys

Joris Meys is right. You need to look at the statistical values.

But ANOVA is overkill. You need to look at the standard deviation. Specifically, in management (quality management), a deviation of more than six times the standard deviation (lean six sigma) is significant.

In my personal opinion, if it exceeds 3 times stddev, then you definitely (near 100% probability) have a problem, and if it exceeds 6 times stddev, then you have a problem which needs to be handled urgently (near 400% probability - I know, I know, 100% is maximum).

You can still be a bit more lenient or not by applying the mathematical or the random sampling formula for the stddev.
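A minimal sketch of what that looks like, assuming the two formulas meant here are the population and sample standard deviation (`pstdev` and `stdev` in Python's `statistics` module). The function name and the scores are hypothetical, and the whole rule of thumb assumes roughly normally distributed scores.

```python
from statistics import mean, pstdev, stdev


def deviations_in_sigmas(scores, use_sample_formula=True):
    """Express each score's distance from the mean in standard deviations."""
    # The sample formula divides by n - 1, giving a larger stddev and
    # therefore smaller (more lenient) deviations than the population one.
    spread = stdev(scores) if use_sample_formula else pstdev(scores)
    avg = mean(scores)
    return [(s - avg) / spread for s in scores]


# Hypothetical accuracy percentages.
scores = [88.0, 90.0, 91.0, 92.0, 94.0]
print(deviations_in_sigmas(scores, use_sample_formula=True))
print(deviations_in_sigmas(scores, use_sample_formula=False))
```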

Your method: 1) only holds when you have one single test, so not multiple persons; 2) is only valid in the case of a normal distribution; 3) does not take into account that the OP wants to PREVENT abuse of the data, not stimulate it. This technique is obviously only going to strengthen the manager in his -wrong- belief that he's using the data correctly. ANOVA is not overkill, it's the most naive approach that can actually still be defended as remotely statistical. As far as I can judge, OP doesn't even need our advice on that.
Joris Meys
+2  A: 

I just wanted to elaborate on the Perverse Incentives answer of LBushkin. I can easily see your problem extending to where analysts will avoid difficult topics for fear of reducing their score. Or maybe they will provide the same answer as earlier stages to avoid hurting a friend's score, even if that is not correct. An interesting question is what happens if the later answer is incorrect - you have no truth, just successive analytic opinions - in this case I assume the first answer is marked as "incorrect", right?

Maybe presenting some of these extensions to the manager will help.

Brett Stottlemyer
The nature of our data is such that no absolute standard for 'correct' exists - we just rely on the trend that people doing this for 10+ years or having a more educated (literally - degree required) viewpoint will tend to be 'more correct'. This is actually one of the major sources of error - if an analyst tends to have their data worked over by a much more experienced one, their stats tend to drop more than usual. I've been working out a relative correction factor generated by comparing final analysis results over the same lower-level analyst, but it's iffy at best.