I know the question about measuring developer performance has been asked to death, but please bear with me. I also know the age-old argument that you cannot measure the performance of developers, but the reality is that at our company there is a "need" to do it one way or another.

I work for a relatively small company (small in terms of developers), and management felt the need to measure developer performance based on "functionality that passes test (QA) at the first iteration".

We somehow managed to convince them that this was a bad idea for various reasons, and settled instead on measuring developers by whether the code they put into test passes all of its unit tests. Since there is no "requirement" per se in our team to write unit tests beforehand, we felt this was an opportunity to formalise that practice - i.e. to give developers some incentive to write unit tests.

My problem is this: since arguably we will not be releasing code to QA that does not pass all of its unit tests, how can one reasonably measure developer performance based on unit tests? Based on unit tests, what makes a good developer stand out?

  1. Functionality that fails although the unit tests pass?
  2. Not writing unit tests for a given piece of functionality at all, or writing inadequate unit tests?
  3. The quality of the unit tests written?
  4. The number of unit tests written?

Any suggestions would be much appreciated. Or am I completely off the mark in this kind of performance measurement?

+3  A: 

I think Joel had it spot-on when he said that this sort of measurement will be gamed by your developers. It will not achieve what it sets out to, and you will likely end up with quality suffering (in the perception of everyone using the system) whilst your measurements of quality all suggest things have never been better!

Edit: You say that management are demanding this. You are a small company; your management cannot afford for everyone to up sticks and leave. Tell them that this is rubbish and that you'll play no part in it.

If the whole idea is so that they can rank people in order to make them redundant (it sounds like it might be, at this time), just ask them how many people have to go, and then choose the developers you believe to be the worst, using your intelligence and judgement and not some dumb rule of thumb.

oxbow_lakes
I accept that, and I completely agree with you. Nonetheless, management wants some sort of measure, hence me asking today :(
Ash M
I think the best you have in that case is to measure the output of the team rather than the individual. It should be obvious whether the project was a success or not; use that to drive compensation. If there is someone who is a superstar, they'll stand out. Laggards too. Deal with them 1:1.
Scott Wisniewski
I know a few people who are now being asked to rank teams because people are going to be let go. It's not all about laggards, unfortunately. Good people are going too.
oxbow_lakes
Since good people will be let go as well as bad, I recommend that the good people leave as soon as they can - leaving the fools in management to run the company with what's left. Besides, there is an assumption that those let go are bad - so leave first, on your own terms.
John Saunders
Guys, you misunderstand me. This exercise is not about getting rid of anyone. It's more about the company measuring employee performance. I'm just trying to work out the best approach to doing that. We thought of unit testing since it also contributes to better code.
Ash M
Then tell them that the only fair "measurement of performance" is the intelligent judgement of good managers. Anything else is incomplete, flawed and gameable.
oxbow_lakes
+2  A: 

For some reason the defect black market comes to mind... although this is somewhat in reverse.

Any system of metrics applied to developers simply isn't going to work, because what we do isn't something you can measure using conventional methods. Whatever you put in place will be gamed (solving problems is what we do all day, and this is just another problem to be solved), and it will be detrimental to your code. For example, I wrote a simple spelling corrector the other day with about 5 unit tests, which were sufficient to check that it worked; but if I were measured on unit tests, I could have spent another day writing another 100 which would all pass but would add no value.
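
A minimal sketch of that kind of padding (the corrector and the test names below are made up for illustration, not the actual code):

    import unittest

    def correct(word):
        """Toy spelling corrector: fixes one known typo."""
        return {"teh": "the"}.get(word, word)

    class MeaningfulTests(unittest.TestCase):
        # A handful of tests like these genuinely pin down the behaviour.
        def test_known_typo_is_corrected(self):
            self.assertEqual(correct("teh"), "the")

        def test_correct_word_is_unchanged(self):
            self.assertEqual(correct("the"), "the")

    class PaddingForTheMetric(unittest.TestCase):
        # Under a "more tests == better" metric, a hundred near-identical
        # copies of this pass happily and add no value at all.
        def test_padding_1(self):
            self.assertEqual(correct("cat"), "cat")

        def test_padding_2(self):
            self.assertEqual(correct("dog"), "dog")

    if __name__ == "__main__":
        unittest.main()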

You need to work out why management wants this system in place. If it's to give rewards, then you should have a look at Joel Spolsky's article about incentive pay, which is not far off the mark from what I've seen (think about bonus day and see how many people are really happy -- none, as they just got what they thought they deserved -- and how many people are really pissed off -- anyone who got less than they thought they deserved).

Greg Beech
+6  A: 

Or am I completely off the mark in this kind of performance measurement?

The question is not "what do we measure?"

The question is "What is broken?"

Followed by "how do we measure the breakage?"

Followed by "how do we measure the improvement?"

Until you have something you're trying to fix, here's what happens.

  1. You pick something to measure.

  2. People respond by doing what "looks" best according to that metric.

  3. You realize you're measuring the wrong thing.

Specifically:

  • "functionalities that pass test (QA) at first iteration" Which means what? Save the code until it HAS to work. Later looks better. So, delay until you pass QA on the first iteration.

  • "Functionality that fail although unit test passes?" This appears to be "incomplete unit tests". So you overtest everything. Take plenty of time to write all possible tests. Slow down delivery so you're not penalized by this measurement.

  • "Not writing unit test for a given functionality at all, or not adequate unit tests written?" Not sure how you measure this, but it sounds the same as the previous one. .

  • "Quality of unit test written?" Subjective measurement. Always a good plan. Define how you're going to measure quality, and you'll get stuff that maximizes that specific measurement. Want more comments? Count those. What more whitespace? Count that.

  • "Number of Unit tests written?" Nothing motivates me to write redundant tests like counting the number of tests. I can easily copy and paste nearly identical code if it makes me look good according to this metric.

You get what you measure. No matter what metric you put in place, you will find that the specific thing measured subverts most other quality concerns. Whatever you measure, be absolutely sure you want people to maximize that measurement while letting the others decline.


Edit

I'm not saying "don't measure". I'm saying "you get what you measure". Pick a metric that you want maximized at the expense of the others. It's not hard to pick a metric; just know the consequence of telling management what to measure.

S.Lott
Thank you for your answer. I agree with what you say, but somehow I need to offer management a way to measure us :(
Ash M
@Ash M: Pick a measurement -- it doesn't matter what you pick. Just know the consequence of picking one thing and having the others decline.
S.Lott
A: 

If you are going to tie people's pay to their unit test performance, the results are not going to be good.

People are going to try to game the system.

What I think you are after is:

  1. You want people to deploy code that works and has a minimum number of bugs.
  2. You want the people who do that consistently to be rewarded.

Your system will accomplish neither.

By tying people's pay to whether or not their tests fail, you are creating a disincentive to writing tests. Why would someone write tests that, at best, yield no benefit and, at worst, limit their salary? The overall incentive will be to keep the size of the test bed minimal, so that the likelihood of failure is minimized.

This means that you will get more bugs, except they will be bugs you just don't know about.

It also means that you will be rewarding people that introduce bugs, rather than those that prevent them.

Basically you'll get the opposite of your objectives.

Scott Wisniewski
A: 

These are my initial thoughts on your four specific questions:

  1. This one is tricky. At first glance it looks OK, but if the code passes its unit tests then, unless the developers are cheating (see below) or the tests themselves are wrong, it's difficult to see how you'd demonstrate this.

  2. This seems like the best approach. All functions should have a unit test, and inspection of the code should reveal which tests are present and which are absent. However, one drawback could be that developers write an empty test, i.e. one that simply passes without actually testing anything (see the sketch after this list). You might have to invest in lengthy code reviews to spot this one.

  3. How are you going to assess quality? Who is going to assess quality? This assumes that your QA team has access to highly skilled independent developers - which may be true, but seems unlikely.

  4. Counting the number of anything (lines of code, unit tests written) is a non-starter. Developers will simply write large numbers of useless tests.
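
A minimal sketch of the "empty test" problem from point 2 (the class and test names are hypothetical):

    import unittest

    class InvoiceTests(unittest.TestCase):
        def test_total_is_calculated(self):
            # An "empty" test: no setup, no call, no assertion.
            # It always passes and inflates the test count, and no
            # pass/fail report will flag it - only a human reading
            # the code will.
            pass

    if __name__ == "__main__":
        unittest.main()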

I agree with oxbow_lakes and, in fact, with the other answers that have appeared since I started writing this - most forms of measurement will be gamed or, worse, resented by developers.

ChrisF
A: 

I believe time is the only, albeit subjective, way to measure a developer's performance.

Given enough time in any one company, good developers will stand out. Project leaders will know who their best assets are. Bad developers will be exposed, given enough time. Unfortunately, therein lies the ultimate problem: enough time.

Lieven
+2  A: 

To quote Steve Yegge:

Shouldn't there be a rule that companies aren't allowed to do things that have been formally ridiculed in a Dilbert comic?

Jim Arnold
+3  A: 

I would argue that unit tests are a quality tool, not a productivity tool. If you want both to encourage unit testing and to give management a productivity metric, make unit testing mandatory for code to get into production, and report on productivity based on the code/features that make it into production over a given time frame (weekly, bi-weekly, whatever). If we take it as a given that people will game any system, then design the game to meet your goals.
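
A minimal sketch of such a gate, assuming a build server runs a check like this before promoting a build (the script name and test command are illustrative assumptions):

    # promote_gate.py - refuse to promote a build whose unit tests fail.
    import subprocess
    import sys

    result = subprocess.run(["python", "-m", "pytest", "--quiet"])
    if result.returncode != 0:
        sys.exit("Unit tests failed: build is not eligible for production.")
    print("All unit tests passed: build may be promoted.")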

cmsjr
cmsjr, thank you for your comments and for offering a solution. No disrespect to anyone else answering here - I completely agree with you all, and I feel the same. The sad thing is, I don't have the choice: management wants to measure us, and I prefer telling them how rather than them telling me. :(
Ash M
A: 

Basic psychology - people work to incentives. If my chances of getting a bonus / keeping my job / whatever are based on the number of tests I write, I'll write tons of meaningless tests - probably at the expense of actually doing my real job, which is getting a product out the door.

Any other basic metric you can come up with will suffer the same problem and be equally meaningless.

If you insist on "rating" devs, you could use something a bit more lateral - scores on one of the MS certification tests, perhaps (which has the side effect of getting people trained up). At least that's objective and independently verified by a neutral third party, so you can't "game" it. Of course, that score also bears no resemblance to the person's effectiveness in your team, but it's better than an arbitrary internal measurement.

You might also consider running code through some sort of complexity measurement tool (simpler==better) and scoring people on their results. Again, it has the effect of helping people to become better coders, which is what you really want to achieve.
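
No specific tool is named here, so as one hypothetical illustration, scoring on cyclomatic complexity might look like this using radon, a Python complexity analyser (the file name and threshold are made up):

    from radon.complexity import cc_visit

    def complexity_offenders(source_code, threshold=10):
        """Return (name, complexity) for every block over the threshold."""
        return [(block.name, block.complexity)
                for block in cc_visit(source_code)
                if block.complexity > threshold]

    with open("module_under_review.py") as f:
        for name, score in complexity_offenders(f.read()):
            print(f"{name}: complexity {score} (simpler == better)")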

Marc
A: 

Poor Ash...

Kudos for using managerial ignorance to push through something completely unrelated, but now you have to come up with a feasible measure.

I cannot come up with any performance measurement that is not ridiculous or easily gamed, and unit tests don't change that. Since kopecks and the defect black market were linked within minutes, I'd rather give you ammunition for not requiring individual performance measurements:

First, software is an optimization between conflicting goals. Evaluating just one or a few of them - like how many defects come up during QA - will lead to severe tradeoffs in other areas that hurt the final product.

Second, teamwork means more than just the output of a few individuals glued together. The synergistic effects cannot be traced back to the effort or skill of a single individual - and when software is developed in a team, they have a huge impact.

Third, the total cost of software unfolds only over time. Maintenance, scalability, compatibility with new platforms, and interaction with future products all carry a significant long-term cost. Measuring short-term cost (year-over-year, or release to production) does not cover the long-term cost at all, and once the long-term cost is known, it is pointless to trace it back to the originator.

Why not have each developer "vote" on their colleagues: who helped us achieve our goals most in the last year? Why not trust you (as - apparently - their manager or lead) to judge their performance?

peterchen
+1  A: 

There was a study I read about in the newspaper here at home in Norway. In a nutshell, it said that office jobs generally see no benefit from performance pay, the reason being that measuring performance in most office jobs is almost impossible.

Simpler jobs, like strawberry picking, did benefit from performance pay, because there it is really easy to measure performance. Nobody is going to feel bad that a high performer gets higher pay when everybody can clearly see that he or she has picked more berries.

In an office it is not always clear that the other person did a better job, and so a lot of people will be demotivated. They tried performance pay for teachers and found that it gave negative results: people who got higher pay often didn't see why they had done better than others, and the ones who got lower pay usually couldn't see why they got less.

What they did find, though, was that non-monetary rewards usually helped - encouraging words from the boss for a job well done, etc.

Read iCon to see how Steve Jobs managed to get people to perform. Basically, he made people believe that they were part of something big and were going to change the world. That is what makes people put in an effort and perform. I don't think developers will put in a lot of effort for money alone; it has to be something they really believe in and/or think is fun or enjoyable.

Adam Smith
A: 

A combination of a few factors around the unit tests should make it fairly easy for someone outside the development group to keep a scorecard measuring the following:

1) How well do the unit tests cover the code, including any common input data that may be entered through UI elements? This may seem like a basic thing, but it is a good starting point, and it is something that can be quantified easily with tools like NCover, I think.

2) Are boundary conditions routinely tested, e.g. nulls for parameters, or letters instead of numbers, and other basic validation? This can also be quantified easily by looking at the parameters of the various methods, together with coding standards to prevent things being bypassed here - e.g. an object whose methods, apart from the constructor, all take zero parameters and thus have no boundary tests (see the sketch after this list).

3) Granularity of a unit test: does each test check one specific case rather than trying to cover lots of different cases in one test? Do test classes contain thousands of lines of code?

4) Grade the code and tests in terms of readability and maintainability: would someone new have to spend days figuring out what is going on, or is the code somewhat self-documenting? Examples would include method and class names being meaningful, and documentation being present.
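
A minimal sketch of points 2 and 3 (NCover is a .NET coverage tool; coverage.py would be a rough Python counterpart, and the function and tests below are hypothetical):

    import unittest

    def parse_quantity(value):
        """Hypothetical function under test: expects a numeric string."""
        if value is None:
            raise ValueError("value must not be None")
        return int(value)  # raises ValueError for letters, e.g. "abc"

    class BoundaryTests(unittest.TestCase):
        # One specific case per test (granularity), covering the
        # boundary conditions the scorecard asks about.
        def test_none_is_rejected(self):
            with self.assertRaises(ValueError):
                parse_quantity(None)

        def test_letters_are_rejected(self):
            with self.assertRaises(ValueError):
                parse_quantity("abc")

        def test_valid_number_parses(self):
            self.assertEqual(parse_quantity("42"), 42)

    if __name__ == "__main__":
        unittest.main()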

The last three things are what I suspect a manager, team lead, or someone else outside the group of developers could rank and handle. There may be ways to game and exploit this, but the question is what end results you want to have. I'm thinking well-documented, high-quality, easily understood code = good code.

JB King
A: 

Look up Deming and Total Quality Management for his thoughts on why performance appraisals should not be done at all for any job.

How about this instead: assume all employees are acceptable employees unless proven otherwise.

If someone does something unacceptable or does not perform to the level you need, write them up as a performance problem. Determine how many write-ups they get before you boot them out of the company.

If someone does something well, write them up for doing something good. If you want to offer a bonus, give it at the time the good performance happens. Even better, make sure you announce it when people get an attaboy; people will work towards getting them. Sure, you will have the political types who will try to game the system and get written up on the basis of others' achievements, but you get that in any system anyway. By announcing who got them at the time of the good performance, you have removed the secrecy that allows the office-politics players to function best. If everyone knows Joe did something great and you reward Mary instead, people will start to speak up about it. At the very least, Joe and Mary might both get an attaboy.

Each year, give everyone the same percentage pay raise, since you have retained only the workers whose performance is acceptable, and you have rewarded the outstanding employees throughout the year whenever they did something good.

If you are stuck with measuring, then measure how many times you wrote someone up for poor performance and how many times you wrote someone up for good performance. Then you have to be careful to be reasonably objective about it, and write up even the people who aren't your friends when they do well, and the people who are your friends when they do badly. But face it: the manager is going to be subjective in the process no matter how much you insist on objective criteria, because there are no objective criteria in the real world.

HLGEM