The main problem with performance metrics like this is that humans are VERY good at gaming any system that measures their own performance: they will maximize that exact metric, usually at the expense of something else that is valuable.
Let's say we do use the Hudson build to gather stats on programmer output. What could you look for, and what would the unintended side effects be once programmers catch on to the measurement?
- Lines of code (developers churn out mountains of boilerplate and other needless over-engineering, or simply inline every damn method)
- Unit test failures (don't write any unit tests, then they can't fail)
- Unit test coverage (write weak tests that exercise the code but don't actually verify its behavior; see the sketch after this list)
- Number of bugs found in their code (don't do any coding and you won't get any bugs)
- Number of bugs fixed (cherry-pick the easy/trivial bugs to work on)
- Actual time to finish a task measured against their own estimate (pad the estimate to give more room)
And it goes on.
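To make the coverage point concrete, here is a minimal sketch of a "test" that games a coverage metric (JUnit 4; `PriceCalculator` is a hypothetical class invented purely for illustration). It executes every line of the method, so a coverage tool will report 100%, yet it verifies nothing:

```java
import org.junit.Test;

// Hypothetical class under test, invented for this example.
class PriceCalculator {
    double totalWithTax(double price, double rate) {
        if (rate < 0) {
            throw new IllegalArgumentException("negative tax rate");
        }
        return price * (1 + rate);
    }
}

public class PriceCalculatorTest {
    // Executes both branches of totalWithTax, so line and branch
    // coverage read 100% -- but there are no assertions, so a bug
    // like returning price * rate would still pass this test.
    @Test
    public void exercisesButNeverVerifies() {
        PriceCalculator calc = new PriceCalculator();
        calc.totalWithTax(100.0, 0.2);
        try {
            calc.totalWithTax(100.0, -0.2);
        } catch (IllegalArgumentException expected) {
            // Swallowed: even the exception path counts as "covered".
        }
    }
}
```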
The point is that no matter what you measure, humans (not just programmers) get very good at optimizing for exactly that thing.
So how should you look at the performance of your developers? Well, that's hard. It takes human managers, who are good at understanding people (and the BS they pull), and who can weigh each person subjectively in the context of who they are, where they are, and what they're working on to figure out whether they're doing a good job.
What you do once you've figured out who is/isn't performing is a whole different question though.
(I can't take credit for this line of thinking; it's originally from Joel Spolsky. Here and here.)