What is the industry standard for bugs per 1000 lines of code?
What is the number that your company uses?
What other metrics are used to identify code quality?
From an article at lwn.net:
Bugs per thousand lines of code (kLOC) can only be evaluated as a relative number, since we cannot know:
* if blank lines of code are counted,
* if comments are counted,
* if coding style matters (lone '{'s or '}'s)...

Otherwise, we can only suppose it is non-blank, non-comment lines of code that we are counting (the usual industry standard), and play with broad estimates, which I will presently do for the fun of it.
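As a rough illustration of that counting convention, here is a minimal sketch of such a counter. The single line-comment prefix is an assumption; a real counter would need per-language rules for block comments and comment markers inside string literals:

```python
def count_loc(lines, comment_prefix="#"):
    """Count non-blank, non-comment lines -- the usual 'kLOC' convention.

    Naive sketch: assumes line comments only; ignores block comments
    and comment markers embedded in string literals.
    """
    return sum(
        1
        for line in lines
        if line.strip() and not line.strip().startswith(comment_prefix)
    )

sample = '''\
# a comment
x = 1

y = x + 1  # a trailing comment still leaves the line counted
'''
print(count_loc(sample.splitlines()))  # -> 2
```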
The figure given by Carnegie Mellon University, 20 or 30 bugs per kLOC, is definitely not for released software, but probably for written software before any testing happens. After release, the number is more likely 1 to 5 bugs per kLOC in commercial software. For mission-critical code, the count can be as low as 0.1 bugs per kLOC (as in the Shuttle software), depending on criticality and budget. Project size is also a factor.
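For a sense of scale, a quick back-of-the-envelope calculation using the rates above (the 50 kLOC project size is a made-up example, not from the article):

```python
kloc = 50  # hypothetical project size

# Rates quoted above, in bugs per kLOC
rates = {
    "before testing (CMU figure)": (20, 30),
    "released commercial":         (1, 5),
    "mission-critical (Shuttle)":  (0.1, 0.1),
}

for stage, (lo, hi) in rates.items():
    print(f"{stage}: {lo * kloc:g}-{hi * kloc:g} expected bugs")
```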
Of course the rate in Linux is lower than in "commercial enterprise software"; an operating system kernel arguably is mission-critical software. 0.17 bugs per kLOC still looks like a lot, even if those bugs are in device drivers, or especially then, since they can take down the whole system, corrupt data, etc. (I remember estimates for w2k were 2 bugs per kLOC after release, but that includes the whole operating system, not just the kernel.)
But there is more. Nobody would expect that, after fixing the 985 bugs, Linux would magically become error-free. So 0.17 bugs per kLOC must be a lower-bound estimate; the real figure will be higher.
All in all, a poor press release with not much real value, but great promotion for the Stanford Code Checker.
I always go with a standard figure of 1000 bugs per 1000 lines of code. This makes it easy to figure out which lines are buggy.
Here's an interesting blog post that cites some figures from books: http://amartester.blogspot.com/2007/04/bugs-per-lines-of-code.html
The book "Code Complete" by Steve McConnell has a brief section about error expectations. He basically says that the range of possibilities can be as follows:
(a) Industry Average: "about 15 - 50 errors per 1000 lines of delivered code." He further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.
(b) Microsoft Applications: "about 10 - 20 defects per 1000 lines of code during in-house testing, and 0.5 defect per KLOC (KLOC means 1000 lines of code) in released product (Moore 1992)." He attributes this to a combination of code-reading techniques and independent testing (discussed further in another chapter of his book).
(c) "Harlan Mills pioneered 'cleanroom development', a technique that has been able to achieve rates as low as 3 defects per 1000 lines of code during in-house testing and 0.1 defect per 1000 lines of code in released product (Cobb and Mills 1990). A few projects - for example, the space-shuttle software - have achieved a level of 0 defects in 500,000 lines of code using a system of format development methods, peer reviews, and statistical testing."
I have learned a lot of "strange" metrics over the years, but not this one. My first impression is that you would have to have that number per programming language. Languages differ enormously in expressiveness, and expressiveness means denser problem solutions, which makes each line a bigger suspect for bugs. So a very expressive language will suffer a lot in this statistic. Seems to be a pro-Java metric :)
Just try to have zero known bugs. Coming up with a figure that guesses at possible bugs you have not found yet is pointless, especially if you are then going to congratulate yourself on being within a certain threshold of "total number of bugs you have not discovered yet".
The standard varies across programming languages. Try comparing the lines of code of a Java and a Python 'Hello world!' program to see what I mean :)
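A minimal sketch of that comparison, counting the non-blank lines of each snippet as written (the exact counts depend on formatting, of course):

```python
python_src = 'print("Hello world!")\n'

java_src = """\
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
"""

def loc(src):
    """Count non-blank lines of a source snippet."""
    return sum(1 for line in src.splitlines() if line.strip())

print("Python:", loc(python_src), "line(s)")  # -> 1
print("Java:  ", loc(java_src), "line(s)")    # -> 5
```

At the same defect density per kLOC, the more verbose language looks "buggier" simply because the same logic takes more lines.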
There are some metrics like "effective thousands of lines of code" (EKLOC), as far as I remember... But when you start counting and coders find out about it, they stop writing comments, since comments are not counted. They split statements across lines, so you modify the counting script... One guy spent half a year doing this over and over, and then wrote a book ("Measuring Performance in Knowledge Organizations", or something like that).
The rule of thumb is to have someone from QA look at what you've done. He or she will discover most of the bugs in a very short time, and unless you work for NASA, this is as good as you are going to get. After you fix them, give it to QA again so they can retest that you didn't break something.
For my QA team the basic metric is: