What defect rate can I expect in a C++ codebase that is written for an embedded processor (DSP), given that there have been no unit tests, no code reviews, no static code analysis, and that compiling the project generates about 1500 warnings. Is 5 defects/100 lines of code a reasonable estimate?
That also depends on who wrote the code (level of experience), and how big the code base is.
I would treat all warnings as errors.
How many errors do you get when you run a static analysis tool on the code?
EDIT
Run cccc, and check the mccabe's cyclic complexity. It should tell how complex the code it.
http://sourceforge.net/projects/cccc/
Run other static analysis tools.
Take a look at the code quality. It would quickly give you a indication of the amount of problems hiding in the source. If the source is ugly and take a long time to understand there will be a lot of bugs in the code.
Well structured code with consistent style and that is easy to understand are going to contain less problems. Code shows how much effort and thought went into it.
My guess is if the source contains that many warnings there is going to be a lot of bugs hiding out in the code.
Your question is "Is 5 defects/100 lines of code a reasonable estimate?" That question is extremely difficult to answer, and it's highly dependent on the codebase & code complexity.
You also mentioned in a comment "to show the management that there are probably lots of bugs in the codebase" -- that's great, kudos, right on.
In order to open management's figurative eyes, I'd suggest at least a 3-pronged approach:
- take specific compiler warnings, and show how some of them can cause undefined / disastrous behavior. Not all warnings will be as weighty. For example, if you have someone using an uninitialized pointer, that's pure gold. If you have someone stuffing an unsigned 16-bit value into an unsigned 8-bit value, and it can be shown that the 16-bit value will always be <= 255, that one isn't gonna help make your case as strongly.
- run a static analysis tool. PC-Lint (or Flexelint) is cheap & provides good "bang for the buck". It will almost certainly catch stuff the compiler won't, and it can also run across translation units (lint everything together, even with 2 or more passes) and find more subtle bugs. Again, use some of these as indications.
- run a tool that will give other metrics on code complexity, another source of bugs. I'd recommend M Squared's Resource Standard Metrics (RSM) which will give you more information and metrics (including code complexity) than you could hope for. When you tell management that a complexity score over 50 is "basically untestable" and you have a score of 200 in one routine, that should open some eyes.
One other point: I require clean compiles in my groups, and clean Lint output too. Usually this can accomplished solely by writing good code, but occasionally the compiler / lint warnings need to be tweaked to quiet the tool for things that aren't problems (use judiciously).
But the important point I want to make is this: be very careful when going in & fixing compiler & lint warnings. It's an admirable goal, but you can also inadvertantly break working code, and/or uncover undefined behavior that accidentally worked in the "broken" code. Yes, this really does happen. So tread carefully.
Lastly, if you have a solid set of tests already in place, that will help you determine if you accidentally break something while refactoring.
Good luck!
If you want to get an estimate of the number of defects, the usual way of statistical estimatation is to subsample the data. I would pick three medium-sized subroutines at random, and check them carefully for bugs (eliminate compiler warnings, run static analysis tool, etc). If you find three bugs in 100 total lines of code selected at random, it seems reasonable that a similar density of bugs are in the rest of the code.
The problem mentioned here of introducing new bugs is an important issue, but you don't need to check the modified code back into the production branch to run this test. I would suggest a thorough set of unit tests before modifying any subroutines, and cleaning up all the code followed by very thorough system testing before releasing new code to production.
Despite my scepticism of the validity of any estimate in this case, I have found some statistics that may be relevant.
In this article, the author cites figures from a "a large body of empirical studies", published in Software Assessments, Benchmarks, and Best Practices (Jones, 2000). At SIE CMM Level 1, which sounds like the level of this code, one can expect a defect rate of 0.75 per function point. I'll leave it to you to determine how function points and LOC may relate in your code - you'll probably need a metrics tool to perform that analysis.
Steve McConnell in Code Complete cites a study of 11 projects developed by the same team, 5 without code reviews, 6 with code reviews. The defect rate for the non-reviewed code was 4.5 per 100 LOC, and for the reviewed it was 0.82. So on that basis, your estimate seems fair in the absence of any other information. However I have to assume a level of professionalism amongst this team (just from the fact that they felt the need to perform teh study), and that they would have at least attended to the warnings; your defect rate could be much higher.
The point about warnings is that some are benign, and some are errors (i.e. will result in undesired behaviour of the software), if you ignore them on the assumption that they are all benign, you will introduce errors. Moreover some will become errors under maintenance when other conditions change, but if you have already chosen to accept a warning, you have no defence against introduction of such errors.