views:

172

answers:

4

I'm in a situation where I'm required to make at least some effort to remove never-used code from my source code. The general preference is to use a static code analysis tool. We've had great luck with this in other projects, but the folks I hear from are mostly C/C++ developers working on device level code.

I'm a web developer working on a JEE system. The favored tool for analysis is Coverity, although I could probably advocate for something else, if I could make a strong case that it was more appropriate to the technology we're developiong with.

I find myself dubious -- what the effectivity of static code analysis for dead code, when you are running against a system with a lot of abstractions? For example, we use Spring's dependancy injection, as well as JSF. In both cases, there's no easy way to trace through the function calls from front end to back end and make a complete picture of what gets called and what doesn't.

I'm very concerned that the false positives on a dead code check will outweigh the value of running the tool in the first place.

Does anyone have any experience with this scenario? Did you manage to get value from a static code analysis tool when your architecture used a lot of abstractions. Was there anything you had to do to get it to work with a minimum of false positives?

+4  A: 

I previously worked at Coverity, on the Java static analysis product.

This particular task of finding dead code can be hit-or-miss for a static analyzer. For dead methods in particular, meaning a method which cannot be called at runtime, the false positive rate will be very high without a lot of tuning from you to inform the static analyzer about all the dynamic entry points.

For deadcode within a method, if your analyzer has that capability, the results should be pretty good, as the analysis will not make any assumptions about the input data. Even assuming all possibe inputs, it is possible to find dead code where correlated logic prevents certain branches from being taken.

Michael Donohue
A: 

Static code analysis in every case should only be done if you completely understand the analysis-process and the code. The problem simply is, that it's just rough and offering assumptions. Neither a solution, nor a completely accurate check for vulnerabilities. You have to able to determine the false positives with other testing methods.

The value of a static code analyser is not a code-review. To eliminate dead code I'd use code-coverage to profile that. - In your case - you mentioned many abstractions... I pretty much guess static code analysis won't be enough.

wishi
This seems like awfully strict advice. So no one should use Findbugs unless they understand the limits of a dataflow analysis? Coverity's product is off limits until you've read about the details of abstract interpretation?If you like the results, go with it. There's no need to understand exactly how the analysis is done.
Michael Donohue
But that's exactly what I meant: no one should use FindBugs, PMD or anything related to static code-analysis unless he/she is familiar with data-flow analysis. Otherwise people tend to try to avoid detection, but produce the same bugs anyway. Only very sophisticated programmers can use static-code analysis efficiently to eliminate vulnerabilities in code. The product that finds the problem doesn't write a patch, nor is it aware of the model.
wishi
+2  A: 

You can use test coverage tools (dynamic analysis) to determine what code of your system is used; the complement is code that might be dead (it wasn't executed!) and needs inspection (e.g, there can be some false positives). The more exercise you give your system, the lower the false positive rate.

A Java test coverage tool that can collect this data for you can be found here.

If you wanted to minimize the false positives, you might consider running the static analysis tool and test coverage, and taking the intersection.

In general, detecting dead code X requires proving that there is no condition under which X is invoked. That's hard (theoretically impossible) when faced with a Turing machine and IF statements of the form

 if (Turing(..)) then call X();

which is why static analysis tools have high false positive rates for this.

In many cases, however, "dead code" is really just code that simply has no way to invoke it ("deactive code" in FAA parlance.). That is, while X is defined, there are simply no invocations of X (or accesses, if X is a data item) anywhere in the system, directly or indirectly. These are easier for static analysis tools to detect with the messy complication in Java of dynamic class loading and reflection (which make the deactive code analysis problem impossible in the face of unknown but loadable classes).

Ignoring these complications, one can find static analysis tools that detect deactive code in large Java systems and report it. Such a tool has to process the entire Java system at once because otherwise a reference might exist in the one module not included in the analysis. We have built a "deactive" code detector and remover can even provide you your source code back with all the deactive code automatically removed, as well as report on what's unreferenced. You inspect the report, and decide if you want to use the cleaned up code or add an access to an apparantly unused entity.

Ira Baxter
Impossibility theories about programs are over blown when it comes to static analysis tools. The halting problem is a big issue, as referenced by your Turing() method, but it applies to optimizing compilers, and no one doubts the utility of an optimizing compiler. Analysis tools need to provide evidence to the user, and users won't believe a complex chain of evidence. This leaves the relatively mundane, but that is interesting since a tool can find issues the are widely disparate in the code base, that would be very difficult for a human to pinpoint, but are quite easy for a human to verify.
Michael Donohue
I'n not faulting static analysis tools (I buld them too!), just observing that theory lilmitations gaurantee they can't produce the right answer in all circumstnces. I agree the right response is to use the static analysis tools because they are much better in generatl at dealing with complexity than people, but to check the answers. "Trust, but verify".
Ira Baxter
A: 

Working on static analysis, I am not aware of any static analysis tool that actually works with abstractions. You most likely will have to write a module that hooks into analyzing process and reasons about the way you use abstractions.

And I doubt that dead code presence costs more than this.

Pavel Shved