I am on a project where previous programmers copy-pasted code all over the place. The copies are identical (or very similar) and could have been refactored into a single implementation.

I have spent countless hours refactoring this code manually, but I think there must be a better way. Some of the duplicates are trivial static methods that could have been moved into an ancestor class (but instead were copy-pasted all over by previous junior programmers).
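To make the kind of cleanup concrete, here is a minimal sketch of the "move the static method into an ancestor class" refactoring. All class and method names (BaseReport, InvoiceReport, StockReport, padLeft) are hypothetical and only stand in for the real project code:

```java
// Hypothetical example: padLeft used to be copy-pasted into every report class.
// After the refactoring it lives once, in the shared ancestor.
abstract class BaseReport {
    // Left-pads s with spaces until it is at least `width` characters long.
    protected static String padLeft(String s, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }
}

class InvoiceReport extends BaseReport {
    String line(String amount) {
        return "Total:" + padLeft(amount, 8); // inherited helper, no local copy
    }
}

class StockReport extends BaseReport {
    String line(String qty) {
        return "Qty:" + padLeft(qty, 6); // inherited helper, no local copy
    }
}
```

Doing this by hand is easy for one method; the hard part is finding every copy, which is what a duplicate detector would help with.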

Is there a code analysis tool that can detect this and provide reports or recommendations? I would prefer a free/open source tool if possible.

+6  A: 

I use the following tools:

Both tools support code duplication detection, but neither can advise you on how to refactor your code.

JetBrains IntelliJ IDEA also has good static code analysis with code duplication support, but it is not free.

uthark
+1  A: 

Most of the tools listed on the Wikipedia article on Duplicate Code Tools will detect duplicates in many different languages, including Java.

wsanville
Thanks for the link.
Rosdi
+1  A: 

Either Simian or PMD's CPD. The former supports a wider set of languages but is not free for commercial projects.

Pascal Thivent
One feature of Simian that's quite good is its ability to find code that was not copied but developed independently. Such code may do the same thing but have completely different variable names and even subtypes. In Simian's setup you can specify that it should ignore variable names, treat subtypes as the same parent type, and so on.
Derek Clarkson
It is extremely rare for clone detectors to find code that "was not copied but developed independently" unless the code fragments are microscopic (a*b is a clone of x*y and is developed independently, but nobody cares). Having built a strong clone detector, my experience is that what they find is code that has been cloned; better ones can find cloned code with changed variable names and different constants. Simian is one of these. Strong ones (mine is one of these) can detect when arbitrary subexpressions and statements have been replaced.
Ira Baxter
A: 

http://checkstyle.sourceforge.net/ has support for finding duplicates

Nikolaus Gradwohl
A: 

See our SD Java CloneDR, a tool for detecting exact and near-miss duplicate code in large Java systems.

The CloneDR will find code clones in spite of whitespace changes, line breaks, comment insertions or deletions, and modification of constants or identifiers, and in a number of cases even when one statement or a block of statements has been replaced by another.
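As a rough illustration of why whitespace, comments, identifiers and constants need not hide a clone, here is a minimal sketch. It is not how CloneDR (or any tool named in this thread) works internally; it is a simplified token-normalization approach, and the class CloneSketch, its methods, and its deliberately short keyword list are all assumptions made for the example. It does not handle the statement-replacement case at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CloneSketch {
    // Deliberately incomplete keyword list; enough for the illustration only.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "int", "long", "double", "boolean", "void", "return",
            "if", "else", "for", "while", "new", "static", "class"));

    // Comments, words, numeric literals, string literals, or any other single
    // non-whitespace character. Whitespace never matches, so it is skipped.
    private static final Pattern TOKEN = Pattern.compile(
            "//[^\\n]*|/\\*.*?\\*/|[A-Za-z_][A-Za-z0-9_]*|\\d+(?:\\.\\d+)?"
                    + "|\"(?:\\\\.|[^\"\\\\])*\"|\\S",
            Pattern.DOTALL);

    // Turns source text into a token list in which comments are dropped and
    // every identifier becomes "ID" and every literal becomes "LIT".
    static List<String> normalize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            if (t.startsWith("//") || t.startsWith("/*")) {
                continue; // comments do not affect clone matching
            }
            char c = t.charAt(0);
            if (Character.isDigit(c) || c == '"') {
                out.add("LIT");
            } else if ((Character.isLetter(c) || c == '_') && !KEYWORDS.contains(t)) {
                out.add("ID");
            } else {
                out.add(t); // keywords, operators and punctuation are kept
            }
        }
        return out;
    }

    // Two fragments count as near-miss clones here if their normalized
    // token sequences are identical.
    static boolean isNearMissClone(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

With this sketch, `CloneSketch.isNearMissClone("int total = price * 10; // sum", "int  sum=\n cost * 42;")` is true, while changing `*` to `+` in one fragment makes it false. A real detector would additionally search for matching subsequences across many files instead of comparing two whole fragments.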

It shows where each set of clones is found, each individual clone, an abstraction of the clones capturing what they have in common, and a parameterization of that abstraction showing how each clone instance can be derived from it.

It finds that 10-20% of the code in most Java systems is cloned.

Ira Baxter
It is not free, but I will give it a spin nonetheless.
Rosdi