Sadly, a project that I have been working on lately has a large amount of copy-and-paste code, even within single files. Are there any tools or techniques that can detect duplication or near-duplication within a single file? I have Beyond Compare 3 and it works well for comparing separate files, but I am at a loss when it comes to finding duplicates within a single file.

Thanks in advance.

Edit:

Thanks for all the great tools! I'll definitely check them out.

This project is an ASP.NET/C# project, but I work with a variety of languages including Java; I'm interested in what tools are best (for any language) to remove duplication.

+2  A: 

If you're using Eclipse, you can use the Copy/Paste Detector (CPD): https://olex.openlogic.com/packages/cpd

Jeff Storey
A: 

ReSharper does this automagically: it suggests when it thinks code should be extracted into a method, and will do the extraction for you.

BlueRaja - Danny Pflughoeft
+2  A: 

Check out Atomiq. It finds duplicated code that is ripe for extraction into a single location.

http://nimblepros.com/products/atomiq.aspx

Chris Missal
... for .NET anyway. :)
Chris Missal
CopyPasteKiller has been rebranded as Atomiq and is now $30 (which seems reasonable). http://nimblepros.com/products/atomiq.aspx
Peter Bernier
A: 

Check out PMD. Once you have configured it (which is fairly simple), you can run its copy/paste detector to find duplicate code.

Ravi Gupta
+1  A: 

See SD CloneDR, a tool for detecting copy-paste-edit code within and across multiple files. It detects exact copies, copies that have been reformatted, and near-miss copies with different identifiers, literals, and even different sequences of statements.

CloneDR handles many languages, including Java (1.4, 1.5, 1.6) and C# (up to C# 4.0). You can see sample clone detection reports at the website, including one for C#.
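
To make "near-miss copies with different identifiers and literals" concrete, here is a rough Python sketch of the general idea. It is not how CloneDR works (nothing above describes its internals); it simply replaces every identifier with `ID` and every literal with `LIT`, then looks for runs of lines that become identical afterwards. The names `normalize`, `find_clones`, the tiny `KEYWORDS` set and the window size are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword list; a real tool would use a full lexer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "var", "new"}
# Identifier | number | double-quoted string | any other single non-space character
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+(?:\.\d+)?|"[^"]*"|\S')

def normalize(line):
    """Replace identifiers with ID and literals with LIT; whitespace is ignored."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit() or tok[0] == '"':
            out.append("LIT")
        else:
            out.append(tok)
    return " ".join(out)

def find_clones(source, window=3):
    """Return lists of 1-based start lines whose next `window` non-blank
    lines are identical after normalization."""
    lines = [(i, normalize(text)) for i, text in enumerate(source.splitlines(), 1)]
    lines = [(i, norm) for i, norm in lines if norm]  # skip blank lines
    seen = defaultdict(list)
    for k in range(len(lines) - window + 1):
        chunk = tuple(norm for _, norm in lines[k:k + window])
        seen[chunk].append(lines[k][0])
    return [starts for starts in seen.values() if len(starts) > 1]

sample = """\
int total = 0;
total += price * 2;
Save(total);
Log("done");
int count = 1;
count += weight * 5;
Store(count);
"""
print(find_clones(sample))  # [[1, 5]]
```

Lines 1-3 and 5-7 of the sample differ only in names and numbers, so they are reported as one clone pair. A sketch like this will not catch reordered or inserted statements, which is where a dedicated tool earns its keep.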

Ira Baxter
A: 

You don't say what language you are using, which is going to affect what tools you can use.

For Python there is CloneDigger. It also supports Java, but I have not tried that. It can find code duplication both within a single file and between files, and gives you the result as a diff-like report in HTML.

Dave Kirby
A: 

Anyone with some Office skills can run through the following sequence in about a minute:

  • use an ordinary code formatter to unify the code style, preferably without line wrapping
  • paste the code text into Microsoft Excel as a single column
  • search and replace all double spaces with single spaces, and do any other normalizing replacements
  • sort the column

At this point, duplicated lines sit next to each other and are already easy to spot. To go further:

  • add a comparison formula to a 2nd column and a counter to a 3rd
  • copy and paste as values, sort again, and look at the most repeated lines
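
If Excel is not at hand, the same steps can be approximated in a few lines of Python. This is only a sketch of the spreadsheet procedure above (normalize whitespace, then count identical lines); the function name `repeated_lines` and the `min_count` parameter are invented for the example.

```python
import re
from collections import Counter

def repeated_lines(source, min_count=2):
    """Return (line, count) pairs for normalized lines seen at least min_count times."""
    normalized = []
    for line in source.splitlines():
        # Collapse runs of whitespace, like the search-and-replace step.
        text = re.sub(r"\s+", " ", line).strip()
        if text:  # ignore blank lines
            normalized.append(text)
    counts = Counter(normalized)
    # Most repeated first; ties sorted alphabetically so the output is stable.
    return sorted(
        ((text, n) for text, n in counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),
    )

sample = """
int total = 0;
total  +=  price;
Save(total);
int total = 0;
total += price;
"""
for text, n in repeated_lines(sample):
    print(n, text)
```

Like the spreadsheet version, this only finds lines that match exactly after whitespace cleanup, so it works best on code that has already been run through a formatter.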
RocketSurgeon
A: 

There is an analysis tool called Simian, which I haven't tried yet. Supposedly it can be run on any kind of text and will point out duplicated items. It can be used via a command-line interface.

Grant Palin