I have been reading and tracking some questions on code reuse, and I have this question:
Are there any tools to identify duplicate or similar code?
I googled this a while ago and found nothing good.
Simian - Similarity Analyser
Purpose
Simian (Similarity Analyser) identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, Groovy source code and even plain text files. In fact, Simian can be used on any human-readable files such as ini files, deployment descriptors, you name it.
Especially on large enterprise projects, it can be difficult for any one developer to keep track of all the features (classes, methods, etc.) of the system.
Duplicates Search, along with Code Coverage and Inspections, is one of TeamCity's Code Quality features.
I use TeamCity personally and I really like it. It does support .NET and Java.
There is a tool for Python and Java: http://clonedigger.sourceforge.net/
For .NET, you can get CloneDetective, a free plugin for VS. It handles C# only, but the underlying technology supports various languages.
Our clone detector works for C, C++, C#, Java, COBOL, VB6, PHP and many other languages; see http://www.semdesigns.com/Products/Clone/index.html. It finds exact and near-miss clones, so it will detect clones that have been parameterized by editing.
It works by matching language structures, not text lines or tokens, so the reported clones look like code structures. Line-based clone detection can't match clones that have been reformatted, have white space changes, or in which the comments have changed. Token-based detectors often find clones which make no sense, such as
} {
which occur huge numbers of times in the text, but are clones only in the dumbest sense of the word.
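To illustrate the difference between comparing text and comparing language structures, here is a minimal sketch using Python's standard `ast` module. It is not how the detector above is implemented; the `Anonymize` class, the `structure` function and the sample snippets are made up for this example, and it assumes Python 3.8 or later.

```python
import ast


class Anonymize(ast.NodeTransformer):
    """Blank out names and literal values so only the code's shape remains."""

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # keep walking into the arguments and body
        return node

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_Constant(self, node):
        node.value = 0
        return node


def structure(source, anonymize=False):
    """Return a string describing the syntax tree of `source`.

    ast.dump() leaves out line and column numbers by default, and the parser
    has already dropped comments and whitespace, so layout cannot affect it.
    """
    tree = ast.parse(source)
    if anonymize:
        tree = Anonymize().visit(tree)
    return ast.dump(tree)


# Hypothetical sample inputs: the same function, a reformatted copy with a
# comment added, and a copy whose identifiers were edited.
original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
reformatted = "def total(xs):\n    s = 0  # running sum\n    for x in xs: s += x\n\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"

print("same text:", original == reformatted)
print("same structure:", structure(original) == structure(reformatted))
print("renamed, exact structure:", structure(original) == structure(renamed))
print("renamed, anonymized structure:",
      structure(original, anonymize=True) == structure(renamed, anonymize=True))
```

The reformatted copy differs as text but has an identical tree, and the renamed copy only matches once identifiers are blanked out, which is roughly what a "near-miss" or parameterized clone means. A real detector would compare subtrees rather than whole files.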
See an example of detected clones. There are several other clone detector reports for various languages there.
EDIT 3/25/2010: ... now does Python ...
EDIT 8/5/2010: ... now does EGL ...
EDIT 10/22/2010: ... now does VBScript and VB.net ...
Same (http://sourceforge.net/projects/same/) is extremely plain, but it works on text lines instead of tokens, which is useful if you're using a language that isn't supported by one of the fancier clone finders.
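For a feel of what line-based detection involves, here is a rough sketch. It is not Same's actual algorithm (I have not checked how Same works internally); `find_duplicate_blocks`, its `min_lines` parameter and the sample files are invented for the example. It slides a fixed-size window over the non-blank, whitespace-stripped lines of each file and reports windows that occur more than once.

```python
from collections import defaultdict


def find_duplicate_blocks(files, min_lines=4):
    """Find runs of `min_lines` identical lines that occur more than once.

    `files` maps a file name to its text. Lines are compared after stripping
    leading/trailing whitespace, and blank lines are skipped. Returns a list
    of groups, each a list of (file name, 1-based starting line number).
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line number, stripped text) for every non-blank line.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), 1)
                 if line.strip()]
        for start in range(len(lines) - min_lines + 1):
            window = tuple(content for _, content in lines[start:start + min_lines])
            seen[window].append((name, lines[start][0]))
    return [locations for locations in seen.values() if len(locations) > 1]


# Hypothetical sample files sharing one four-line block.
sample = {
    "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\nprint('a')\n",
    "b.py": "import os\n\nx = 1\ny = 2\n  z = x + y\nprint(z)\n",
}
duplicates = find_duplicate_blocks(sample)
print(duplicates)
```

Because it only looks at text, this works on any language, but a renamed variable or a line split in two is enough to hide a clone from it.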
I have written a duplication detector. It is written in Python and based on the Pygments lexers, so it works on all languages supported by Pygments. Check Thinking Craftsman Toolkit. No setup/installer is available yet, so you have to get the source from svn. See if it works for you.
Check out CCFinder. It has an interesting graphical user interface. It shows you your duplicate code in an interactive scatter plot.
If you need a good tool you have to look for something that detects similar code and not perfect (i.e., identical) matches. Such a tool should:
The tool I recommend for the job is the Source Code Duplication Detector (SolidSDD). Via the included visualization and reporting features it makes the detection results relevant not only for developers, but also for architects and managers.
Perhaps you could use MOSS to determine similar parts of your program.
While not its primary usage, PyLint can report possible duplicated code:
Similarities checker
Checks for similarities and duplicated code. This computation may be memory / CPU intensive, so you should disable it if you experience problems.
Options
min-similarity-lines: Minimum lines number of a similarity. Default: 4
ignore-comments: Ignore comments when computing similarities. Default: yes
ignore-docstrings: Ignore docstrings when computing similarities. Default: yes
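As a sketch of how these options might be set, a pylintrc fragment could look like the following. The option names and defaults are the ones quoted above; the `[SIMILARITIES]` section name is what Pylint's generated rc files use as far as I know, so check it against the version you have installed.

```ini
[SIMILARITIES]
# Smallest run of matching lines reported as a duplicate
min-similarity-lines=4
# Skip comments and docstrings when comparing
ignore-comments=yes
ignore-docstrings=yes
```

To see only the duplication report, something like `pylint --disable=all --enable=duplicate-code your_package` should work, where `your_package` is a placeholder for your own code.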