I've searched the Internet for a while now and I have not been able to find any free (or cheap) tools/utilities/modules that can analyze a set of Perl files (modules or scripts) and flag duplicate or cloned or copy/pasted code.

I'm better now, but I used to copy and paste sections of code all over the place. I'd like to clean it up and fix my old code duplication, but a little bit of tool help would be appreciated so I won't have to go through all my old code with a fine-tooth comb. Plus, manual recognition of this sort of offense is error-prone.

+4  A: 

What do you mean by duplicate code? Just character-exact matches, or semantic matches?

There are several tools like http://pmd.sourceforge.net/ that can detect duplicate code by string matching. This tool is for Java, but the source matching works on plain text.

If you want semantic matching, like

sub A
{return 1;}

to match

sub B
{
    return 1;
}

Then you'll need something else :(
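
To make the distinction concrete, here is a minimal Python sketch of the two kinds of match for the pair of subs above. The `normalize` helper is hypothetical and purely illustrative: it replaces the sub name with a placeholder and drops all whitespace, which is far cruder than what a real clone detector does and would mangle string literals and heredocs.

```python
import re

def normalize(perl_source: str) -> str:
    """Crude normalization (illustrative only): placeholder sub names, no whitespace."""
    # Replace the identifier after `sub` with a fixed placeholder.
    renamed = re.sub(r"\bsub\s+\w+", "sub NAME", perl_source)
    # Remove all whitespace; unsafe for string literals and heredocs.
    return re.sub(r"\s+", "", renamed)

sub_a = "sub A\n{return 1;}\n"
sub_b = "sub B\n{\n    return 1;\n}\n"

exact_match = sub_a == sub_b                              # character-exact comparison
normalized_match = normalize(sub_a) == normalize(sub_b)   # comparison after normalization

print(exact_match, normalized_match)
```

The character-exact comparison fails on the name and layout differences, while the normalized comparison treats the two subs as the same.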

chollida
Thanks. I just tried the PMD plugin for Eclipse and it does not appear to be able to scan Perl (or plain text) files. Choices are Java, JSP, CPP, C, PHP, Ruby, Fortran. For giggles, I tried a couple and it gave me an empty copy/paste report.
Kurt W. Leucht
By default it looks for blocks that are about 30 lines long. We use it for our in-house language, loosely based on JavaScript, and it works fine for us.
chollida
You can run all the code through perltidy to smooth out the stylistic differences (but not the subroutine names).
Schwern
SD's CloneDR can do this particular "semantic" match just fine, because you only need to do a syntactic match, with substitutions, over the language structure. CloneDR uses a parser for the actual language to eliminate differences in whitespace and comments, and to detect when a construct can be parameterized to produce such a match. See www.semanticdesigns.com/Products/Clone
Ira Baxter
+5  A: 

Funny, a similar question was posted to SO just a few minutes ago.

Here is a link with some tools you may find useful.

Code Comparison and Plagiarism Detection

RC
Could you please link to that other question?
innaM
http://stackoverflow.com/questions/1461805/how-can-i-compare-similar-codebases -- a similar question about C++
mobrule
I'm evaluating the CodeMatch product. Had to get on a corporate newsletter email list to download the software, though. Luckily, I used a disposable email address.
Kurt W. Leucht
mobrule posted the SO link I was referring to. Thanks.
RC
Not impressed with CodeMatch. It compares two sets of files for similarity to each other, but does not appear to search and find duplicate code within a single file. I'm uninstalling it as I type this.
Kurt W. Leucht
A: 

Semantic Designs makes a product called Clone Dr. that appears to be able to analyze a large number of language types for cloned sections of code. But it appears that their free evaluation version only works on Java and Cobol.

Kurt W. Leucht
I'm the CloneDR product manager. It provides (we think) really good results by virtue of comparing ASTs for programs, which gets rid of any formatting issues completely. It does handle a lot of languages, but Perl isn't presently one of them. After all, "only Perl can parse Perl" :-} [Actually, we have very good parsing engines; we'll get to Perl someday.]
Ira Baxter
Good to know. There may not be a ton of customers out there for Perl, though. I tried your evaluation version of Clone Dr. on an old Java project of mine a while back and I was impressed with the results. It was this experience that made me realize that I needed to analyze all the rest of my code (some of which includes some large Perl scripts) for copy/paste offenses.
Kurt W. Leucht
You can get evaluation versions for Java, C#, C, C++, COBOL and PHP. You might have to ask at the web site.
Ira Baxter
A: 

I just evaluated Simian. It has a 15-day free evaluation period and costs a hundred bucks for a single-user license. It doesn't officially support Perl, but it treats Perl files as plain text and analyzes them anyway. This is a super fast utility! And super easy to use. The report generated from this tool was simple and easy to interpret. I totally approve of this tool. Now I just need to talk to my boss and get him to purchase a license.
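
For anyone wondering what "treating the files as plain text" can look like, here is a rough Python sketch of line-based duplicate detection. It is not Simian's actual algorithm, only an assumption-laden illustration: the function name `find_duplicate_blocks` and the `min_lines` threshold are made up for this example, and the only normalization is stripping leading/trailing whitespace and skipping blank lines.

```python
from collections import defaultdict

def find_duplicate_blocks(lines, min_lines=3):
    """Return {block: [1-based start lines]} for blocks seen more than once.

    A block is `min_lines` consecutive non-blank lines, compared after
    stripping leading and trailing whitespace.
    """
    # Keep the original line number next to each non-blank, stripped line.
    kept = [(i + 1, s.strip()) for i, s in enumerate(lines) if s.strip()]
    seen = defaultdict(list)
    # Slide a fixed-size window over the kept lines and index each window.
    for start in range(len(kept) - min_lines + 1):
        window = tuple(text for _, text in kept[start:start + min_lines])
        seen[window].append(kept[start][0])
    return {block: starts for block, starts in seen.items() if len(starts) > 1}

sample = [
    "my $x = 1;",
    "$x++;",
    "print $x;",
    "",
    "foo();",
    "my $x = 1;",
    "  $x++;",
    "print $x;",
]
duplicates = find_duplicate_blocks(sample, min_lines=3)
for block, starts in duplicates.items():
    print(starts, block)
```

A fixed window like this reports long clones as several overlapping hits, so a real tool would merge adjacent matches; the sketch leaves that out to stay short.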

Kurt W. Leucht
P.S. I emailed the Simian developers and asked them if they intended to support Perl, and they immediately wrote back that supporting Perl had never occurred to them, but that they would put it on their to-do list. I'm not even a paying customer. Now that's great support. (unless they were just blowing me off)
Kurt W. Leucht
+2  A: 

I have used CCFinder in the past to find sections of code which are duplicates. It works quite well but has an... interesting interface. It doesn't have native support for Perl, but it does have a plaintext option which should work for detection of copy and pasting at least. There are Windows and Ubuntu versions. It is freeware, not open source, unfortunately.

jamuraa
Oh wow ... this is a great utility! And the way it visually shows you your duplicate code on a scatter plot is amazing! I think this is the coolest piece of free software that I've ever experienced. The user interface is a bit clunky at first, but once you get used to it, it is a wonderfully powerful code duplication analyzer. Two nits, though. It's not cross-platform. And it leaves a bunch of temporary files behind in your source code tree.
Kurt W. Leucht
I was able to easily modify one of the Python files to recognize and ignore POD and Perl comments. Now I'm even more excited about CCFinder! (Had to remove all the temporary files by hand and restart to make it work, though.)
Kurt W. Leucht
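
The comment above does not show the actual change to CCFinder's scripts, so the following is only a guess at the general idea: a standalone Python sketch that drops POD blocks and whole-line `#` comments from Perl source before it goes to a plain-text clone detector. The function name `strip_pod_and_comments` is hypothetical. Trailing comments are deliberately left alone, because a `#` inside a string or regex cannot be told apart from a comment without real parsing.

```python
import re

def strip_pod_and_comments(perl_source: str) -> str:
    """Remove POD blocks and whole-line comments (rough heuristic, not a Perl parser)."""
    out = []
    in_pod = False
    for line in perl_source.splitlines():
        if in_pod:
            # A POD block runs until a line starting with =cut.
            if re.match(r"=cut\b", line):
                in_pod = False
            continue
        if re.match(r"=[a-zA-Z]", line):
            # A line starting with =word (e.g. =head1, =pod) opens a POD block.
            in_pod = True
            continue
        if line.lstrip().startswith("#"):
            # Whole-line comment; this also drops the shebang line.
            continue
        out.append(line)
    return "\n".join(out)

source = (
    "#!/usr/bin/perl\n"
    "use strict;\n"
    "\n"
    "=head1 NAME\n"
    "\n"
    "Demo\n"
    "\n"
    "=cut\n"
    "\n"
    "# a comment\n"
    "my $x = 1; # trailing\n"
)
cleaned = strip_pod_and_comments(source)
print(cleaned)
```

Heredocs and multi-line strings that happen to contain lines starting with `#` or `=` would be mangled by this heuristic, so treat it as a preprocessing aid rather than something to run on code you intend to keep.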
A: 

Here's another web page listing some clone detection tools:

http://sel.ics.es.osaka-u.ac.jp/cdtools/index-e.html

Kurt W. Leucht