+4  Q: 

Clone detectors

Does anyone know some good clone detectors?

I'm looking for examples and descriptions of the algorithms they use, so I can study how they work.

A clone detector is a program that detects repeated code in source files.

One clone detector I have already found is this one for Eclipse: http://www.cs.mcgill.ca/~swevo/clonetracker/

But what I'm really interested in is documentation about how the algorithms they use work.

Can anyone show me some links?

For reference, I just found this one, which looks good: http://pmd.sourceforge.net/cpd.html

+4  A: 

Check out these three papers on clone detection in software. They're quite useful in understanding the process.

A very simple way to detect clones in a program would be to find repeated subtrees in the parse tree that is generated from the source.
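To make the parse-tree idea concrete, here is a minimal sketch in Python using the standard `ast` module. It is only an illustration, not how any of the tools mentioned in this thread actually work: the helper names (`shape`, `find_clones`), the `min_nodes` threshold, and the choice to ignore identifiers and literal values are all assumptions made for the example. It reports statements whose subtrees have the same structure.

```python
import ast
from collections import defaultdict

# Fields that hold identifiers; ignoring them lets renamed copies match.
# (This set is an assumption of the sketch, not a standard list.)
IGNORED_FIELDS = {"id", "arg", "name", "attr"}


def shape(node):
    """Describe a subtree's structure, ignoring identifiers and literal values."""
    if isinstance(node, ast.Constant):
        return "Constant"
    if isinstance(node, ast.AST):
        parts = [
            shape(value)
            for field, value in ast.iter_fields(node)
            if field not in IGNORED_FIELDS
        ]
        return f"{type(node).__name__}({','.join(parts)})"
    if isinstance(node, list):
        return "[" + ",".join(shape(item) for item in node) + "]"
    return repr(node)


def find_clones(source, min_nodes=8):
    """Return groups of line numbers of statements with the same shape."""
    buckets = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.stmt):
            continue
        # Skip tiny subtrees; otherwise every `x = 0` would be a "clone".
        if sum(1 for _ in ast.walk(node)) < min_nodes:
            continue
        buckets[shape(node)].append(node.lineno)
    return sorted(sorted(lines) for lines in buckets.values() if len(lines) > 1)


SAMPLE = '''
def total(items):
    result = 0
    for item in items:
        result += item.price * 2
    return result

def area_sum(shapes):
    acc = 1
    for s in shapes:
        acc += s.area * 3
    return acc
'''

if __name__ == "__main__":
    for group in find_clones(SAMPLE):
        print("possible clones starting at lines", group)
```

Note that nested matches are reported too (a cloned function also contains cloned loops), and that comparing full shape strings does not scale; a practical tool would need to deal with both.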

trex279
CloneDR (see my answer here) works by matching "parse trees". There's the "very simple" part, but there are a number of sophisticated adjustments required to do it in practice and do it at scale. The technical paper (see my answer here) addresses these topics. And of course there's the "small matter" of actually having a parser for most languages.
Ira Baxter
+2  A: 

See CloneDigger. It detects clones in Python, Java and Lua source files. It integrates with Eclipse as well.

The project also contains a paper that explains the principles.

Anonymous
A: 

Simian is the best I know. I'm not sure about its algorithm. You can look for it on their website or ask on the user group, though.

Krzysztof Koźmic
A: 

PMD is a good open-source one.

Brian Carlton
+1  A: 

Clone Detective for C# and Visual Studio. It uses ConQAT, which is Java-based, to do the clone detection. ConQAT has a graphical interface and can also do clone detection in Java. On the latter link there are links to talks and publications on this tool. I find it's the best integrated tool in Visual Studio at the moment.

I find that the easiest way to dig up more papers/algorithms for topics is just to start with a high-end publication (e.g. see the ICSE papers referred to on the ConQAT page), and then just find the papers they describe in the Related Work section. Repeat until you recognize the papers by title. Congratulations, you're now an expert on the matter ;)

Kurt Schelfthout
A: 

I had a similar question regarding duplication detection algorithms, so I wrote a 'clone detector' using the Rabin-Karp algorithm. It is written in Python. I have put the source on Google Code (Code Duplication Detector). Have a look.
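For readers who have not seen Rabin-Karp applied to duplicate detection, the following is a minimal sketch of the general idea and not the code of the project mentioned above. It treats each whitespace-normalised line as one symbol, slides a rolling hash over every run of `window` consecutive lines, and then verifies the candidates that share a hash. The function name, the constants `BASE` and `MOD`, and the use of `zlib.crc32` as the per-line code are all choices made for this example.

```python
import zlib
from collections import defaultdict

BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus keeps accidental collisions rare


def find_duplicate_windows(lines, window=3):
    """Return groups of 0-based start indices where `window` lines repeat."""
    # Normalise whitespace so indentation differences do not hide a match.
    norm = [" ".join(line.split()) for line in lines]
    # Each line becomes one numeric "symbol" for the rolling hash.
    codes = [zlib.crc32(text.encode("utf-8")) for text in norm]
    n = len(codes)
    if n < window:
        return []

    high = pow(BASE, window - 1, MOD)  # weight of the symbol leaving the window

    # Hash of the first window.
    h = 0
    for code in codes[:window]:
        h = (h * BASE + code) % MOD
    buckets = defaultdict(list)
    buckets[h].append(0)

    # Roll the hash: drop the oldest line, append the newest, in O(1) each.
    for start in range(1, n - window + 1):
        h = (h - codes[start - 1] * high) % MOD
        h = (h * BASE + codes[start + window - 1]) % MOD
        buckets[h].append(start)

    groups = []
    for starts in buckets.values():
        # Equal hashes do not guarantee equal text, so compare the real lines.
        by_text = defaultdict(list)
        for s in starts:
            by_text[tuple(norm[s:s + window])].append(s)
        groups.extend(g for g in by_text.values() if len(g) > 1)
    return sorted(groups)
```

This sketch only finds exact repeats after whitespace normalisation, and it would also match runs of blank lines; a real tool would likely filter those and merge overlapping windows into longer clones.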

Nitin Bhide
+1  A: 

See CloneDR for a clone detector that does exact and near-miss clone detection for many programming languages, including C, C++, Java, C#, COBOL, ECMAScript, and PHP. You can see examples of clone detection runs for several different languages there. There is also an Eclipse plug-in for Eclipse RDZ.

The "papers" section at that web site contains a detailed technical paper on how the clone detector works: "Clone Detection Using Abstract Syntax Trees" by myself and several other supporting authors.

Ira Baxter
A: 

I'd suggest taking a look at CCFinder. It is a command-line tool with a nice Java frontend. It's open source and it just works.

Andrey