We have several C++ projects that were built from the same codebase. There are a lot of similarities and common code between them, but they were developed independently; source was not shared in any way. Classes and files will have been renamed even where the underlying code hasn't changed, and individual lines will have been tweaked, changed and replaced.

I'd like to be able to compare the different codebases and find out how much of the code is still the same. The result can be fairly high level: the percentage of code that is the same is fine. I also need to be able to automate this process.

Is there a tool that I can run on the codebases and get some sort of report/assessment of how much is common?

+3  A: 

I don't have much experience with this sort of thing, but it made me think back to my school days, when our university would run everyone's code through a program to find cheaters. This brought me to the following link:

Source Code Similarity Detection

It names some open source and commercial software that should meet your needs.

RC
Plagiarism tests for student code operate on tiny files. They are also pretty unsophisticated; they look only for exact matches. If you want to detect similar code across very large systems, you need scalable clone detection tools, and it is extremely helpful if they can match near-misses rather than exact copies, because the paradigm isn't "copy and paste", it's "copy/paste/*edit*".
Ira Baxter
I can agree with the tiny files, but at least at my university, they had plagiarism tools that detected more than just exact matches. Most college-level students are smart enough to know they need to edit what they copied to some extent in order to hide the fact that they are cheating. Quite a few tried this, got caught anyway, and ended up in honor court because of it.
RC
Also note that all the solutions on the link I provided indicate detection far beyond simple copy and paste and the ability to work on large file sets. They do this based on fingerprinting and analyzing code structure. Isn't this the reason stated for the down vote?
RC
A: 

It probably does not solve your problem entirely, but if you want to compare/diff/merge sources, I strongly recommend BeyondCompare from

http://www.scootersoftware.com/

It's the best by far. As far as I know, it's used by the makers of SO as well.

RED SOFT ADAIR
+2  A: 

There is the Java tool dude, part of the MOOSE software reengineering toolkit, by Richard Wettel. It is documented in his (master's?) thesis. MOOSE provides much more than just this; you might also want to look at his Codecity.

I've used it on Java, C#, Delphi and XML. It should work OK on C++ too. For large code bases, don't forget to give it enough heap space, and start with a simple similarity metric.
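To give an idea of what a simple similarity metric can look like, here is a minimal Python sketch. It is not what dude or MOOSE does internally; it only compares whitespace-normalized lines of two source trees and reports the overlap as a percentage. The function names, the file suffix list and the "longer than 3 characters" cutoff are all assumptions made for illustration, and the regex-based comment stripping is naive (it will misread `//` inside string literals).

```python
import re
from pathlib import Path

# Assumed set of C++ file suffixes; adjust for your projects.
CPP_SUFFIXES = {".h", ".hpp", ".cpp", ".cc", ".cxx"}


def normalized_lines(text):
    """Return the set of non-trivial lines, with comments and whitespace removed."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)  # block comments (naive)
    text = re.sub(r"//[^\n]*", "", text)                     # line comments (naive)
    lines = set()
    for line in text.splitlines():
        squeezed = re.sub(r"\s+", "", line)
        if len(squeezed) > 3:  # arbitrary cutoff: skip blank lines and lone braces
            lines.add(squeezed)
    return lines


def codebase_lines(root):
    """Collect the normalized lines of every C++ file under root."""
    lines = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in CPP_SUFFIXES:
            lines |= normalized_lines(path.read_text(errors="ignore"))
    return lines


def percent_shared(a, b):
    """Jaccard similarity of two line sets, expressed as a percentage."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0
```

Something like `percent_shared(codebase_lines("projA"), codebase_lines("projB"))` (both paths hypothetical) then gives one number per pair of projects, which is easy to script. Because it works on whole lines, any renamed class or variable makes the line count as different, so the result is best read as a lower bound.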

Stephan Eggermont
A: 

See CloneDR, which detects exact and near-miss code duplication. You could apply this across your systems to see what they share. CloneDR works for a variety of programming languages, including C++.
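To illustrate the difference between exact and near-miss matching, here is a rough Python sketch. It is not CloneDR's algorithm; it is a much cruder token-based stand-in. It replaces every identifier with `ID` and every number with `NUM`, then compares overlapping runs of `k` tokens, so code that differs only in renamed identifiers still matches. The keyword list is deliberately incomplete, and all names and the default `k=5` are assumptions for illustration.

```python
import re

# Small, deliberately incomplete keyword list (an assumption for this sketch).
CPP_KEYWORDS = {
    "if", "else", "for", "while", "do", "return", "class", "struct",
    "int", "void", "char", "bool", "float", "double", "const", "new", "delete",
}

# Identifiers/keywords, numbers, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def normalized_tokens(code):
    """Tokenize code, hiding identifier names and numeric values."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in CPP_KEYWORDS else "ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def shingles(tokens, k=5):
    """Return the set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def shared_fraction(code_a, code_b, k=5):
    """Fraction of code_a's token runs that also occur in code_b."""
    a = shingles(normalized_tokens(code_a), k)
    b = shingles(normalized_tokens(code_b), k)
    return len(a & b) / len(a) if a else 0.0
```

With this, `int total(int n) { return n * 2; }` and `int sum(int count) { return count * 2; }` normalize to the same token sequence and count as fully shared, even though a line-by-line diff would report them as different. It will not survive inserted or reordered statements the way a real clone detector does.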

Ira Baxter