I want to automate moving duplicate or similar C code into functions.
This must work under Linux.
A subset of your problem: Detecting duplicate code:
Try: PMD
Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD has been through three major incarnations:
- First we wrote it using a variant of Michael Wise's Greedy String Tiling algorithm (our variant is described here)
- Then it was completely rewritten by Brian Ewins using the Burrows-Wheeler transform
- Finally, it was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm.
...
Note that CPD works with Java, JSP, C, C++, Fortran and PHP code.
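For readers unfamiliar with the last algorithm named above, here is a minimal sketch of textbook Karp-Rabin matching in C. It is not CPD's implementation: the function name `rk_find` and the constants `RK_BASE` and `RK_MOD` are assumptions made for illustration. It only searches for one known fragment in a text, whereas a real clone detector compares every window against every other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RK_BASE 256u      /* assumed radix: one digit per byte */
#define RK_MOD  1000003u  /* assumed prime modulus, small enough to avoid overflow */

/* Hypothetical helper: index of the first occurrence of pat in text, or -1. */
long rk_find(const char *text, const char *pat)
{
    assert(text != NULL && pat != NULL);
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    if (m > n) return -1;

    /* high = RK_BASE^(m-1) mod RK_MOD, the weight of the leading byte */
    unsigned long high = 1;
    for (size_t i = 1; i < m; i++)
        high = (high * RK_BASE) % RK_MOD;

    /* Hash the pattern and the first window of the text. */
    unsigned long hp = 0, ht = 0;
    for (size_t i = 0; i < m; i++) {
        hp = (hp * RK_BASE + (unsigned char)pat[i]) % RK_MOD;
        ht = (ht * RK_BASE + (unsigned char)text[i]) % RK_MOD;
    }

    for (size_t i = 0; ; i++) {
        /* Equal hashes are only a hint; confirm with a real comparison. */
        if (hp == ht && memcmp(text + i, pat, m) == 0)
            return (long)i;
        if (i + m >= n)
            break;  /* no further window fits */
        /* Roll the window: drop text[i], append text[i + m]. */
        ht = (ht + RK_MOD - ((unsigned char)text[i] * high) % RK_MOD) % RK_MOD;
        ht = (ht * RK_BASE + (unsigned char)text[i + m]) % RK_MOD;
    }
    return -1;
}
```

The point of the rolling hash is that moving the window one byte costs constant time, so most non-matching positions are rejected without comparing the whole fragment.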
You'll want to take a look at Simian. It's free for noncommercial projects. Try something like:
# Find all C source files and identify similarities/duplicate code.
simian -includes=**/*.c -excludes=**/*_test.c
Simian (noted earlier) is a good tool for this. I have been using CloneDetective on my project and it works great. CloneDetective is free, so it can't hurt to give it a try.
Be aware that you can't just compare lines of text. You have to parse the code; that way you can also detect segments that are semantically equivalent but use differently named identifiers.
For example, given two functions that are equivalent but use different identifiers, a text search will not see them as identical, but a parser can.
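As a rough illustration of that point, the sketch below replaces every non-keyword identifier with the placeholder `ID` before comparing two fragments. It is only a tokenizer-level approximation, not a real parser: the names `normalize` and `same_shape`, the short keyword list, and the 512-byte buffers are all assumptions, and comments, string literals and preprocessor lines are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed, deliberately incomplete keyword list; everything else is an identifier. */
static int is_keyword(const char *s, size_t n)
{
    static const char *const kw[] = { "int", "char", "long", "void", "return",
                                      "if", "else", "for", "while" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strlen(kw[i]) == n && strncmp(kw[i], s, n) == 0)
            return 1;
    return 0;
}

/* Append up to n bytes of s to out, truncating silently; returns the new length. */
static size_t put(char *out, size_t len, size_t cap, const char *s, size_t n)
{
    for (size_t i = 0; i < n && len + 1 < cap; i++)
        out[len++] = s[i];
    out[len] = '\0';
    return len;
}

/* Hypothetical helper: write a space-separated token stream for src into out,
 * with every non-keyword identifier replaced by "ID". */
void normalize(const char *src, char *out, size_t cap)
{
    assert(src != NULL && out != NULL && cap > 0);
    size_t len = 0;
    out[0] = '\0';
    while (*src) {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) { src++; continue; }
        const char *start = src;
        if (isalpha(c) || c == '_') {            /* identifier or keyword */
            while (isalnum((unsigned char)*src) || *src == '_') src++;
            if (is_keyword(start, (size_t)(src - start)))
                len = put(out, len, cap, start, (size_t)(src - start));
            else
                len = put(out, len, cap, "ID", 2);
        } else if (isdigit(c)) {                 /* numeric literal, kept verbatim */
            while (isalnum((unsigned char)*src)) src++;
            len = put(out, len, cap, start, (size_t)(src - start));
        } else {                                 /* any other single character */
            src++;
            len = put(out, len, cap, start, 1);
        }
        len = put(out, len, cap, " ", 1);
    }
}

/* Returns 1 if the two fragments are identical after identifier renaming. */
int same_shape(const char *a, const char *b)
{
    char na[512], nb[512];  /* longer input is truncated, which can cause false matches */
    normalize(a, na, sizeof na);
    normalize(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

With this, `same_shape("int add(int a, int b) { return a + b; }", "int sum(int x, int y) { return x + y; }")` returns 1 even though a plain `strcmp` of the two strings would not match. Because every identifier maps to the same placeholder, the sketch also matches fragments that use their variables inconsistently, which is one reason the tools mentioned here work on a real parse instead.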
Also note that writing a C++ parser is not a trivial task, even when given the grammar. I suggest following the advice of others and seeking out an existing tool for this. Also search for refactoring tools.
See CloneDR, a tool for finding exact copy and near-miss (copy-paste-edit) clones in source code. It uses full language parsers to find clones according to the language structure, minimizing false positives, and to be completely independent of how the code is commented or formatted, thereby maximizing true detection. CloneDR will find clones even when the cloned block has changed variable names or inserted statements or blocks of code.
It has language front ends for C, C++, COBOL, C#, Java, PHP and a number of other languages.
You can see sample clone detection reports at the website.