I want to automate moving duplicate or similar C code into functions.

This must work under Linux.

A: 

Perl!

KernelJ
+3  A: 

A subset of your problem: Detecting duplicate code:

Try: PMD

Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD has been through three major incarnations:

  • First we wrote it using a variant of Michael Wise's Greedy String Tiling algorithm (our variant is described here)
  • Then it was completely rewritten by Brian Ewins using the Burrows-Wheeler transform
  • Finally, it was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm.

...

Note that CPD works with Java, JSP, C, C++, Fortran and PHP code.

The MYYN
I have used CPD for similar tasks in Java code. The output can be XML, so it is 'easy' to automate.
vkraemer
+1  A: 

You'll want to take a look at Simian. It's free for noncommercial projects. Try something like:

# Find all C source files and identify similarities/duplicate code.
simian -includes=**/*.c -excludes=**/*_test.c
John Feminella
I'm enjoying the code-colouring of your post
Joe
+1  A: 

Simian (noted earlier) is a good tool for this. I have been using CloneDetective on my project and it works great. CloneDetective is free, so it can't hurt to give it a try.

Mark Ewer
A: 

Be aware that you can't just compare lines of text. You will have to parse the code. That way you can also detect segments that are semantically equivalent but use differently named identifiers.

For example, given two functions that are equivalent but use different identifiers, a text search will not see them as identical, but a parser can.
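To make that concrete, here is a rough sketch in C. It is not a parser and not how any of the tools mentioned in this thread work; it only splits the text into crude tokens and replaces each identifier with a placeholder numbered by first appearance, so that consistently renamed code compares equal. The names `is_keyword`, `normalize` and `same_modulo_names`, the keyword list and the buffer sizes are all made up for the illustration. Comments, string literals and the preprocessor are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS 64   /* assumed limit: distinct identifiers per snippet */
#define MAX_LEN 64   /* assumed limit: length of one identifier */

/* Hypothetical helper: a deliberately incomplete keyword list. */
static int is_keyword(const char *w)
{
    static const char *kw[] = { "int", "char", "long", "unsigned", "void",
                                "for", "while", "if", "else", "return" };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(w, kw[i]) == 0)
            return 1;
    return 0;
}

/* Copy src to out with whitespace dropped and every non-keyword identifier
 * replaced by "$N", where N is the order in which that name first appeared.
 * Returns 0 on success, -1 if a limit or the output buffer is exceeded. */
static int normalize(const char *src, char *out, size_t outsz)
{
    char ids[MAX_IDS][MAX_LEN];
    int nids = 0;
    size_t o = 0;

    assert(src != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    while (*src) {
        unsigned char c = (unsigned char)*src;
        char tok[MAX_LEN + 16];
        size_t len;

        if (isspace(c)) {
            src++;
            continue;
        }
        if (isalpha(c) || c == '_') {
            /* Identifier or keyword. */
            char word[MAX_LEN];
            size_t n = 0;
            while (isalnum((unsigned char)*src) || *src == '_') {
                if (n + 1 >= sizeof word)
                    return -1;
                word[n++] = *src++;
            }
            word[n] = '\0';
            if (is_keyword(word)) {
                snprintf(tok, sizeof tok, "%s ", word);
            } else {
                int id = 0;
                while (id < nids && strcmp(ids[id], word) != 0)
                    id++;
                if (id == nids) {
                    if (nids == MAX_IDS)
                        return -1;
                    strcpy(ids[nids++], word);
                }
                snprintf(tok, sizeof tok, "$%d ", id);
            }
        } else if (isdigit(c)) {
            /* Numeric literal: kept verbatim. */
            size_t n = 0;
            while (isalnum((unsigned char)*src)) {
                if (n + 2 >= sizeof tok)
                    return -1;
                tok[n++] = *src++;
            }
            tok[n++] = ' ';
            tok[n] = '\0';
        } else {
            /* Operator or punctuation: one character at a time. */
            tok[0] = *src++;
            tok[1] = '\0';
        }

        len = strlen(tok);
        if (o + len + 1 > outsz)
            return -1;
        memcpy(out + o, tok, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}

/* Returns 1 if the two snippets differ only by consistent renaming. */
static int same_modulo_names(const char *a, const char *b)
{
    char na[1024], nb[1024];
    if (normalize(a, na, sizeof na) != 0 || normalize(b, nb, sizeof nb) != 0)
        return 0;
    return strcmp(na, nb) == 0;
}
```

With this, `same_modulo_names("int sum(int a, int b) { return a + b; }", "int add(int x, int y) { return x + y; }")` returns 1, while a plain `strcmp` of the two strings reports a difference. A real tool still needs a proper parser to cope with reordered statements, inserted code, macros and so on.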

Also note that writing a C++ parser is not a trivial task, even when given the grammar. I suggest following the advice of others and seeking out a tool for this. Also search for refactoring tools.

Thomas Matthews
Thomas is correct: you want a parser, and building parsers is pretty hard for real languages. See the CloneDR answer for a clone detection tool that parses, and handles C and C++.
Ira Baxter
+1  A: 

See CloneDR, a tool for finding exact copy and near-miss (copy-paste-edit) clones in source code. It uses full language parsers, which lets it find clones according to the language structure. That minimizes false positives and makes detection completely independent of how the code is commented or formatted, thereby maximizing true detection. CloneDR will find clones even when the cloned block has renamed variables or inserted statements or blocks of code.

It has language front ends for C, C++, COBOL, C#, Java, PHP and a number of other languages.

You can see sample clone detection reports at the website.

Ira Baxter