I have a single PHP file in a legacy project that is at least a few thousand lines long. It is mostly divided into conditional blocks by a switch statement with about 10 cases. Within each case there is what appears to be a very similar, if not identical, block of code. What methods are available for identifying these blocks as the same, or close to the same, so that I can abstract that code out and begin to refactor the entire file? I know this can be done manually (separate the code into files and diff them), but I'm interested in what tools I could use to speed this up.

Thanks.

+3  A: 

You can use phpunit PMD (Project Mess Detector) to detect duplicated blocks of code.

It also can compute the Cyclomatic complexity of your code.

Here is a screenshot of the pmd tab in phpuc: [image: pmd tab]

greg0ire
Cyclomatic complexity has nothing to do with copy-and-pasted code. And looking at the docs for [PMD](http://phpmd.org/rules/index.html), I'd say it cannot detect such duplicate code. It is without a doubt a good tool, though.
Gordon
I updated my post, I think it is clearer now. I also think phpunit-pmd uses phpcpd, doesn't it? Or is it another implementation?
greg0ire
I'll look at this too, thanks.
seengee
I might have been confused by the tab label in this (great) UI, which might call several tools.
greg0ire
It definitely does. But check out [hudson](http://www.whitewashing.de/blog/126) and [arbit](http://www.arbitracker.org/news.html) for alternatives.
Gordon
Thanks for the clarifications. Adding this post to my favorites :-)
greg0ire
A: 

You could put the blocks in separate files and just run diff on them?

However, I think in the end you will need to go through everything manually anyway, since it sounds like this code requires a lot of refactoring, and even where there are differences you will probably need to evaluate whether each one is intentional or a bug.
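If you want to skip the separate files, the same comparison can be roughly sketched in PHP itself. This is only an illustration: `blockSimilarity()` and the `$blocks` array are made-up names, and the case bodies are assumed to have been copied out of the switch by hand.

```php
<?php
// Hypothetical helper: score how similar two code blocks are (0 to 100).
function blockSimilarity(string $a, string $b): float
{
    // Trim each line and drop blank ones so indentation changes do not count.
    $normalize = function (string $code): string {
        $lines = array_filter(array_map('trim', explode("\n", $code)), 'strlen');
        return implode("\n", $lines);
    };
    // similar_text() writes the similarity percentage into its third argument.
    similar_text($normalize($a), $normalize($b), $percent);
    return $percent;
}

// Placeholder case bodies; in practice these come from the legacy file.
$blocks = [
    'add'  => "\$item = load(\$id);\n    \$item->save();\n",
    'edit' => "\$item = load(\$id);\n\$item->save();\n",
    'list' => "echo render(\$all);\n",
];

// Compare every pair of blocks once and print a score for each pair.
$names = array_keys($blocks);
for ($i = 0; $i < count($names); $i++) {
    for ($j = $i + 1; $j < count($names); $j++) {
        $score = blockSimilarity($blocks[$names[$i]], $blocks[$names[$j]]);
        printf("%s vs %s: %.1f%%\n", $names[$i], $names[$j], $score);
    }
}
```

Pairs that score high are the ones worth diffing by eye first. A character-level score like this says nothing about whether a difference matters, so the manual review is still needed.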

mikera
That's kind of what I meant by a manual method. Thanks for the input, though.
seengee
+9  A: 

You can use phpcpd.

phpcpd is a Copy/Paste Detector (CPD) for PHP code. It scans a PHP project for duplicated code.

Further resources:

Gordon
+1. I'm gonna have fun with the tools mentioned in that link!!!!
Spudley
That looks like a great starting point and a really handy tool. Thanks.
seengee
+1 great link!!
greg0ire
Would the downvoter please enlighten me about the reason?
Gordon
+1  A: 

See our PHP Clone Detector tool.

This finds both exact copies and near misses, in spite of reformatting, insertion/deletion of comments, replacement of variable names, addition/replacement of subblocks, etc.

PHPCPD, as far as I can tell, finds only sequences which are exactly the same. That misses a lot of clones, since the most common operation after copy-paste is edit-to-customize. So it would miss the very clones the OP is trying to find.
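To illustrate why edit-to-customize defeats a plain text comparison, here is a minimal sketch that normalizes PHP tokens before comparing. It is not how either tool works internally; `normalizeTokens()` and the two sample fragments are hypothetical, and it only handles renamed variables, reformatting and comments, not inserted or replaced subblocks.

```php
<?php
// Hypothetical helper: reduce PHP source to a normalized token string so that
// whitespace, comments and variable names do not affect the comparison.
function normalizeTokens(string $code): string
{
    $out = [];
    // token_get_all() needs an opening tag to tokenize the fragment as PHP.
    foreach (token_get_all("<?php " . $code) as $token) {
        if (!is_array($token)) {
            $out[] = $token;   // single-character tokens such as ; or (
            continue;
        }
        [$id, $text] = $token;
        if (in_array($id, [T_OPEN_TAG, T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            continue;          // formatting and comments are ignored
        }
        $out[] = $id === T_VARIABLE ? '$v' : $text;  // every variable becomes $v
    }
    return implode(' ', $out);
}

// Two made-up case bodies: same logic, different names, layout and comments.
$caseA = '$user = find($id); // load it
          $user->save();';
$caseB = '$account = find($key);
$account->save();';

var_dump($caseA === $caseB);                                   // raw text differs
var_dump(normalizeTokens($caseA) === normalizeTokens($caseB)); // normalized form matches
```

The raw strings differ, but after normalization the two fragments compare equal, which is the kind of near-miss an exact-sequence match would overlook.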

Ira Baxter