I'm looking for the best approach to dealing with duplicate code in a legacy PHP project with about 150k lines of code.

Is this something best approached manually or are there standalone duplicate code detectors that will ease the pain?

A: 

Maybe this grep thread can help you?

Sergii
+2  A: 

As the other answers already mention, this should be approached manually, because you may want to change other things as you go along to make the code base cleaner. Maybe the actual invocation is already superfluous, or similar fragments can be combined.

Also, in practice people usually make small changes to the code they copy, so you will often find close variants rather than exact duplicates. I fear automatic copy-and-paste detection will mostly fail you there.

There are, however, refactoring tools that can help you with actually performing the changes (and sometimes also with finding likely candidates). Google for "php refactoring"; there are quite a few tools available, both standalone and as part of IDEs.
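To illustrate the concern, here is a minimal sketch of a naive exact-match detector, assuming you simply hash every window of consecutive non-blank lines. The function name `find_exact_duplicates`, the window size, and the sample PHP snippets are all made up for this example; it is not how any particular tool works.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files, window=3):
    """Return groups of locations where `window` consecutive lines repeat.

    `files` is a dict of {filename: source text}. Lines are stripped of
    surrounding whitespace and blank lines are skipped before comparing.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line number, stripped line) pairs for non-blank lines only.
        lines = [(i + 1, ln.strip())
                 for i, ln in enumerate(text.splitlines()) if ln.strip()]
        for start in range(len(lines) - window + 1):
            chunk = "\n".join(ln for _, ln in lines[start:start + window])
            digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()
            seen[digest].append((name, lines[start][0]))
    # Only blocks that occur more than once count as duplicates.
    return [locs for locs in seen.values() if len(locs) > 1]


# Hypothetical sample: b.php is a verbatim copy of a.php,
# c.php is the same logic with renamed variables.
files = {
    "a.php": "$total = 0;\nforeach ($items as $item) {\n    $total += $item->price;\n}\n",
    "b.php": "$total = 0;\nforeach ($items as $item) {\n    $total += $item->price;\n}\n",
    "c.php": "$sum = 0;\nforeach ($rows as $row) {\n    $sum += $row->price;\n}\n",
}
duplicates = find_exact_duplicates(files)
```

With this sample, the verbatim copy in b.php is reported, but the renamed variant in c.php is not, which is exactly the weakness of purely textual matching.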

sleske
Automatic detection can find most "near misses". See CloneDR answer to this question.
Ira Baxter
+1  A: 

Please also take into account the process that led to this code duplication!

If you have to change code, it is usually faster to duplicate it than to refactor it so that it serves both the new purpose and the old one.

So you have to convince people that refactoring is better than simply duplicating, because it saves time in the long run rather than only in the short term.

Otherwise, in two years you will find yourself googling your way back to the very same question you posted here!

Roalt
I'm not sure who downvoted me, but I do not recommend duplicating code here! I'm just explaining why people duplicate their code, and that you must prevent them from doing so.
Roalt
+1  A: 

CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by language syntax. For each detected set of clones, it will even propose a sketch of the abstraction code that could be used to replace the clones.

It is available for many languages, including PHP. A sample PHP clone detection report for Joomla (a PHP framework) can be found at the link.
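To give a rough feel for what "near-miss" detection means, here is a minimal sketch that is much cruder than CloneDR and is not its algorithm: it works on plain text with regular expressions rather than on language syntax. It replaces variables, numbers, and string literals with placeholders before comparing windows of lines. The names `normalize` and `find_near_duplicates`, the placeholders, and the sample PHP snippets are all assumptions made for this illustration.

```python
import re
from collections import defaultdict

# Order matters: string literals first, then variables, then numbers.
_NORMALIZERS = [
    (re.compile(r"'[^']*'|\"[^\"]*\""), "STR"),
    (re.compile(r"\$\w+"), "$VAR"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "NUM"),
]


def normalize(line):
    """Strip whitespace and replace literals and variable names with placeholders."""
    line = line.strip()
    for pattern, placeholder in _NORMALIZERS:
        line = pattern.sub(placeholder, line)
    return line


def find_near_duplicates(files, window=3):
    """Return groups of locations whose normalized `window`-line blocks match."""
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (line number, normalized line) pairs for non-blank lines only.
        lines = [(i + 1, normalize(ln))
                 for i, ln in enumerate(text.splitlines()) if ln.strip()]
        for start in range(len(lines) - window + 1):
            key = tuple(ln for _, ln in lines[start:start + window])
            seen[key].append((name, lines[start][0]))
    return [locs for locs in seen.values() if len(locs) > 1]


# Hypothetical sample: same logic, different variable names.
files = {
    "a.php": "$total = 0;\nforeach ($items as $item) {\n    $total += $item->price;\n}\n",
    "c.php": "$sum = 0;\nforeach ($rows as $row) {\n    $sum += $row->price;\n}\n",
}
clones = find_near_duplicates(files)
```

Here the two snippets are reported as clones even though no line is textually identical. A text-level trick like this will still miss reordered statements or inserted lines, which is where a syntax-aware tool has the advantage.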

Ira Baxter