How fanatic are you about elimination of duplicate code?

Personally, whenever I see duplicate code, whether in test code or production code, I tend to refactor the duplication away. The only exceptions I make are these:

  1. Sometimes the reduction in duplication is minimal, because the newly extracted method would have too many parameters to be actually useful or readable (see the sketch after this list).
  2. Sometimes, in test code, when several tests share a piece of code that isn't really a coherent flow, I leave the duplication alone (but not always; it depends on the size of the duplication).
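
For example, here is a rough Python sketch of the kind of extraction I mean in point 1 (the names and numbers are made up): every small difference between the copies becomes another parameter, and the call sites end up harder to read than the duplication was.

# Hypothetical helper extracted from two nearly identical report builders.
def render_summary(rows, heading, currency, date_format,
                   include_totals, sort_key, footer):
    lines = [heading]
    for row in sorted(rows, key=sort_key):
        lines.append(f"{row['date'].strftime(date_format)}  "
                     f"{row['amount']:.2f} {currency}")
    if include_totals:
        lines.append(f"Total: {sum(r['amount'] for r in rows):.2f} {currency}")
    if footer:
        lines.append(footer)
    return "\n".join(lines)

# Each caller now threads seven arguments through just to share one loop:
# render_summary(invoices, "Invoices", "EUR", "%Y-%m-%d", True,
#                lambda r: r["date"], None)
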
+11  A: 

I consider myself an extreme fanatic when it comes to eliminating code duplication. As long as we are not at a critical point in a milestone, I do my best to remove any duplicate code I come across in my code. Eventually I simply run out of time and have to leave the code alone for the next milestone. In that case though, I often check in at least a comment noting the duplication and what needs to be done to remove it.

JaredPar
While Huffman coding may seem the ultimate goal, one might like to view minimisation of duplication as a strong principle rather than an absolute rule.
PP
+1  A: 

Code duplication can quickly bite you in the behind and cause you a lot of pain. If I see duplicate code (usually old code from other people of course ;)) I try to refactor it right away. It is very rare that it is not worth the effort. Spend the time now or you'll spend more time doing it later.

Jason Down
+13  A: 

Avoid factoring out code where the configuration parameters (needed to alter the behaviour) obfuscate the intent of the code. Go as far as you can before you hit this point ... but it's a balancing act.

Nerdfest
+1  A: 

I was pretty fanatical to begin with, but a recent experience has probably made me more so, and given me another set of tools to use: algorithms and concepts from bioinformatics. In a new position we are updating the web UI to use CSS-driven layout instead of tables, so I am analyzing 700 existing JSP files. I put all the lines of code into a database, and of 100K total lines, fewer than 20K were unique. I then represent each file as a sequence of line ids and find the common subsequences of 2 or more lines; the longest was almost 300 lines duplicated between a couple of JSP files, an egregious case of cut and paste.

That's where I stand now, but my next plan is to re-represent the files as sequences of line ids OR (common) subsequence ids, sort them, and then perform a Levenshtein comparison of files contiguous to one another within the sort order. This should help with fuzzy matching of files that contain not just identical common subsequences, but subsequences that are off by a line or two.
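
As a rough Python sketch of the line-level part of that pipeline (hypothetical paths and thresholds, and simplified to skip the common-subsequence step): hash lines to small ids, re-express each file as an id sequence, sort, and Levenshtein-compare neighbours.

import os

def line_ids(paths):
    """Map every distinct source line to a small integer id and
    re-express each file as a sequence of those ids."""
    ids, sequences = {}, {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            sequences[path] = [ids.setdefault(line.strip(), len(ids)) for line in f]
    return ids, sequences

def levenshtein(a, b):
    """Edit distance between two id sequences (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute
        prev = cur
    return prev[-1]

# Sort the files by their id sequences, then compare each file with its
# neighbour in the sort order so near-duplicates land next to each other.
paths = [p for p in os.listdir(".") if p.endswith(".jsp")]   # hypothetical corpus
ids, seqs = line_ids(paths)
ordered = sorted(paths, key=lambda p: seqs[p])
for a, b in zip(ordered, ordered[1:]):
    distance = levenshtein(seqs[a], seqs[b])
    if distance < 20:          # arbitrary "suspiciously similar" threshold
        print(a, b, distance)
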

George Jempty
See the CloneDR answer in this set of answers. It can handle JSP.
Ira Baxter
+1  A: 

I'm a big believer in DRY coding. Don't Repeat Yourself. If you do it more than once, put it in a helper class.

There is very little that is worse than having to remember to make changes to the same thing in several places.
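
A minimal sketch of what I mean, in Python (hypothetical names; "before" and "after" are just illustrative):

# Before: the same email check pasted into two handlers.
def create_user(payload):
    if "@" not in payload.get("email", ""):
        raise ValueError("invalid email")
    # ... create the user ...
    return payload

def update_user(payload):
    if "@" not in payload.get("email", ""):
        raise ValueError("invalid email")
    # ... update the user ...
    return payload

# After: one helper, so a change to the rule happens in exactly one place.
def require_valid_email(payload):
    if "@" not in payload.get("email", ""):
        raise ValueError("invalid email")

def create_user(payload):
    require_valid_email(payload)
    return payload

def update_user(payload):
    require_valid_email(payload)
    return payload
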

Echostorm
helper classes, there's another smell... :)
Bjorn Reppen
+1  A: 

As mentioned in the Rewrite or Repair question, you might do some refactoring, like duplicate-code removal, from time to time, as you detect the duplicates.

But I believe this kind of "rewrite" action is better managed when it is detected by a metric from a static code analysis tool, which:

  • detects duplicated code
  • points out a trend (like more and more duplicated code being detected)

In that case, a corrective action can be prioritized, to focus on that kind of refactoring.
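
As a hedged sketch of the kind of metric I mean (not any particular tool; the names, suffixes, and thresholds are invented), even a crude duplicate-line ratio recorded per build is enough to show a trend, in Python:

import json
import pathlib
import time

def duplication_metric(root, suffixes=(".java", ".cs", ".py")):
    """Crude duplicate-line ratio: the share of non-trivial lines whose
    normalized text occurs more than once in the code base."""
    counts, total = {}, 0
    for path in pathlib.Path(root).rglob("*"):
        if path.suffix not in suffixes or not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            norm = " ".join(line.split())
            if len(norm) < 10:                 # skip braces, blank lines, etc.
                continue
            total += 1
            counts[norm] = counts.get(norm, 0) + 1
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / total if total else 0.0

# Append one data point per build; the file itself is the trend line.
ratio = duplication_metric("src")              # hypothetical source root
with open("duplication_trend.jsonl", "a") as f:
    f.write(json.dumps({"ts": time.time(), "ratio": ratio}) + "\n")
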

On second thought, I wonder if I may be the QA guy Zig is referring to ;-)

VonC
lol.. I never "liked" the QA guys.. In a good way.. :)
ZiG
NB: ZiG deleted his answer. Paraphrasing, it was something like: "I work hard to remove all the duplicated code from my module, and then a QA guy comes along and says, 'Hey! What do you need that for?'".
Jonathan Leffler
A: 

Normalize code, normalize variables, normalize database, normalize corporate hierarchy, normalize government...

rmeador
normalize interfaces
Bill K
+1  A: 

Since duplication lends itself to copy-paste, I always try to avoid it, or to refactor where there is already duplication in existing code.

rich
+13  A: 

As has been said, I try to live by the "DRY" principle - but I'd also say that there is another condition where I'm often reluctant to eliminate duplication:

  • Don't modify code for which you don't have (or can't economically/practically develop) a unit test.

That set of tests would need to cover the calling code for any extracted method.

If I can't test it, I can't really say I haven't introduced a defect. With a test suite, I can at least say it did what it used to.
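
A small illustration of that ordering, in Python with pytest (hypothetical names, not from any real project): the characterization tests pin down the callers' behaviour first, and rerunning them after the extraction says whether anything changed.

# The two tests below were written against the original callers *before*
# the duplicated shipping calculation was extracted into shipping_cost().

def shipping_cost(weight_kg):                  # the extracted method
    return 2.0 * weight_kg

def invoice_total(weight_kg, price):           # caller 1 (formula was inline here)
    return price + shipping_cost(weight_kg)

def quote_total(weight_kg, price):             # caller 2 (formula was inline here)
    return price + shipping_cost(weight_kg)

def test_invoice_total_includes_shipping():
    assert invoice_total(weight_kg=2, price=10.0) == 14.0

def test_quote_total_includes_shipping():
    assert quote_total(weight_kg=2, price=10.0) == 14.0
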

Ken Gentle
+1  A: 

I consider it the single most important indicator of a good programmer. If you can write fully factored code, then almost by definition it's good code.

It seems like nearly all other programming practices are just ways to get your code DRY.

This is a bit of an overstatement, but not too much. Between being DRY and making your interfaces as stable and minimal as possible (Separation of Concerns), you're on your way to being an actual Software Engineer and not a Programmer/Hacker.

Bill K
+1  A: 

VERY. As far as I'm concerned, almost all of our development tips, sayings, and "best practices" stem from the principle of not repeating code and making it reusable: MVC, the decorator pattern, OOP, etc.

Obviously one needs to maintain a sense of pragmatism at times but I tend to lean very heavily towards DRY.

Quibblesome
+4  A: 

I used to be pretty laissez-faire about it - obviously try to avoid duplication wherever possible, but if you need to just copy the occasional 15 lines of code from here to there to save an afternoon's refactoring, that's probably okay as long as you don't make a habit of it.

Then I started my current job.

The guy who wrote most of this codebase before me took the "premature optimization is the root of all evil" line to its ludicrous extreme. Example: there were at least five different places in the app that computed the size for a thumbnail of an uploaded graphic. This seemed like the kind of thing I could rationalize, until I realized that thumbnails from all 5 "paths" were being displayed on the same screen - and each function was doing the math a slightly different way and getting slightly different results. They had all started as copy-pastes, but each had been hot-rodded over a year or so until we got to where I found it.

So, THAT all got refactored. And now I'm a dupe-removal fanatic.
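
As a hedged sketch of the end state (invented numbers and names, nothing like the actual codebase): the five divergent copies collapse into one shared Python function, so every path rounds the same way and shows the same size.

def thumbnail_size(width, height, max_edge=150):
    """Scale (width, height) so the longer edge equals max_edge,
    preserving aspect ratio and rounding identically everywhere."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = max_edge / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))

# Every upload path and every display path calls the same function:
assert thumbnail_size(640, 480) == (150, 112)
assert thumbnail_size(480, 640) == (112, 150)
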

Electrons_Ahoy
+9  A: 

I have always adhered to the principle that the first duplication (i.e. original plus one copy) is not usually worth the effort to remove. This is because original plus one copy is probably a "one off", and you don't gain enough from removing them to justify the work.

However, as soon as I start to make a second copy, I then rewrite all three to remove the duplication. That's because it has now (in my opinion) moved from "one off" to "trend". It becomes more likely that I'll use the code again so the effort to remove the duplicates is now worth it.

I hesitate to call the process "refactoring", because that's a buzzword from the XP camp, and I was doing this back in the early '80s with FORTRAN and C.

Good programming practice is ageless (and usually trendy-less, too).

Cheers,

-Richard

Huntrods
dsimcha
I don't know why it was downvoted either, as no comment was left. But I didn't delete my response, simply because it's how I code, whether that pleases everyone or not. :-) Cheers -R
Huntrods
I very much agree with dsimcha. It's hard to predict 100% which way the trend will go.
gnomixa
+6  A: 

I always try to first think about why the code is duplicated. Most of the time, the answer is laziness/ignorance/etc., and I refactor. But, just once in a while, a case shows up where the duplication is actually valid. What I am talking about here is two pieces of code that are semantically unrelated, but just happen to have the same (or similar) implementation at the moment. Consider, for instance, business rules for completely unrelated (actual) processes. The rules may be equal one day, and then the next day one of them changes. You'd better hope that they are not represented by the same piece of code, or pray that the developer doing the modification can spot what is going on (unit tests, anyone?).
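
A hedged Python illustration (the rules are invented): the two functions are identical today, but they encode unrelated policies, so keeping them separate lets one change without silently dragging the other along.

# Two business rules that merely coincide today. Folding them into one
# function would couple processes that have no reason to change together.

def late_fee(invoice_amount):
    # Accounts-receivable policy: 5% late fee.
    return round(invoice_amount * 0.05, 2)

def restocking_fee(order_amount):
    # Returns-handling policy: currently also 5%, by coincidence.
    return round(order_amount * 0.05, 2)

# If the returns policy moves to 8% tomorrow, only restocking_fee changes;
# late_fee is untouched because it was never the "same" rule.
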

Eyvind
Ughm. Refactor, call the common code from both. When business rules deviate, remove the function call.
igor
+1 There are places in my code where I specifically copy and paste sections, with maybe just changes in variable and function names, because these are business-rule and behaviour sections; just because the loop to sum all the checks in the system is the same as the loop to sum all the credit cards, that doesn't mean they're the same or should be changed at the same time.
Miki Watts
+2  A: 

I"m pretty much a psycho about it. If I do something more than once I refactor. period. exclamation point.

Sara Chipps
+4  A: 

We work at it. It really helps to have a tool that detects such duplication; regardless of best intentions, it happens because one doesn't think, or there's a tight schedule, etc.

The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by language syntax. It supports Java, C#, COBOL, C++, PHP and many other languages. We use it ourselves to help manage our own code.
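
As a generic, hedged illustration of what a "near-miss" match means (this is just a token-normalization sketch in Python, not CloneDR's actual algorithm): if identifiers and literals are replaced by placeholders, two fragments that differ only in names get the same fingerprint.

import io
import tokenize

def fingerprint(source):
    """Token-level shape of a code fragment: identifiers, numbers and
    strings become placeholders so renamed copies still match."""
    shape = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            shape.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            shape.append("LIT")
        elif tok.type in (tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
                          tokenize.DEDENT, tokenize.ENDMARKER):
            continue
        else:
            shape.append(tok.string)
    return tuple(shape)

a = "total = price * quantity + shipping\n"
b = "subtotal = cost * count + handling\n"
print(fingerprint(a) == fingerprint(b))        # True: a renamed ("near-miss") copy
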

Ira Baxter
How does it compare to TeamCity's dup-detection?
ripper234
+1 for mentioning a tool and your own experience with it.
Peter Mortensen
@ripper234: The TeamCity docs aren't very clear about the capabilities of their detector. It appears to find exact duplicates, and near-misses in which single tokens are replaced by other single tokens. CloneDR will find exact duplicates, and will find near-misses in which one code structure is replaced by another (e.g., an entire expression is replaced by a completely different expression), and will determine when a single substitution of a complex code structure occurs multiple times. It will also find clone sets with arbitrary (e.g., 1000x) replication.
Ira Baxter
A: 

Duplicated code seems to be a problem only in some industries. According to this cross-industry survey of code duplication, the enterprise/financial sector is the one that suffers most, with up to 40% duplication. Surprisingly, the embedded sector doesn't do very well either, with up to 20% cloned code.

Lucian Voinea
Maybe because in each case a program is made up of many bits (forms) that need retesting if they are changed, but don't depend much on the other bits. So the cost of refactoring is high.
Ian Ringrose
A: 

I'm sometimes guilty of copy-pasting, but I try to eliminate duplication wherever possible. The exception is when I have a function which calls several other functions and is very slow. Sometimes the contents of the sub-functions can be combined for speed, or the underlying SQL queries can be combined into fewer queries, or just one.

Example: In inventory management, the minimum quantity on hand for an item is equal to the quantity in your working reserve plus the quantity in safety stock. Safety stock is equal to half of the working reserve.

Function workingReserve(itemCode, someDate)
     ' Inside here is a looping structure that adds up the requirements for the next six weeks (because that's how long it takes between reorder and receipt).
End Function

Function safetyStock(itemCode, someDate)
    safetyStock = workingReserve(itemCode, someDate) / 2
End Function

Function minimumOnHand(itemCode, someDate)
    minimumOnHand = workingReserve(itemCode, someDate) + safetyStock(itemCode, someDate)
End Function

I apologize that this is written in VB, but it's from an Excel VBA function.

In effect, the workingReserve function is run twice on the same data. The performance can be improved by folding the business logic of the safetyStock() function into the minimumOnHand() function.

Function minimumOnHand(itemCode, someDate)
    minimumOnHand = workingReserve(itemCode, someDate) * 1.5
End Function

In the real code, I have comments in the code explaining the business logic, but have omitted them here for brevity.

Scott