views:

1064

answers:

20

I'm currently reviewing a very old C++ project and see lots of code duplication there.

For example, there is a class with 5 MFC message handlers each holding 10 identical lines of code. Or there is a 5-line snippet for a very specific string transformation every here and there. Reducing code duplication is not a problem in these cases at all.

But I have a strange feeling that I might be misunderstanding something and that there was originally a reason for this duplication.

What could be a valid reason for duplicating code?

+14  A: 

Laziness, that's the only reason I can think of.

On a more serious note. The only valid reason I can think of is changes at the very end of the product cycle. These tend to undergo a lot more scrutiny and the smallest change tends to have the highest rate of success. In that limited circumstance it is easier to get a code duplication change through as opposed to refactoring out a smaller change.

Still leaves a bad taste in my mouth.

JaredPar
Its kind of reverse laziness though isn't it? I mean, it would be far lazier to make a function and call it everywhere...
Justicle
@Justicle True, but when you just want to finish that one function and try your code, its easier to add the 5 lines of code there, rather than think about passing parameters/return types and all the other stuff that comes with a function.
DeusAduro
Jeremy Friesner
+3  A: 

The only "valid" thing I can see this arising from is when those lines of code were different, then converged to the same thing through subsequent edits. I've had this happen to me before, but none too frequently.

This is, of course, when it's time to factor out this common segment of code into new functionality.

That said, I can't think of any reasonable way to justify duplicate code. Look at why it's bad.

It's bad because a change in one place requires a change in multiple places. This is increased time, with a chance of bugs. By factoring it out, you maintain the code in a single, working location. After all, when you write a program you don't write it twice, why would a function be any different?

GMan
A: 

There is no good reason for code duplication.

See the Refactor Mercilessly design pattern.

The original programmer was either in a hurry to meet a deadline or lazy. Feel free to refactor and improve the code.

Asaph
-1, While this asker's case is not a good reason, and almost every other case of duplication isn't either, of course there are valid reasons (see the responses).
orip
+3  A: 

For that kind of code duplication (lots of lines duplicated lots of times), I'd say :

  • either laziness (you just paste some code here and there, without having to worry about any impact it could have on other parts of the application -- while writing a new function and using it in two places could, I suppose, have some impact)
  • or not knowing any good practice (re-using code, separating different tasks in different functions/methods)

Probably the first solution, though, from what I've generally seen :-(

Best solution I've seen against that : have your developpers start by maintaining some old application, when they are hired -- that'll teach them that this kind of thing is not good... And they will understand why, which is the most important part.

Splitting code into several functions, re-using code the right way, and all that often come with experience -- or you have not hired the right people ;-)

Pascal MARTIN
In my experience, junior developers who started maintaining ugly code (and didn't get to see any good code) have only acquired a "patch it 'till it works" mentality.
Wim Coenen
@wcoenen Sounds like you need to recruit better junior developers. A small touch of OCD can be a good thing.
mLewisLogic
+2  A: 

A long time ago when I used to do graphics programming you would, in some special cases, use duplicate code this way to avoid the low level JMP statements generated in the code (it would improve performance by avoiding the jump to the label/function). It was a way to optimize and do a pseudo "inlining".

However, in this case, I don't think that's why they were doing it, heh.

jinxx0r
+1  A: 

Sounds like the original author either was inexperienced and/or was hard pressed on time. Most experienced programmers bunch together things that are reused because later there will be less maintenance - a form of laziness.

The only thing you should check is if there are any side effects, if the copied code accesses some global data a bit refactoring may be needed.

edit: back in the day when compilers were crappy and optimizers even crappier it could happen that due to some bug in the compiler one may had to do such a trick in order to get around a bug. Maybe its something like that? How old is old?

Anders K.
+12  A: 

You might want to do so to make sure that future changes in one part will not unintentionally change the other part. for example consider

Do_A_Policy()
{
  printf("%d",1);
  printf("%d",2);
}

Do_B_Policy()
{
  printf("%d",1);
  printf("%d",2);
}

Now you can prevent "code duplication" with function like this:

first_policy()
{
printf("%d",1);
printf("%d",2);
}

Do_A_Policy()
{
first_policy()
}

Do_B_Policy()
{
first_policy()
}

However there is a risk that some other programmer will want to change Do_A_Policy() and will do so by changing first_policy() and will cause the side effect of changing Do_B_Policy(), a side effect which the programmer may not be aware of. so this kind of "code duplication" can serve as a safety mechanism against this kind of future changes in the program.

Liran Orevi
Well, sounds to me like `first_policy` would need to take a parameter of sorts.
GMan
This example screams for a unit test.
John MacIntyre
+5  A: 

Sometimes methods and classes which domain-wise have nothing in common, but implementation-wise looks a lot alike. In these cases it's often better to do code duplication as future changes more often that not will branch these implementations into something that aren't the same.

cwap
can you give a real life example for such a situation?
flybywire
+2  A: 

If different tasks are similar by accident, repeating the same actions in two places is not necessarily duplication. If the actions in one place change, is it probable they should change in other places as well? Then this is duplication you should avoid or refactor away.

Also, sometimes - even when logic is duplicated - the cost of reducing duplication is too high. This can happen especially when it's not just code duplication: for example, if you have a record of data with certain fields that repeats itself in different places (DB table definition, C++ class, text-based input), the usual way to reduce this duplication is with code generation. This adds complexity to your solution. Almost always, this complexity pays off, but sometimes it doesn't - it's your tradeoff to make.

orip
+1  A: 

On large projects ( those with a code-base as large as a GB ) it's quite possible to lose existing API. This is typically due to insufficient documentation, or an inability of the programmer to locate the original code; hence duplicate code.

Boils down to laziness, or poor review practice.

EDIT:

One additional possibility is that there may have been additional code in those methods which was removed along the way.

Have you looked at the revision history on the file?

Everyone
+10  A: 

Besides being inexperienced, there is why duplicated code occurrences might show up:

No time to properly refactor

Most of us are working in a real world where real constraints force us to move quickly to real problems instead of thinking about niceness of the code. So we copy&paste and move on. With me, if I later see that code is duplicated several more times, it is the sign that I have to spend some more time on it and converge all instances to one.

Generalization of the code not possible/not 'pretty' due to language constraints

Lets say that deep inside a function you have several statements that greatly differ from instance to instance of same duplicated code. For example: I have a function that draws 2d array of thumbnails for the video, and it's embedded with calculation of each thumbnail position. In order to calculate hit-test (calculate thumbnail index from click position) I am using same code but without painting.

You are not sure that there will be generalization at all

Duplicate code at first, and later observe how it will evolve. Since we are writing software, we can allow 'as late as possible' modifications to the software, since everything is 'soft' and changeable.

I'll add more if I remember something else.

Daniel Mošmondor
+1 for your last point - it's the same principle as premature optimisation.
Andrzej Doyle
+4  A: 

The valid reason I can think of: If the code gets alot more complex to avoid the duplication. Basically that's the place when you do almost the same in several methods - but just not quite the same. Of course - you can then refactor and add special parameters including pointers to different members that have to be modified. But the new, refactored method may get too complicated.

Example (pseudocode):

procedure setPropertyStart(adress, mode, value)
begin
  d:=getObject(adress)
  case mode do
  begin
    single: 
       d.setStart(SingleMode, value);
    delta:
       //do some calculations
       d.setStart(DeltaSingle, calculatedValue);
   ...
end;

procedure setPropertyStop(adress, mode, value)
begin
  d:=getObject(adress)
  case mode do
  begin
    single: 
       d.setStop(SingleMode, value);
    delta:
       //do some calculations
       d.setStop(DeltaSingle, calculatedValue);
   ...
end;

You could refactor out the method call (setXXX) somehow - but depending on the language it could be difficult (especially with inheritance). It is code duplication since most of the body is the same for each property, but it can be hard to refactor out the common parts.

In short - if the refactored method is factors more complicated, I'd go with code duplication although it is "evil" (and will stay evil).

Tobias Langner
+1 - the important part here is "depending on the language", but I agree that it can happen even in simple cases like this.
orip
+1  A: 

All the answers looks right, but I think there is another possibility. Maybe there are performance considerations as the things you say reminds me "inlining code". It's always faster to inline functions that to call them. Maybe the code you look at has been preprocessed first?

SoMoS
Modern langauges let you write functions, and then hint loudly they should be inlined. This gives you the best of both worlds: inlining and avoidance of redundant code.
Ira Baxter
+25  A: 

A good read about this is large scale c++ software design by John Lakos.

He has many good points about code duplication, where it might help or hinder a project.

The most important point is asking when deciding to remove duplication or duplicate code:

If this method changes in the future, do I want to change the behaviour in the duplicated method, or needs it to stay the way it is?

After all, methods contain (business) logic, and sometimes you'll want to change the logic for every caller, sometimes not. Depends on the circumstances.

In the end, it's all about maintenance, not about pretty source.

Sam
Agreed. People get paranoid about structural changes, especially if it already works and releases are looming.
mLewisLogic
"In the end, it's all about maintenance, not about pretty source." <-- Mucho importante!! :)
cwap
The harder question in a million line system is, "is there any duplicate of the code which I'm changing?". If you can't answer this question, asking about modifying such blocks is well... pointless. What is needed is a tool that finds duplicated code so that you can make such decision. See http://www.semanticdesigns.com/Products/Clone for a tool that finds duplicate code across very big systems.
Ira Baxter
+1  A: 

I have no problems with duplicated code when it is produced by a source code generator.

Stephen C
+10  A: 

When I first started programming, I wrote an app where I had a bunch of similar functionality which I wrapped up in a neat little 20-30 line function ... I was very proud of myself for writing such an elegant piece of code.

Shortly after, the client changed the process in very specific cases, then again, then again, then again , and again, and again .... (many many more times) My elegant code turned into a very difficult, hackish, buggy, & high maintenance mess.

A year later, when I was asked to do something very similar, I deliberately decided to ignore DRY. I put together the basic process, and generated all duplicate code. The duplicate code was documented and I saved the template used to generate the code. When the client asked for specific conditional change (like, if x == y^z + b then 1+2 == 3.42) it was a piece of cake. It was unbelievably easy to maintain & change.

In retrospect, I probably could have solved many of these problems with function pointers and predicates, but using the knowledge I had at the time, I still believe in this specific case, this was the best decision.

John MacIntyre
+1. Changing requirements are often a killer for DRY.
Konrad Rudolph
Changing requirements are a killer. DRY isn't special.
Ira Baxter
+2  A: 

I don't know of many good reasons for code duplication, but rather than jumping in feet first to refactoring, it's probably better to only refactor those bits of the code that you actually change, rather than altering a large codebase that you don't yet fully understand.

Paddy
+1  A: 

Something that we found that forced us to duplicate code was our pixel manipulation code. We work with VERY large images and the function call overhead was eating up on the order of 30% of our per-pixel time.

Duplicating the pixel manipulation code gave us 20% faster image traversal at the cost of code complexity.

This is obviously a very rare case, and in the end it bloated our source significantly (a 300 line function is now 1200 lines).

Ron Warholic
Have you tried to use some compiler directive to force it inline the function calls?
sharptooth
We tried the inline keyword but it wasn't reliably inlining the function in all the cases we needed.
Ron Warholic
A: 

in my humble opinion there's no place for code duplication. have a look, for example, at this wikipedia article

or, let's refer to Larry Wall's citation:

"We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris."

it is pretty clear that code duplication has nothing to do with "laziness". haha;)

varnie
Really? Due to the magic of copy/paste, spamming "HAHAHAHAHAHAHAHAHAHAHAHAHAHAHA" in the comment box is much easier than writing something thoughtful. Duplicate code is easy to write. It is often the lazy solution.
jalf
atamanroman
A: 

Since there is the "Strategy Pattern", there is no valid reason for duplicate code. Not a single line of code must be duplicated, everything else is epic fail.

Turing Complete