How much duplicated code do you tolerate?

views:

814

answers:

+15 Q:

How much duplicated code do you tolerate?

In a recent code review I spotted a few lines of duplicated logic in a class (less than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree extracting the duplicated logic would hurt readability a little.

I know DRY is guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?

+15 A:

Personally, I prefer keeping code understandable, first and foremost.

DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.

That being said, I do agree that DRY is a good goal to follow, when practical.

Reed Copsey 2010-02-19 17:22:28

I agree. DRY is a tool to use to achieve maintainability and readability. DRY itself is not the goal.

Ray 2010-02-19 17:24:54

True. I think DRY becomes more and more important when the repeated code grows - there's no point in writing a function for a three-liner, because it just doesn't make things easier, it makes it *more complicated*. This changes, however, if these three lines are repeated tens or hundreds of times :)

OregonGhost 2010-02-19 17:24:59

Can you please say what DRY stands for.

Laykes 2010-02-19 17:25:20

@Laykes: "Don't Repeat Yourself."

OregonGhost 2010-02-19 17:25:38

As long as you have some unit tests (or any other means) to ensure you don't forget to update all repeated portions of code. If you don't have them I'd take the hit on readability.

Vinko Vrsalovic 2010-02-19 17:25:54

DRY - Don't Repeat Yourself

Tony 2010-02-19 17:26:24

could you give some examples? what is in your mind ok to copy-paste and what is not?

Evgeny 2010-02-19 17:26:32

Now that I think about it, it would be better if it was "Diminish Repeating Yourself" ;)

OregonGhost 2010-02-19 17:27:37

I disagree strongly, for the reasons explained in my response to Ira Baxter.

Steven Sudit 2010-02-19 17:32:39

@Steven: Personally, in general, I agree. You should refactor, *if it's going to make the code easier to read*. There are cases, however, where sometimes the refactor does hurt readability, and the OP was suggesting that this is one of those cases. In those (rare) situations, I'd personally leave the dup.

Reed Copsey 2010-02-19 17:39:31

@Reed: Exactly. So rare I haven't found one. This discussion would be a lot more enlightening if Sly would provide us with some code as Samuel says in a comment to the question.

Vinko Vrsalovic 2010-02-19 17:57:03

@Vinko: The only times I've ever seen this are when the total lines of code are VERY short (2-3 lines total), AND you have logic that must run in between the duplicated lines. This can be refactored using a functional style to create DRY, but it tends to escalate the maintenance cost to do so, since it almost always forces introduction of a method pointer/lambda/delegate/etc (depending on language).

Reed Copsey 2010-02-19 18:01:24

@Reed: Certainly, any rule might have an exception, but I've yet to see one for this rule.

Steven Sudit 2010-02-19 18:47:44

@Reed: I think that the maintenance cost of introducing a functional style refactor would only be high on languages where doing it would require the use of tons of boilerplate code.

Vinko Vrsalovic 2010-02-19 22:52:33

If it is something like if(test == -1 || test > 5) then I would just repeat it - but if it is a large piece of code like querying a Database and retrieving a set of data (from different tables), I'd definitely make a subroutine.The cost to refactor a large, full-of-duplicates application is large: In that case, I wouldn't do it. If you fix early, good. Otherwise, screw it. Just leave it duplicated. :P

Jimmie Lin 2010-02-20 11:57:54

@Jimmie: Nice example :-). Even "if(test == -1 || test > 5)" could be a candidate for refactoring. If you could e.g. replace it with a call "checkInputBounds(test)", that would both remove duplication (thus easing maintenance) *and* make it easier to understand, because the method explains the *meaning* of the comparison.

sleske 2010-02-20 14:17:27

@sleske: Agreed. The benefits of encapsulating the predicate is that two magic numbers are hidden away and replaced with some hint at their purpose.

Steven Sudit 2010-02-22 15:10:45

The point of DRY is maintainability. If code is harder to understand it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For less than 15 lines of code, I'd be inclined to agree with your classmate.

Darryl 2010-02-19 17:24:52

Why would it hurt readability?

Steven Sudit 2010-02-19 18:05:40

First of all, because the original question said it did. Both he and his classmate agreed.But beyond simply taking the OP at his word, let's look at why that might be. If the driver for what and when you refactor is purely motivated by duplication of code, the repeated code chunk may not map to a nice abstract concept. If you have to locate and examine the code to understand how it affects the calling function, you've added complexity instead of removing it. Duplication by itself isn't reason enough to pull out a chunk of code. Duplication plus a nice concept to wrap it in is.

Darryl 2010-02-23 20:17:16

+3 A:

you didn't say what language but in most IDEs it is a simple Refactor -> Extract Method. How much easier is that, and a single method with some arguments is much more maintainable than 2 blocks of duplicate code.

fuzzy lollipop 2010-02-19 17:24:58

@fuzzy: My point is not that extracting the code would be hard; it would take a few second to do. My point is that it would hurt readability because it would raise the level of abstraction.

Sly 2010-02-19 17:26:52

@Sly: I find it hard to believe you cannot find a name for the function that would make it at least as readable as the original.

Vinko Vrsalovic 2010-02-19 17:30:13

It might hurt readability of the abstracted code, but likely not; its good to know what the variation points are explicitly. But the call site, where it is used, will now say "DoFrobbish(...)" where Frobbish is well understood by the maintainers of the system. That's far more understandable than, "here's 15 lines of code, you guess what it does". (PS: Is it 15 lines? 17? 12? how do you know?) Invoking a well-named abstraction is a big win for the reader.

Ira Baxter 2010-02-19 17:31:52

@Ira Baxter: Except that a random dozen lines or so does not constitute a nameable function. If you can point at the repetition and describe what it says, that's one thing, but breaking programs apart based on code structure is a bad thing.

David Thornley 2010-02-19 17:35:55

@David: Why have you repeated the code if it's not describable or 'function-able'?

Vinko Vrsalovic 2010-02-19 17:37:40

@David: Yes, the block of code has to have an identifiable purpose. See my answer this question. If it does, you should be able to give that purpose a good (perhaps domain-specific) name.

Ira Baxter 2010-02-19 17:39:10

It really depends on many factors, how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way then maybe it is fine. But if you need to use the same code in a third place I would seriously consider refactoring it into a common function.

Justin Ethier 2010-02-19 17:25:18

+8 A:

If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.

Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.

Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persausive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).

A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is the CloneDR. CloneDR finds clones using the targeted langauge structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number a languages including C, C++, Java, C#, COBOL and PHP). CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).

I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.

Ira Baxter 2010-02-19 17:25:20

I'd have to agree with you. If the repeated code is repeatedly doing the same thing, whatever it's doing has a name and deserves to be factored out so that it can be seen in terms of the semantics of that name, not the details of the syntax. This simplifies maintenance, and actually makes the code *easier* to read.

Steven Sudit 2010-02-19 17:31:35

sleske 2010-02-20 14:19:51

@Ira: I really think encouraging cloning makes little sense. Your tool is certainly useful, but I don't follow going from there to saying that the existence of such a tool makes cloning desirable. If people do it because they are not sure yet, they might very well create the method and then create a second method, add a parameter, of even put the lines back if warranted, there are alternatives to cloning.

Vinko Vrsalovic 2010-02-20 14:44:34

You can't construct the abstraction until you understand what should be abstracted. If you could look at a block of code and magically know how to produce the abstraction, then you could adopt a purist policy. Since you can't, you realistically have to do this process in stages... of which the first is to actually clone. Now, when you are *done* (and it may not be just that piece you are modifying) you *should* try to abstract. But... your language may not have a suitable abstraction mechanism (COBOL doesn't have generics) ... so if you can't declone, you should track.

Ira Baxter 2010-02-20 16:13:39

... and of course there's the real problem of you facing a legacy system which has clones (this is a certainty, I'll take a bet on 10% minimum on anything over 100K SLOC), and you have to maintain that. So... is that block that you are editing a clone of something? You need tools to find and track the ones you haven't had a chance to hunt down and kill.

Ira Baxter 2010-02-20 16:16:31

Fully agree when the languages don't allow for a suitable abstraction. But when they do, you absolutely can abstract without understanding it fully (potentially wrongly) and then refactor the abstraction when you see it failing. You might even end up concretizing that wrong abstraction if you realize it was a mistake.

Vinko Vrsalovic 2010-02-20 16:22:49

@Ira: I totally agree on the usefulness of your tool (I've already said so.) What I don't agree with is that cloning is to be encouraged.

Vinko Vrsalovic 2010-02-20 16:24:30

@Vinko: "... cloning makes little sense..." Er, how do you stop it? I have said explicitly you should try to declone after copy-paste-edit and there I think we agree. You can try weasel-wording the decloning attempt ("... there are alternatives to cloning...") but that's just decloning in my eyes. But if you fail to declonethere needs to be some way to manage it.

Ira Baxter 2010-02-20 17:09:03

@Ira: You stripped the 'encouraging' part that was my whole point. I think **encouraging** cloning is what makes little sense, which is what I understood you were advocating: "Clone away, with no worries, for the tool will handle it afterwards". You should still worry about it, even with a great clone tracking tool.

Vinko Vrsalovic 2010-02-20 17:12:26

@Vinko: That's because I didn't say "encourage", you did. People clone code because they think they know what it does, and believe they can change it, and believe that's much easier than re-inventing what it does. So there's an abstraction ("what they think it does") and a parameterization ("how it needs to be different"), and work saved. So cloning will continue (whether I encourage it or not). Yes, we agree that one should worry about it.

Ira Baxter 2010-02-20 17:18:10

@Ira: I just mentioned it because you misquoted me. But we seem to be in agreement for the most part. For the record, I've never claimed you could stop it, you can't. But you should take action (be it code reviews, thorough unit tests, tools or all of the above and more.)

Vinko Vrsalovic 2010-02-20 17:24:24

+3 A:

Very difficult to say in abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.

anon 2010-02-19 17:26:22

IF that one line does something in specific, I agree.

Steven Sudit 2010-02-19 17:33:26

You do need to set a lower threshold. a*b is a clone of x*y if you parameterize it, but it isn't an interesting clone. I've been building clone detectors for about 10 years (http://www.semanticdesigns.com/Products/Clone) and my experience suggests that "several lines" is a good threshold for detecting them. I'll agree, however, that if you have a "one line clone" that has enough structure and sufficient business- or technology relevance, you should refactor. My clone detector measures "clone mass" rather than "lines" as a way to come closer to this ideal.

Ira Baxter 2010-02-19 17:37:15

Yeah, "don't duplicate one line" is clearly overkill . . . even if you wrap it in a function, you'll be duplicating the function call, so how is that any better? If adding the function increases the length of your code more than removing the duplication decreases it, it's probably a sign that you shouldn't have bothered.

Tim Goodman 2010-02-19 18:16:05

@Tim : "How is that any better?" It's better because it gives a name to that statement/expression. In fact, for that reason, I often extract a method for single statement even if that statement is not duplicated elsewhere. For instance, I tend to do that a lot with LINQ expressions. Just for the sake of readability.

Sly 2010-02-19 18:23:21

@Tim: Presumably, the one line you're wrapping is complex and contains some non-obvious logic that is better hidden behind an explanatory name. For example, if it looks up a value in a table and evaluates to true when the value is in some range, this would be better as IsLegalSize() or whatver domain-specific name is best.

Steven Sudit 2010-02-19 18:53:14

@Sly, @Steven: Fair enough, I could certainly see how a one line method could enhance readability if the one line is something that's hard to understand. Of course in that case the duplication wasn't really the problem (although I suppose having a hard to understand line occur twice is twice as bad as having it occur once...)

Tim Goodman 2010-02-19 19:33:53

@Tim: Good point. The real issue isn't the presence of duplication but the absence of encapsulation.

Steven Sudit 2010-02-22 15:12:41

+27 A:

Refactoring: Improving the Design of Existing Code

The Rule of Three

The first time you do something, you just do it. The second time you do
something similar, you wince at the duplication, but you do the duplicate
thing anyway. The third time you do something similar, you refactor.

Three strikes and you refactor.

Coders at Work

Seibel: So for each of these XII calls you're writing an implementation.
Did you ever find that you were accumulating lots of bits of very similar code?

Zawinski: Oh, yeah, definitely. Usually by the second or third time you've cut and pasted
that piece of code it's like, alright, time to stop cutting and pasting and put it in a subroutine.

Nick D 2010-02-19 17:32:59

+1 - Also, the rule of three isn't just a heuristic, it's something that's been studied. It applies to small redundancies as well as whole frameworks (Product Line driven development). Your payoff is actually right around the 2.5 mark for getting back what it costs to do reuse vs. copy/hack.

Chris Kessel 2010-02-19 17:35:21

I believe it should be the "rule of two". See my answer to this OP's question.

Ira Baxter 2010-02-19 17:40:11

@Ira Baxter, the 3rd time you'll need to duplicate ... time to refactor.

Nick D 2010-02-19 17:45:18

The first time isn't a duplicate. The second time is. The third time would be the second duplicate, which is the hint that we should instead refactor. The key here is that you don't *always* want to try to break something out to make it reusable when you've only used it once.

Steven Sudit 2010-02-19 18:56:44

Thus the tempered need to abstract, and the need to *track* where the code clones are. If you can do that, you don't need to abstract as much. If you can't do that, you'd better abstract on the second one, because you'll never find that second clone manually when the first needs updating in two years.

Ira Baxter 2010-02-19 19:47:27

As in *Planescape*, rules of three applies. At the first time there is usually no time to do that. At the second time, there usually won't be enough time to debug one thing for three times. That's when I refactor.

gruszczy 2010-02-19 22:38:26

The three times are really true - I do it once, do it twice, and on the third time, I say, "Screw it, time to refactor."

Jimmie Lin 2010-02-20 11:54:55

@Chris: This rule makes no sense to me, unless coupled with some unit tests to go or the tool Ira has built. Do you have any reference for the studies?

Vinko Vrsalovic 2010-02-20 11:57:41

Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.

That being said, there are issues here.

If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)

You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.

David Thornley 2010-02-19 17:33:38

Arguably, you can refactor out the dozen lines, then further refactor the new method to break it up further to reflect its internal cohesion.

Steven Sudit 2010-02-19 18:54:51

Or factor out a larger number of lines and use parameters, perhaps. However, a dozen lines by themselves do not necessarily have much cohesion.

David Thornley 2010-02-19 19:05:43

+22 A:

I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.

Saying that it'll hurt readability only suggests that you are bad at picking names :-)

Vinko Vrsalovic 2010-02-19 17:33:46

+1 for the comment about picking good names. It's hard for me to see how calling a method with a sensible name would be less readable than a dozen lines of code . . . especially because when you see the same method called a second time you don't have to read a bunch of lines before deciding "Yep, same thing as before"

Tim Goodman 2010-02-19 18:10:37

I tolerate none either. Duplicated code will always haunt you in the future.

GmonC 2010-02-19 18:15:23

+1 for "picking bad names". Addition: It may also be because you picked an unfortunate refactoring. Try to refactor out a part that is meaningful on its own (as far as possible), then refactoring will actually *improve* readability (by hiding implementation details behind a nice method name).

sleske 2010-02-20 14:14:39

I would like to tolerate none, but real world budget constraints of how much time I can spend on a project can dictate how much refactoring I can do. The alternative is to refactor code for free.

Peter M 2010-02-20 20:12:57

@Peter M: Exactly. Many of my projects I can't tolerate. The fact that they work and are in production doesn't change that. I'd love to improve them.

Vinko Vrsalovic 2010-02-20 20:40:58

Perhaps the reason it's hard to find a name is because it doesn't map to a nice, packageable concept. Perhaps the repeated code isn't a sensible stand-alone unit of functionality. It hurts readability if you have to track down the code before reading it, and you have to read it because you split your code into chunks that don't make sense. As sleske said, refactor based on what is meaningful, not *just* what is duplicated.

Darryl 2010-02-23 20:24:06

+1 A:

In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention revealing common method that reads like a book, IMO.

If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.

Unless your dev is actually Ayende though I would hold tight to DRY and get the readability through intention revealing methods.

Berryl 2010-02-19 17:35:38

+5 A:

As soon as you repeat anything you're creating multiple places to have make edits if you find that you've made a mistake, need to extend it, edit, delete or any other of the dozens of other reasons you might come up against that force a change.

In most languages, extracting a block to a suitably named method can rarely hurt your readability.

It is your code, with your standards, but my basic answer to your "how much?" is none ...

Unsliced 2010-02-19 17:53:52

+2 A:

Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.

Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example 2 GUI components that both display values, but don't implement a common interface for accessing that value. Refactoring this kind of system either requires methods taking more generic objects than needed, followed by typechecking and casting, or else the class hierarchy needs to be rethought & restructured.

This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.

Sanjay Manohar 2010-02-19 22:55:20

Well I don't know. How much duplicate code can you tolerate?

Tim Post 2010-02-20 11:39:13

I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.

The best line of code is a line of code not written.

Turing Complete 2010-07-06 08:29:18

ansaurus

tags:

views:

answers:

How much duplicated code do you tolerate?

related questions