Code duplication is usually bad and often quite easy to spot. I suppose compilers could detect it automatically in the easiest cases: they already parse the source into an intermediate representation and analyze it in various ways (detecting suspicious patterns such as uninitialized variables, optimizing the emitted code, and so on). I guess they could often detect functionally duplicate code the same way and account for it when emitting machine code.

Are there C++ compilers that can detect duplicate code and only emit corresponding machine code once instead of for each duplicate in the source text?

+8  A: 

I think the question makes the false assumption that compilers would always want to eliminate code duplication. Code duplication is bad for the readability and maintainability of source code, not necessarily for the performance of compiled code. Indeed, one could consider loop unrolling as the compiler adding duplicate code to increase speed. Compiled code does not need to follow the same principles as source code, and generally doesn't, because it is meant for the machine to execute rather than for humans to read.
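To make the loop unrolling point concrete, here is a minimal sketch of what unrolling looks like when written by hand. The function names `sum_rolled` and `sum_unrolled` and the unroll factor of 4 are assumptions chosen for illustration; a real optimizer picks its own factor, or may not unroll at all.

```cpp
#include <cassert>
#include <cstddef>

// Straightforward loop: one addition per iteration.
int sum_rolled(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Hand-unrolled by a factor of 4 (an arbitrary choice for this sketch).
// The loop body is deliberately duplicated, which is roughly the kind of
// transformation an optimizer may apply on its own.
int sum_unrolled(const int* data, std::size_t n) {
    int total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    // Handle the leftover elements when n is not a multiple of 4.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Usage sketch: both versions must agree on any input length.
void demo_unrolling() {
    const int data[] = {1, 2, 3, 4, 5, 6, 7};
    assert(sum_rolled(data, 7) == sum_unrolled(data, 7));
}
```

The second function contains more duplicated code than the first, yet it does the same work with fewer loop-condition checks, which is why duplication in the emitted code is not automatically a defect.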

Generally, compilers are busy compiling, not transforming source code. IDEs, of course, may allow both.

jk
Merging code reduces the size of the produced library / executable, so it can speed up execution too. I agree it would conflict somewhat with loop unrolling though.
Matthieu M.
As with most things there is a trade-off, but IMHO the assumption that the compiler should always try to eliminate duplication is false.
jk
I don't see where this question makes such an assumption at all. It's asking whether any compilers do it, not whether it's a good idea, nor whether the compilers that *can* do it always *will* do it.
Rob Kennedy
+5  A: 

Some do, some don't.

From the LLVM optimization passes page: -mergefunc

In the LLVM Intermediate Representation each function is broken into small blocks; this optimization pass looks for functions that are equivalent and tries to fold them into one. It's not guaranteed to succeed, though.
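As an illustration of the kind of duplication a function-merging pass such as -mergefunc is aimed at, consider the sketch below. The types `Meters` and `Seconds` and both function names are hypothetical. The two functions differ in the source, but their bodies perform the same operation on the same underlying layout, so they plausibly lower to equivalent intermediate code. Whether any given compiler actually merges them depends on the toolchain and the options used.

```cpp
#include <cassert>
#include <cstdint>

// Two distinct source-level types with identical layout (hypothetical names).
struct Meters  { std::int32_t value; };
struct Seconds { std::int32_t value; };

// Different names and parameter types, same operation on the same layout.
// These are candidates for being folded into a single emitted function.
std::int32_t double_meters(Meters m)   { return m.value * 2; }
std::int32_t double_seconds(Seconds s) { return s.value * 2; }

// Usage sketch: the two functions are observably equivalent.
void demo_equivalent_functions() {
    assert(double_meters(Meters{21}) == double_seconds(Seconds{21}));
}
```

Templates instantiated for several types with the same representation tend to produce this pattern in bulk, which is where merging is most likely to pay off in code size.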

You'll find plenty of other optimizations on that page, though some of them may appear cryptic at first glance.

I would add, though, that duplicate code isn't so bad for the compiler or the executable. It's bad from a maintenance point of view, and there is nothing a compiler can do about that.

Matthieu M.
I would think that even if it succeeds, it isn't guaranteed to improve speed. Of course, some people need to optimize for space anyway. +1
jk
Interesting. I don't know what they mean by "overridable", but you can't generally do this with C functions, since equality is defined over function pointers.
Potatoswatter
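A minimal sketch of the function-pointer concern raised in the comment above, using the hypothetical names `add_one_a` and `add_one_b`. The two functions have identical bodies, but a program can take both addresses and compare them, and pointers to different functions are expected to compare unequal. A compiler or linker that folds the two bodies together has to keep that comparison working, or only merge when it can show the addresses are never compared.

```cpp
#include <cassert>

// Two functions with identical bodies (hypothetical names).
int add_one_a(int x) { return x + 1; }
int add_one_b(int x) { return x + 1; }

// Observes the identity of the functions, not just their behaviour.
// Naive merging of the two bodies could change this result.
bool have_distinct_addresses() {
    int (*first)(int) = add_one_a;
    int (*second)(int) = add_one_b;
    return first != second;
}

// Usage sketch: the behaviour is the same even though the identities differ.
void demo_same_behaviour() {
    assert(add_one_a(41) == add_one_b(41));
}
```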
@jk: I agree that speed may not be improved; I think this goes against the traditional loop unrolling optimization. However, with the LLVM framework you can specify the order in which optimizations are applied, so you could have both. They do not specify whether some kind of heuristic is applied (depending on the size of the code?). On the one hand it incurs a jump to another memory location; on the other, the less code there is, the more likely it is to be in cache... so I guess that, once more, you just need to measure for your own little piece of code :)
Matthieu M.
+1  A: 

Visual C++ does this if you specify 'minimize code size' (/O1). The feature is described in the docs for /Og, which is deprecated in favour of the simpler catch-all options that favor size (/O1) or favor speed (/O2).

Steve Townsend
+1  A: 

As far as I know, this kind of code elimination does not usually happen across functions. So if you write the same piece of code in two different functions, there is very little chance (close to none) that the duplicate will be eliminated.
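A minimal sketch of the situation described above, with hypothetical function names. The first two functions repeat the same clamping fragment inside otherwise different bodies. If the claim above holds for a given compiler, that fragment is emitted twice; the dependable way to remove the duplication is to factor it out in the source, as the `clamp_percent` helper does.

```cpp
#include <algorithm>
#include <cassert>

// The same clamping fragment is written out in two different functions.
int volume_duplicated(int percent) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    return percent * 2;
}

int brightness_duplicated(int percent) {
    if (percent < 0) percent = 0;
    if (percent > 100) percent = 100;
    return percent + 10;
}

// Source-level fix: the shared fragment lives in one helper.
int clamp_percent(int percent) {
    return std::max(0, std::min(percent, 100));
}

int volume_refactored(int percent)     { return clamp_percent(percent) * 2; }
int brightness_refactored(int percent) { return clamp_percent(percent) + 10; }

// Usage sketch: the refactoring must not change behaviour.
void demo_refactoring() {
    assert(volume_duplicated(150) == volume_refactored(150));
    assert(brightness_duplicated(-5) == brightness_refactored(-5));
}
```

Note that a compiler may well inline `clamp_percent` back into both callers, so the emitted code can end up duplicated again even though the source no longer is.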

Some optimizations, such as return value optimization and function inlining, can happen across functions. However, most optimization is done within a single function.

This is not usually done at the level of the source language; by this I mean that the compiler won't look at the C++ code itself and start optimizing it. Compilers mostly work on an intermediate representation (IR) that sits between the high-level language (C++) and machine language. The IR is somewhat similar to machine language, but it is not exactly the machine language of the system the code is compiled for. See the wiki page http://en.wikipedia.org/wiki/Compiler_optimization, which lists some of these optimizations.

Yogesh Arora