I always wonder why compilers can't figure out simple things that are obvious to the human eye. They do lots of simple optimizations, but never something even a little bit complex. For example, this code takes about 6 seconds on my computer to print the value zero (using java 1.6):

int x = 0;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;

It is totally obvious that x is never changed so no matter how often you add 0 to itself it stays zero. So the compiler could in theory replace this with System.out.println(0).

Or even better, this takes 23 seconds:

public int slow() {
    String s = "x";
    for (int i = 0; i < 100000; ++i) {
        s += "x";
    return 10;

First the compiler could notice that I am actually creating a string s of 100000 "x" so it could automatically use s StringBuilder instead, or even better directly replace it with the resulting string as it is always the same. Second, It does not recognize that I do not actually use the string at all, so the whole loop could be discarded!

Why, after so much manpower is going into fast compilers, are they still so relatively dumb?

EDIT: Of course these are stupid examples that should never be used anywhere. But whenever I have to rewrite a beautiful and very readable code into something unreadable so that the compiler is happy and produces fast code, I wonder why compilers or some other automated tool can't do this work for me.

+11  A: 

Seriously? Why would anyone ever write real-world code like that? IMHO, the code, not the compiler is the "stupid" entity here. I for one am perfectly happy that compiler writers don't bother wasting their time trying to optimize something like that.

Edit/Clarification: I know the code in the question is meant as an example, but that just proves my point: you either have to be trying, or be fairly clueless to write supremely inefficient code like that. It's not the compiler's job to hold our hand so we don't write horrible code. It is our responsibility as the people that write the code to know enough about our tools to write efficiently and clearly.

Daniel Schaffer
+1 to readable code
Gerald Kaszuba
I want to write the readable code, and not code that makes the compiler happy. These are just stupid simple examples to show that the compiler is dumb, nothing I would use in a real program.
"but that just proves my point": How does it prove your point? The question is a general one about why there are still things that are obvious to the human eye and not to the compiler, and you pick fault with the (obviously exaggerated for effect) examples instead.
The point is that if the examples weren't exaggerated for effect, they wouldn't exemplify the point of the question, which I went on to explain - you either write horrible code on purpose because you're trying to, or you do it by accident because you make a mistake or don't know what you're doing.
Daniel Schaffer
Not that I agree, but he is saying that it proves his point because these are contrived examples and not snippets of "real" code. If someone could provide real, production code that was similarly (mis-)treated by the compiler, that would be a counterexample that undermines his argument.
Adam Bellaire
^^ Neither of those is any fault of the compiler, and even if it could, I wouldn't want my compiler "fixing" things like that.
Daniel Schaffer
@Adam, thank you.
Daniel Schaffer
Understood. :) Of course these are not real-world examples, but simply pointing that out is not an actual answer. I also don't understand 'I wouldn't want my compiler "fixing" things like that' -- why not? Given that good compilers *do* fix things like that, do you now want to move to inferior ones?
@ShreevatsaR: See Pax's answer, he/she said it much better than I did.
Daniel Schaffer

It forces you (the programmer) to think about what you're writing. Forcing compilers to do your work for you doesn't help anyone: it makes the compilers much more complex (and slower!), and it makes you stupider and less attentive to your code.

Ben Alpert
Im not sure this is a good way to look at it, this way we might still be stuck with writing assembler code.
+13  A: 

On your first example, it's an optimization that only works if the value is zero. The extra if statement in the compiler needed to look for this one rarely-seen clause may just not be worth it (since it'll have to check for this on every single variable). Furthermore, what about this:

int x = 1;
int y = 1;
int z = x - y;
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    z += z + z + z + z + z;

This is still obviously the same thing, but now there's an extra case we have to code for in the compiler. There's just an infinite amount of ways that it can end up being zero that aren't worth coding in for, and I guess you could say that if you're going to have one of them you'd might as well have them all.

Some optimizations do take care of the second example you have posted, but I think I've seen it more in functional languages and not so much Java. The big thing that makes it hard in newer languages is monkey-patching. Now += can have a side-effect that means if we optimize it out, it's potentially wrong (e.g. adding functionality to += that prints out the current value will mean a different program altogether).

But it comes down to the same thing all over again: there's just too many cases you'd have to look for to make sure no side effects are being performed that will potentially alter the final program's state.

It's just easier to take an extra moment and make sure what you're writing is what you really want the computer to do. :)

Chris Bunch
Compilers work in several passes, and the constant folding + constant propagation pass(es) will reduce your example to the one posted by the OP, so the code required to perform this optimization is the same for both cases.
+42  A: 

Speaking from a C/C++ point of view:

Your first example will be optimized by most compilers. If the java-compiler from Sun really executes this loop it's the compilers fault, but take my word that any post 1990 C, C++ or Fortran-compiler completely eliminates such a loop.

Your second example can't be optimized in most languages because memory allocation happens as a side-effect of concatenating the strings together. If a compiler would optimize the code the pattern of memory allocation would change, and this could lead to effects that the programmer tries to avoid. Memory fragmentation and related problems are issues that embedded programmers still face every day.

Overall I'm satisfied with the optimizations compilers can do these days.

Nils Pipenbrinck
Memory allocation/management happens to be one of those things Java programmers take for granted.
Eduardo León
I think most C++ compilers will eliminate the addition, but not the loop, nor the assignment.Assigning to the stack causes a side effect. Eliminating the side effect could potentially change the meaning of the program, and would therefore not be correct.
Scott Wisniewski
Scott, I tried it with the latest GCC. It removes the entire calculation including the loop. All that remains is a printf call and the surrounding code that pushes a zero onto the stack.
Nils Pipenbrinck
Ok. I had tried both GCC and VC++ with -O3 and /Ox, and they left the loop there.I don't think I was running the latest version of GCC though.I would think that you could make an argument against that being a safe optimization in C++.For Java, or Haskell, it would be safe, but not for C.
Scott Wisniewski
I would imagine that in order to get it to work right in C++ they must be doing some sort of aliasing analysis, which is pretty advanced.
Scott Wisniewski
Scott, You're right. I just checked VS2008 and it does not do the optimization. My GCC (4.3.1) however does the trick. The C-standard does btw not mention the stack at all, so it's valid to do any optimizations that change stack-allocation patterns.
Nils Pipenbrinck
+1 for the insightful comments
Excellently useful comments :)
Binary Worrier
@Scott: No fancy aliasing analysis is needed. Nothing can alias a local variable whose address is never taken.
Jason Orendorff
"Your second example can't be optimized in most languages because..." To me, this paragraph suggests a misunderstanding of what side effects are.
Jason Orendorff
+6  A: 

I think you are underestimating how much work it is to make sure that one piece of code doesn't affect another piece of code. With just a small change to your examples x, i, and s could all point to the same memory. Once one of the variables is a pointer, it is much harder to tell what code might have side effects depending on is point to what.

Also, I think people who program compliers would rather spend time making optimizations that aren't as easy for humans to do.

Upvoted for the second part. It is much more fulfilling to make well-written code go faster by hitting some machine idiom than to optimize away badly written code.
+3  A: 

Because we're just not there yet. You could just as easily have asked, "why do I still need to write programs... why can't I just feed in the requirements document and have the computer write the application for me?"

Compiler writers spend time on the little things, because those are the types of things that application programmers tend to miss.

Also, they cannot assume too much (maybe your loop was some kind of ghetto time delay or something)?

Giovanni Galbo

Did you compile to release code? I think a good compiler detects in your second example that the string is never used an removes the entire loop.

yes, I compiled with -server
In C# release mode it isnt optimized out. In C++ it is (code in my post)
+2  A: 

It's an eternal arms race between compiler writers and programmers.

Non-contrived examples work great -- most compilers do indeed optimize away the obviously useless code.

Contrived examines will always stump the compiler. Proof, if any was needed, that any programmer is smarter than any program.

In the future, you'll need more contrived examples than the one's you've posted here.

+3  A: 

Compilers in general are very smart.

What you must consider is that they must account for every possibly exception or situation where optimizing or re-factoring code could cause unwanted side-effects.

Things like, threaded programs, pointer aliasing, dynamically linked code and side effects (system calls/memory alloc) etc. make formally prooving refactoring very difficult.

Even though your example is simple, there still may be difficult situations to consider.

As for your StringBuilder argument, that is NOT a compilers job to choose which data structures to use for you.

If you want more powerful optimisations move to a more strongly typed language like fortran or haskell, where the compilers are given much more information to work with.

Most courses teaching compilers/optimisation (even acedemically) give a sense of appreciation about how making gerneral formally prooven optimisatons rather than hacking specific cases is a very difficult problem.


Personally, I want the compiler to be very stupid. Maybe there's a reason I put a stupid, useless do nothing/waste CPU loop in my program. I want the compiler to do exactly what I tell it to, and not try to be smart about things. If I write an empty loop that goes for 1 billion iterations, the computer should perform those operations. Otherwise programming would be like black magic. As others have mentioned, C/C++ compilers will optimize away the first example. I think that is a major problem. The compiler should compile exactly the code fed to it, no matter how "useless" it thinks that code might be.

Compilers do optimzations similar to this. The only reason I could see you actually relying on those iterations is to get some sort of time/pause/sleep effect, for which you shouldn't rely on this technique, there are built-in methods to force a thread to sleep and so forth.
You can always disable optimizations in any compiler if you are not happy about it. But seriously, if your program behaves differently with optimizations enabled, this is usually a sign that there is some serious problem in your code. Race conditions, for example.
You appear to be claiming that you want all software to be significantly slower than it is.
Jason Orendorff
I'm saying I want it to be up to the programmer how efficient their program is. It's alright for the compiler to make optimizations provided the computer is performing all the optimizations the person is telling it to perform. Maybe the programmer is trying to spin the processor for some reason, and doesn't want the compiler to just skip a useless loop.
+92  A: 

In my opinion, I don't believe it's the job of the compiler to fix what is, honestly, bad coding. You have, quite explicitly, told the compiler you want that first loop executed. It's the same as:

x = 0
sleep 6 // Let's assume this is defined somewhere.
print x

I wouldn't want the compiler removing my sleep statement just because it did nothing. You may argue that the sleep statement is an explicit request for a delay whereas your example is not. But then you will be allowing the compiler to make very high-level decisions about what your code should do, and I believe that to be a bad thing.

Code, and the compiler that processes it, are tools and you need to be a tool-smith if you want to use them effectively. How many 12" chainsaws will refuse to try cut down a 30" tree? How many drills will automatically switch to hammer mode if they detect a concrete wall?

None, I suspect, and this is because the cost of designing this into the product would be horrendous for a start. But, more importantly, you shouldn't be using drills or chainsaws if you don't know what you're doing. For example: if you don't know what kickback is (a very easy way for a newbie to take off their arm), stay away from chainsaws until you do.

I'm all for allowing compilers to suggest improvements but I'd rather maintain the control myself. It should not be up to the compiler to decide unilaterally that a loop is unnecessary.

For example, I've done timing loops in embedded systems where the clock speed of the CPU is known exactly but no reliable timing device is available. In that case, you can calculate precisely how long a given loop will take and use that to control how often things happen. That wouldn't work if the compiler (or assembler in that case) decided my loop was useless and optimized it out of existence.

Having said that, let me leave you with an old story (a possible urban myth) of a VAX FORTRAN compiler that was undergoing a benchmark for performance and it was found that it was many orders of magnitude faster than its nearest competitor.

It turns out the compiler noticed that the result of the benchmark loops weren't being used anywhere else and optimized the loops into oblivion.

Put much better than my answer, +1 :)
Daniel Schaffer
A 14" chainsaw is capable of cutting down a 22" tree, you just have to cut from both sides ;-)
-1 one for not distinguishing between standardised language definition and implementation defined run-time behaviour (== performance). Compilers are mostly allowed to emit whatever code they want to, as long as the _meaning_ doesn't change.
David Schmitt
Most compilers optimize loops away that are not used. I believe the compilers try to be as smart as possible, but they simply can't, for technical reasons.
@DavidS, having worked on a lot of embedded systems where timing loops are important, I'd prefer the compiler to suggest I can remove them for greater performance but not have that decision taken out of my hands. Ditto for "useless" memory accesses which may actually be memory-mapped I/O.
@DavidS (cont), but these are situations where _meaning_ includes the speed of execution. That stuff is probably best done by using a lower optimization level, or declaring variables as volatile.
If you don't want optimisation, don't use optimisation flags. You clearly don't want optimisation if you are relying on loops that do nothing but are there to take up time. The optimisation compiler is *meant* to speed things up while keeping the functional aspects of the code the same. This is also what 99% of people want. For some reason you want compilers to be designed for your 1% use-case.
Max Howell
Sleep does something though...
Austin Kelley Way
So does a big for-loop unless it's optimized away :-)
+13  A: 

The HotSpot JIT compiler will only optimize code that has been running for some time. By the time your code is hot, the loop has already been started and the JIT compiler has to wait until the next time the method is entered to look for ways to optimize away the loop. If you call the method several times, you might see better performance.

This is covered in the HotSpot FAQ, under the question "I write a simple loop to time a simple operation and it's slow. What am I doing wrong?".

+1 for reference to the Java HotSpot Technology. Those kind of optimizations are not the compilers job, but rather of the (virtual) machine, since only that knows the context and can verify the outcome.
I think they have on the fly stack replacement now, so you don't have to wait to the second call to get the JIT compile code running...

I have to agree, that set to maximum optimisation (when compile time is considered secondary to the resulting performance), the compiler should well be clever enough to optimise away such cases. And as pointed out earlier, modern C++ compilers do. So I guess the Java "compiler" still needs a few more development iterations to catch up.

Max Howell

I think that optimization issue is more characteristic of the particular compiler you're using than compilers in general.

But what kind of computer are you running that can execute any 100-billion-iteration loop in 6 seconds?

The secret is: 100 * 1000 * 1000 * 1000 = 1215752192 in Java with 32 bit ints ...
mjy you get a cookie.
so much for Java compiler quality - IMO it should fail to compile.

Actually, Java should use string builder in your second example.

The basic problem with trying to optimize these examples away is that doing so would require theorem proving. Which means that the compiler would need to construct a mathematical proof of what you're code will actually do. And that's no small task at all. In fact, being able to prove that all code really does have an effect is equivalent to the halting problem.

Sure, you can come up with trivial examples, but the number of trivial examples is unlimited. You could always think of something else, so there is no way to catch them all.

Of course, it is possible for some code to be proven not to have any effect, as in your examples. What you would want to do is have the compiler optimize away every problem that can be proven unused in P time.

But anyway, that's a ton of work and it doesn't get you all that much. People spend a lot of time trying to figure out ways to prevent programs from having bugs in them, and type systems like those in Java and Scala are attempts to prevent bugs, but right now no one is using type systems to make statements about execution time, as far as I know.

You might want to look into Haskel, which I think has the most advanced theory proving stuff, although I'm not sure on that. I don't know it myself.

Chad Okere
+20  A: 

Compilers are designed to be predictable. This may make them look stupid from time to time, but that's OK. The compiler writer's goals are

  • You should be able to look at your code and make reasonable predictions about its performance.

  • Small changes in the code should not result in dramatic differences in performance.

  • If a small change looks to the programmer like it should improve performance, it should at least not degrade performance (unless surprising things are happening in the hardware).

All these criteria militate against "magic" optimizations that apply only to corner cases.

Both of your examples have a variable updated in a loop but not used elsewhere. This case is actually quite difficult to pick up unless you are using some sort of framework that can combine dead-code elimination with other optimizations like copy propagation or constant propagation. To a simple dataflow optimizer the variable doesn't look dead. To understand why this problem is hard, see the paper by Lerner, Grove, and Chambers in POPL 2002, which uses this very example and explains why it is hard.

Norman Ramsey
The paper link is not working. can you please update the link.
@Algorist: updated
Norman Ramsey
+3  A: 

As others have addressed the first part of your question adequately, I'll try to tackle the second part, i.e. "automatically uses StringBuilder instead".

There are several good reasons for not doing what you're suggesting, but the biggest factor in practice is likely that the optimizer runs long after the actual source code has been digested & forgotten about. Optimizers generally operate either on the generated byte code (or assembly, three address code, machine code, etc.), or on the abstract syntax trees that result from parsing the code. Optimizers generally know nothing of the runtime libraries (or any libraries at all), and instead operate at the instruction level (that is, low level control flow and register allocation).

Second, as libraries evolve (esp. in Java) much faster than languages, keeping up with them and knowing what deprecates what and what other library component might be better suited to the task would be a herculean task. Also likely an impossible one, as this proposed optimizer would have to precisely understand both your intent and the intent of each available library component, and somehow find a mapping between them.

Finally, as others have said (I think), the compiler/optimizer writer can reasonably assume that the programmer writing the input code is not brain-dead. It would be a waste of time to devote significant effort to asinine special cases like these when other, more general optimizations abound. Also, as others have also mentioned, seemingly brain-dead code can have an actual purpose (a spin lock, busy wait prior to a system-level yield, etc.), and the compiler has to honor what the programmer asks for (if it's syntactically and semantically valid).

Drew Hall
+2  A: 

Absolute optimization is an undecidable problem, that means, there is no Turing machine (and, therefore, no computer program) that can yield the optimal version of a ANY given program.

Some simple optimizations can be (and, in fact, are) done, but, in the examples you gave...

  1. To detect that your first program always prints zero, the compiler would have to detect that x remains constant despite all the loop iterations. How can you explain (I know, it's not the best word, but I can't come up with another) that to a compiler?

  2. How can the compiler know that the StringBuilder is the right tool for the job without ANY reference to it?

In a real-world application, if efficiency is critical in a part of your application, it must be written in a low-level language like C.

Eduardo León
+1 For pt. 2--no reference is key!
Drew Hall

Mostly what you're complaining about is 'why are Java compiler so stupid', since most other language compilers are much smarter.

The reason for the stupidity of Java compilers is historical. First, the original java implementations were interpreter based, and performance was consisdered unimportant. Second, many of the original java benchmarks were problematic to optimize. I recall one benchmark that looked a lot like your second example. Unfortunately, if the compiler optimized the loop away, the benchmark would get a divide by zero exception when it tried to divide a baseline number by the elapsed time to compute its performance score. So when writing a optimizing java compiler, you had to be very careful NOT to optimize some things, as people would then claim your compiler was broken.

Chris Dodd
+2  A: 

It's almost considered bad practice to optimize things like this when compiling down to JVM bytecode. Sun's javac does have some basic optimizations, as does scalac, groovyc, etc. In short, anything that's truely language-specific can get optimized within the compiler. However, things like this which are obviously so contrived as to be language agnostic will slip through simply out of policy.

The reason for this is it allows HotSpot to have a much more consistent view of the bytecode and its patterns. If the compilers start mucking about with edge cases, that reduces the VM's ability to optimize the general case which may not be apparent at compile time. Steve Yeggie likes to harp on about this: optimization is often easier when performed at runtime by a clever virtual machine. He even goes so far as to claim that HotSpot strips out javac's optimizations. While I don't know if this is true, it wouldn't surprise me.

To summarize: compilers targeting VMs have a very different set of criteria, particularly in the area of optimization and when it is appropriate. Don't go blaming the compiler writers for leaving the work to the far-more-capable JVM. As pointed out several times on this thread, modern compilers targeting the native architecture (like the gcc family) are extremely clever, producing obscenely fast code through some very smart optimizations.

Daniel Spiewak
+2  A: 

I have never seen the point in dead code elimination in the first place. Why did the programmer write it?? If you're going to do something about dead code, declare it a compiler error! It almost certainly means the programmer made a mistake--and for the few cases it doesn't, a compiler directive to use a variable would be the proper answer. If I put dead code in a routine I want it executed--I'm probably planning to inspect the results in the debugger.

The case where the compiler could do some good is pulling out loop invariants. Sometimes clarity says to code the calculation in the loop and having the compiler pull such things out would be good.

Loren Pechtel
Compilers can generate dead code all by themselves. It's useful to be able to pull it out.
David Thornley
+8  A: 

Stupid Compiler thinks : "Surely he (The Clever) has a really good reason to write such a crap, then leave it untouched."

Roberto Russo
+1  A: 

This is an example of procedural code v. functional code.

You have detailed a procedure for the compiler to follow, so the optimisations are going to be based around the procedure detailed and will minimise any side effects or not optimise where it will not be doing what you expect. This makes it easier to debug.

If you put in a functional description of what you want eg. SQL then you are giving the compiler a wide range of options to optimise.

Perhaps some type of code analysis would be able to find this type of issue or profiling at run-time, but then you will want to change the source to something more sensible.


The meaning of your two examples is pointless, useless and only made to fool the compiler.

The compiler is not capable (and should not be) to see the meaning of a method, a loop or a program. That is where you get into the picture. You create a method for a certain functionality/meaning, no matter how stupid it is. It's the same case for simple problems or extreme complex programs.

In your case the compiler might optimize it, because it "thinks" it should be optimized in another way but why stay there?

Extreme other situation. We have a smart compiler compiling Windows. Tons of code to compile. But if it's smart, it boils it down to 3 lines of code...

"starting windows"
"enjoy freecell/solitaire"
"shutting down windows"

The rest of the code is obsolete, because it's never used, touched, accessed. Do we really want that?

Are you sure that gcc's compiler is not capable see the meaning of a method, a loop or a program? -- Jason's example suggests me that some pre-calculations is occurring at compile-time.
+8  A: 

Well, I can only speak of C++, because I'm a Java beginner totally. In C++, compilers are free to disregard any language requirements placed by the Standard, as long as the observable behavior is as-if the compiler actually emulated all the rules that are placed by the Standard. Observable behavior is defined as any reads and writes to volatile data and calls to library functions. Consider this:

extern int x; // defined elsewhere
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;
return x;

The C++ compiler is allowed to optimize out that piece of code and just add the proper value to x that would result from that loop once, because the code behaves as-if the loop never happened, and no volatile data, nor library functions are involved that could cause side effects needed. Now consider volatile variables:

extern volatile int x; // defined elsewhere
for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
    x += x + x + x + x + x;
return x;

The compiler is not allowed to do the same optimization anymore, because it can't prove that side effects caused by writing to x could not affect the observable behavior of the program. After all, x could be set to a memory cell watched by some hardware device that would trigger at every write.

Speaking of Java, I have tested your loop, and it happens that the GNU Java Compiler (gcj) takes in inordinate amount of time to finish your loop (it simply didn't finish and I killed it). I enabled optimization flags (-O2) and it happened it printed out 0 immediately:

[js@HOST2 java]$ gcj --main=Optimize -O2
[js@HOST2 java]$ ./a.out
[js@HOST2 java]$

Maybe that observation could be helpful in this thread? Why does it happen to be so fast for gcj? Well, one reason surely is that gcj compiles into machine code, and so it has no possibility to optimize that code based on runtime behavior of the code. It takes all its strongness together and tries to optimize as much as it can at compile time. A virtual machine, however, can compile code Just in Time, as this output of java shows for this code:

class Optimize {
    private static int doIt() {
        int x = 0;
        for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
            x += x + x + x + x + x;
        return x;
    public static void main(String[] args) {
        for(int i=0;i<5;i++) {

Output for java -XX:+PrintCompilation Optimize:

1       java.lang.String::hashCode (60 bytes)
1%      Optimize::doIt @ 4 (30 bytes)
2       Optimize::doIt (30 bytes)

As we see, it JIT compiles the doIt function 2 times. Based on the observation of the first execution, it compiles it a second time. But it happens to have the same size as bytecode two times, suggesting the loop is still in place.

As another programmer shows, execution time for certain dead loops even is increased for some cases for subsequently compiled code. He reported a bug which can be read here, and is as of 24. October 2008.

Johannes Schaub - litb
If you use GCJ you use the same code-generation backend as GCC, so you get more or less the same optimizations. You can also use GCJ to compile to .class files btw. No need to compile to native code.
Nils Pipenbrinck
Nils, yeah i'm aware of that. i took the compile-to-machine code explicitly, to see how it behaves in that case (to see whether java possibly *forbids* those optimizations. apparently, it doesn't).
Johannes Schaub - litb
+56  A: 

Oh, I don't know. Sometimes compilers are pretty smart. Consider the following C program:

#include <stdio.h>  /* printf() */

int factorial(int n) {
   return n == 0 ? 1 : n * factorial(n - 1);

int main() {
   int n = 10;

   printf("factorial(%d) = %d\n", n, factorial(n));

   return 0;

On my version of GCC (4.3.2 on Debian testing), when compiled with no optimizaitons, or -O1, it generates code for factorial() like you'd expect, using a recursive call to compute the value. But on -O2, it does something interesting: It compiles down to a tight loop:

           testl   %edi, %edi
           movl    $1, %eax
           je  .L3
           .p2align 4,,10
           .p2align 3
           imull   %edi, %eax
           subl    $1, %edi
           jne .L4

Pretty impressive. The recursive call (not even tail-recursive) has been completely eliminated, so factorial now uses O(1) stack space instead of O(N). And although I have only very superficial knowledge of x86 assembly (actually AMD64 in this case, but I don't think any of the AMD64 extensions are being used above), I doubt that you could write a better version by hand. But what really blew my mind was the code that it generated on -O3. The implementation of factorial stayed the same. But main() changed:

           subq    $8, %rsp
           movl    $3628800, %edx
           movl    $10, %esi
           movl    $.LC0, %edi
           xorl    %eax, %eax
           call    printf
           xorl    %eax, %eax
           addq    $8, %rsp

See the movl $3628800, %edx line? gcc is pre-computing factorial(10) at compile-time. It doesn't even call factorial(). Incredible. My hat is off to the GCC development team.

Of course, all the usual disclaimers apply, this is just a toy example, premature optimization is the root of all evil, etc, etc, but it illustrates that compilers are often smarter than you think. If you think you can do a better job by hand, you're almost certainly wrong.

(Adapted from a posting on my blog.)

Jason Creighton
That's kick-ass! Do you know if gcc does more pre-computation on function calls with constant values? What does gcc do when choosing an n which produces a factorial which is greater than a 64 bit value?
I have no idea. It's all black magic to me. :-)Experimentation reveals that larger values will make my version of gcc not attempt pre-computation.If you're interested in playing around with this kind of thing, just compile a program with the -S flag. eg, if you do "gcc -O3 -S factorial.c", it will produce a file, factorial.s, which contains the generated assembly. There's probably some other flags to dump whatever gcc's intermediate representation is, although that's more specialized than straight assembly.
Jason Creighton
"movl $3628800" suggests me that gcc pre-calculates at least some functions at compile-time. --- When does gcc stop to pre-compute functions at compile-time? --- It is likely that gcc somehow estimates the size of the function's output, and this way, decides whether to pre-compute or not at compile-time.
@Jason: Which command did you use to get the machine codes? --- I run unsuccessfully % gcc -O2 code.c %
@Masi: you need to use the -S flag. so, eg, "gcc -O2 -S code.c" will produce a file named "code.s" containing the assembly.
Jason Creighton
I once almost fell off my chair as I tried a few variants of swapping two variable contents (temp variable, xor and arithmetic) and compared how they look in assembly. My main method used constants and with optimizations MSVC emitted simply the `printf` statements with the swapped values (along with the compiled functions but they weren't called anymore). Such kinds of precomputation are pretty nice, indeed.
+1 for "My hat is off to the GCC development team"
Martijn Courteaux
its a simple static optimization(because the variable you do pass don't change), no big deal. Compilers do only cover the cases they are designed for, nothing else, they will never think/abstract/understand the algorithm
Wow this is by far one of my favorite answers on SO! Thanks!
+1  A: 

Because compiler writers try add optimizations for things that matter (I hope) and that are measured in *Stone benchmarks (I fear).

There are zillions of other possible code fragments like yours, which do nothing and could be optimized with increasing effort on the compiler writer, but which are hardly ever encountered.

What I feel embarrassing is that even today most compilers generate code to check for the switchValue being greater than 255 for a dense or almost full switch on an unsigned character. That adds 2 instructions to most bytecode interpreter's inner loop.


Compilers are as smart as we make them. I don't know too many programmers who would bother writing a compiler that would check for constructs such as the ones you used. Most concentrate on more typical ways to improve performance.

It is possible that someday we will have software, including compilers, that can actually learn and grow. When that day comes most, maybe all, programmers will be out of job.

Jim C
i hope this will never happen, or it should happen if im no more alive :P

Compilers that can do strict-aliasing optimizations, will optimize first example out. See here.

Second example can't be optimized because the slowest part here is memory allocation/reallocation and operator+= is redefined into a function that does the memory stuff. Different implementations of strings use different allocation strategies.

I myself also would rather like to have malloc(100000) than thousand malloc(100) too when doing s += "s"; but right now that thing is out of scope of compilers and has to be optimized by people. This is what D language tries to solve by introducing pure functions.

As mentioned here in other answers, perl does second example in less than a second because it allocates more memory than requested just in case more memory will be needed later.


In release mode VS 2010 C++ this doesnt take any time to run. However debug mode is another story.

#include <stdio.h>
int main()
    int x = 0;
    for (int i = 0; i < 100 * 1000 * 1000 * 1000; ++i) {
        x += x + x + x + x + x;
    printf("%d", x);