views: 283

answers: 3
Hi, I ran into this question when I was answering another guy's question. How do compilers optimize code? Can keywords like `const` help? (This is aside from `volatile`, inline functions, and how to optimize the code all by yourself.)

+4  A: 

One very big thing you can do (beyond what the compiler can do for you) is to be aware of the cache. Since accessing main memory is really expensive, the cache tries to help you by storing not only the data you accessed but the nearby elements as well. This is why `foo` will run so much faster than `bar`:

#define NUM_ROWS 4096   // example sizes, chosen so the array is
#define NUM_COLS 4096   // much larger than the CPU caches

int array[ NUM_ROWS ][ NUM_COLS ];

int foo()
{
    int row, col;
    int sum = 0;

    // accesses the elements in the array contiguously
    for ( row = 0; row < NUM_ROWS; row++ )
    {
         for ( col = 0; col < NUM_COLS; col++ )
         {
              sum += array[ row ][ col ];
         }
    }
    return sum;
}

int bar()
{
    int row, col;
    int sum = 0;

    // skips from row to row ( big jumps that might miss the cache )
    for ( col = 0; col < NUM_COLS; col++ )
    {
         for ( row = 0; row < NUM_ROWS; row++ )
         {
              sum += array[ row ][ col ];
         }
    }
    return sum;
}
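
To actually see the difference, you can time the two traversals; here is a minimal harness for the code above (the sizes and output format are just for illustration, and at higher optimization levels the compiler may transform the loops itself, as discussed in the comments):

#include <stdio.h>
#include <time.h>

int main( void )
{
    int sum_foo, sum_bar;
    double t_foo, t_bar;
    clock_t start;

    start = clock();
    sum_foo = foo();
    t_foo = ( double )( clock() - start ) / CLOCKS_PER_SEC;

    start = clock();
    sum_bar = bar();
    t_bar = ( double )( clock() - start ) / CLOCKS_PER_SEC;

    // both traversals compute the same sum; only the access order differs
    printf( "foo: %d in %.3f s\n", sum_foo, t_foo );
    printf( "bar: %d in %.3f s\n", sum_bar, t_bar );
    return 0;
}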

Edit: Another thing to be aware of is repeated string concatenation. Done wrong, it can make code that otherwise seems to run in O( n ) actually run in O( n^2 ) - see an article on Joel on Software.
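
A sketch of what "done wrong" looks like (the function names are illustrative, and `dest` is assumed to be large enough): `strcat` rescans the destination from the start on every call, so appending n pieces that way costs O( n^2 ) total, while keeping a pointer to the current end is O( n ):

#include <string.h>

// O( n^2 ): strcat() walks dest from the beginning on every call
void join_slow( char *dest, const char **pieces, int n )
{
    int i;
    dest[ 0 ] = '\0';
    for ( i = 0; i < n; i++ )
    {
        strcat( dest, pieces[ i ] );
    }
}

// O( n ): remember where the string currently ends and append there
void join_fast( char *dest, const char **pieces, int n )
{
    int i;
    char *end = dest;
    for ( i = 0; i < n; i++ )
    {
        size_t len = strlen( pieces[ i ] );
        memcpy( end, pieces[ i ], len );
        end += len;
    }
    *end = '\0';
}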

Edit: s/disk/memory/

Alex Reece
Funny that you care about such low-level things and still write `row++` instead of `++row` :P
Armen Tsirunyan
What does your example code have to do with accessing the disk?
jayrdub
Being aware of the cache will save you a lot more than `++row` :P

@jayrdub: that involves an explanation of how memory *actually* works in the machine. Basically, `array[ row ][ col ]` is a call to main memory, which is originally stored on the hard disk. Because the hard disk moves so much slower than the CPU, computers will store information in a 'cache' where it is easier to access.
Alex Reece
I think you have a weird computer
jayrdub
@Armen Tsirunyan - The value of the expression `row++` is not used here. Your compiler will likely generate identical code for `++row` and `row++` in such a case.
asveikau
@jayrdub you should look at some assembly when you get a chance.
Alex Reece
@Alex: Thanks Alex, this was the answer I gave to the guy whose question I answered. Other things like this are hidden in our code and we don't notice them. I really appreciate the shared experience :D.
Green Code
@Alex: For most cases (i.e. those not involving arrays that are a large fraction of the size of main memory), the cache at issue when dealing with row-major versus column-major array access is one or more levels of the *CPU* cache, and has nothing to do with the *disk* cache.
dmckee
@Alex Reece - @jayrdub is right, this is memory, not disk. That said, memory and disk can be equivalent, e.g. if this part of memory is not paged in and the kernel has to go to disk to retrieve it.
asveikau
So you're saying accessing column-major vs row-major won't increase my L1 cache hit rate?
Alex Reece
@Alex Reece - I don't think anyone is saying that. The objection is that you say "disk" when you are referring to memory.
asveikau
@Alex Reece: One of the most common optimizations is turning the loops inside out to increase cache hits. It is likely that the optimizer would transform the column-major version into a row-major one if it detects that (given the size) this should increase the cache hit rate.
Matthieu M.
@Alex Reece: Wikipedia's article: http://en.wikipedia.org/wiki/Loop_interchange
Matthieu M.
@Matthieu Dang, that's nice. Which compilers do that?
Alex Reece
@Alex: gcc (-floop-interchange) see http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html, and I think it's the one called loop-rotate in llvm: http://llvm.org/docs/Passes.html#loop-rotate
Matthieu M.
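
In source terms, the interchange Matthieu describes is exactly the `bar`-to-`foo` rewrite from the answer above; a sketch of the transformation (not actual compiler output):

// before: the inner loop jumps NUM_COLS ints between accesses
for ( col = 0; col < NUM_COLS; col++ )
    for ( row = 0; row < NUM_ROWS; row++ )
        sum += array[ row ][ col ];

// after interchange: same iterations, same sum, but the inner
// loop now touches consecutive addresses
for ( row = 0; row < NUM_ROWS; row++ )
    for ( col = 0; col < NUM_COLS; col++ )
        sum += array[ row ][ col ];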
A: 

The rules of optimization:

  1. Don't do it
  2. Advanced Users Only: Don't do it yet

Edit: The quote (and other information, useful or not) can be found in the Coding Horror article Hardware is cheap, programmers are expensive. It would be nice to find the origin of this phrase/quote, though.

Winston Ewert
The question is not about manual optimization, but what compilers may do and how they do it.
DerKuchen
@Winston: Hi, thank you for your advice, but I guess I am still on the edge of learning this completely, so I don't want to misunderstand it.
Green Code
@DerKuchen: well, it's about both of those.
Michael Petrotta
+7  A: 

Compilers are free to optimize code so long as they can guarantee the semantics of the code are not changed.
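
For example, since the only observable behavior of the function below is its return value, an optimizer is free to fold the entire loop into a single constant (a sketch of constant folding; the actual output depends on the compiler and flags):

// what you write:
int sum_to_ten()
{
    int i, sum = 0;
    for ( i = 1; i <= 10; i++ )
    {
        sum += i;
    }
    return sum;
}

// what the optimizer may effectively emit instead, because the
// observable behavior ( the return value, 55 ) is identical:
int sum_to_ten_optimized()
{
    return 55;
}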

I would suggest starting at the Compiler optimization Wikipedia page, as there are many different kinds of optimization performed at many different stages.

As you can see, modern compilers are very 'smart' at optimizing code (compiled C code is often faster than hand-written assembly unless the programmer really knows how to take advantage of all the specific processor instructions and quirks). As others have said, write for clarity first based on a good design.

pst
@pst: compilers can only do that if they know what hardware the code is running on! Sometimes, when the GPU comes in, they mess up. I recently wrote code which used both the CPU and the GPU (CUDA), and the bug was just that simple -O2 optimization. When I turned it off, everything made sense.
Green Code
@Green Code: Compilers are software too, so of course they are sometimes buggy. But for mature compilers, the output is usually correct and blazing fast compared to whatever most programmers could write on their own.
delnan
@Green Code: there are various levels of optimization. Some are machine-independent, some not. Machine-dependent optimizations may not be bug-free, especially for a young architecture, simply because they haven't been tested extensively yet. CUDA brings a new difficulty too: suddenly there are parts of the code that should be optimized for the CPU and others for the GPU. None of the C++ compilers that I know of are meant to optimize for two different architectures at once.
Matthieu M.