views:

479

answers:

4

Here's some code (full program follows later in the question):

template <typename T>
T fizzbuzz(T n) {
    T count(0);
    #if CONST
        const T div(3);
    #else
        T div(3);
    #endif
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}

Now, if I call this template function with int, then I get a factor of 6 performance difference according to whether I define CONST or not:

$ gcc --version
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=0" &&
 time ./wrappedint
g++  -O3 -Wall -Werror -DWRAP=0 -DCONST=0   wrappedint.cpp   -o wrappedi
nt
484573652

real    0m2.543s
user    0m2.059s
sys     0m0.046s

$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=1" &&
 time ./wrappedint
g++  -O3 -Wall -Werror -DWRAP=0 -DCONST=1   wrappedint.cpp   -o wrappedi
nt
484573652

real    0m0.655s
user    0m0.327s
sys     0m0.046s

Examining the disassembly shows that in the fast (const) case, the modulo has been turned into a multiplication and shift type thing, whereas in the slow (non-const) case it's using idivl.

Even worse, if I try to wrap my integer in a class, then the optimisation doesn't happen whether I use const or not. The code always uses idivl and runs slow:

#include <iostream>

struct WrappedInt {
    int v;
    explicit WrappedInt(const int &val) : v(val) {}
    bool operator<=(const WrappedInt &rhs) const { return v <= rhs.v; }
    bool operator==(const WrappedInt &rhs) const { return v == rhs.v; }
    WrappedInt &operator++() { ++v; return *this; }
    WrappedInt &operator+=(const WrappedInt &rhs) { v += rhs.v; return *this; }
    WrappedInt operator%(const WrappedInt &rhs) const 
        { return WrappedInt(v%rhs.v); }
};

std::ostream &operator<<(std::ostream &s, WrappedInt w) {
    return s << w.v;
}

template <typename T>
T fizzbuzz(T n) {
    T count(0);
    #if CONST
        const T div(3);
    #else
        T div(3);
    #endif
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}

int main() {
    #if WRAP
        WrappedInt w(123456789);
        std::cout << fizzbuzz(w) << "\n";
    #else
        std::cout << fizzbuzz<int>(123456789) << "\n";
    #endif
}

My questions are:

1) Is there a simple principle of C++ itself, or gcc's optimisation, which explains why this happens, or is it just a case of "various heuristics run, this is the code you get"?

2) Is there any way to make the compiler realise that my locally-declared and never-referenced const WrappedInt can be treated as a compile-time const value? I want this thing to be a straight replacement for int in templates.

3) Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising? The goal is that WrappedInt will be a policy-based template. But if a "do-nothing" policy results in essentially arbitrary 6x speed penalties over int, I'm better off special-casing that situation and using int directly.

+1  A: 

Try combining const int v in your WrappedInt class with const T in your fizzbuzz function and see if the compiler can optimize that.

By declaring const int you've created a special case - a compile time constant. The compiler knows what the value is, and can optimize it more heavily than a value that could possibly change during the run of the program.

Mark Ransom
Unfortunately `WrappedInt` has two operators which modify `v`. But introducing a new class, `WrappedConstInt` with them removed, adding a suitable `operator%`, and making `div` a `const WrappedConstInt`, does not trigger the optimisation. It runs slow.
Steve Jessop
And I understand that "const int" is special. However, even when non-const, the compiler is pulling the value into a register, and the register is never modified in the loop. So I was rather hoping (a) that some DFA would allow it to see that a local int which is never modified and no references taken, is as good as const, and give it the treatment, and/or (b) that a const struct containing an int is just as const as a `const int`.
Steve Jessop
So I guess the question becomes, "is there a strong reason why neither (a) nor (b) happens, or is it just that gcc doesn't manage it?"
Steve Jessop
A: 

Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising?

Try passing WrappedInt by value. Then WrappedInts can be passed in registers. Pass-by-const-reference sometimes forces gcc to fall back to the stack for argument passing.

About the int vs const int slowdown, I can only speculate that GCC is trying to play it safe in the face of aliasing. Basically, if it cannot prove that div is not an alias for another, more accessible variable, it cannot turn it into a constant. If you declare it const, GCC assumes it's not aliased and performs the conversion into a constant. Apart from the idivl, you should also see a memory fetch, once, when entering the function, as opposed to immediate values being used for the const case.

In both the wrapped and non-const int cases, div is initialised with "`movl $3, %edi`" and never touches memory. And changing all the operators of WrappedInt (as well as the constructor) to pass by value doesn't enable the optimisation.
Steve Jessop
+6  A: 

I'm guessing its just the severely old GCC version you are running. The oldest compiler I have on my machine - gcc-4.1.2, performs the fast way with both the non-const and the wrap versions (and does so at only -O1).

Greg Rogers
That's good news, thanks.
Steve Jessop
Any particular reason you haven't upgraded in 4 years?
Greg Rogers
Cygwin defaults to 3.4. I assume they're still using it to build cygwin itself, or something. It has a gcc-4 package, though, so I'll try that. Not as bad as the 12 months+ that cygwin's version of gnupg contained a critical security flaw fixed upstream.
Steve Jessop
A: 

The difference in speed is caused by the compiler not knowing if "div" will change value. When it is non-const, it is treating it like a variable being passed in. It could be anything, and so the compiler will use an instruction that divides two variables - idivl. When you say that it's const, the compiler is free to treat it exactly as if you'd typed:

if (i  % 3 == 0)

I'm kind of surprised that it didn't use bitwise AND(&).

The WrappedInt isn't being optimized because, well, its not an int. Its a class.

Something that you could do is incorporate fizzbuzz into WrappedInt.

keraba
Only the modulo of powers of two can be expressed using AND.
Michael Foukarakis