Here's some code (full program follows later in the question):
template <typename T>
T fizzbuzz(T n) {
T count(0);
#if CONST
const T div(3);
#else
T div(3);
#endif
for (T i(0); i <= n; ++i) {
if (i % div == T(0)) count += i;
}
return count;
}
Now, if I call this template function with int
, then I get a factor of 6 performance difference according to whether I define CONST or not:
$ gcc --version
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=0" &&
time ./wrappedint
g++ -O3 -Wall -Werror -DWRAP=0 -DCONST=0 wrappedint.cpp -o wrappedi
nt
484573652
real 0m2.543s
user 0m2.059s
sys 0m0.046s
$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=1" &&
time ./wrappedint
g++ -O3 -Wall -Werror -DWRAP=0 -DCONST=1 wrappedint.cpp -o wrappedi
nt
484573652
real 0m0.655s
user 0m0.327s
sys 0m0.046s
Examining the disassembly shows that in the fast (const) case, the modulo has been turned into a multiplication and shift type thing, whereas in the slow (non-const) case it's using idivl
.
Even worse, if I try to wrap my integer in a class, then the optimisation doesn't happen whether I use const or not. The code always uses idivl
and runs slow:
#include <iostream>
struct WrappedInt {
int v;
explicit WrappedInt(const int &val) : v(val) {}
bool operator<=(const WrappedInt &rhs) const { return v <= rhs.v; }
bool operator==(const WrappedInt &rhs) const { return v == rhs.v; }
WrappedInt &operator++() { ++v; return *this; }
WrappedInt &operator+=(const WrappedInt &rhs) { v += rhs.v; return *this; }
WrappedInt operator%(const WrappedInt &rhs) const
{ return WrappedInt(v%rhs.v); }
};
std::ostream &operator<<(std::ostream &s, WrappedInt w) {
return s << w.v;
}
template <typename T>
T fizzbuzz(T n) {
T count(0);
#if CONST
const T div(3);
#else
T div(3);
#endif
for (T i(0); i <= n; ++i) {
if (i % div == T(0)) count += i;
}
return count;
}
int main() {
#if WRAP
WrappedInt w(123456789);
std::cout << fizzbuzz(w) << "\n";
#else
std::cout << fizzbuzz<int>(123456789) << "\n";
#endif
}
My questions are:
1) Is there a simple principle of C++ itself, or gcc's optimisation, which explains why this happens, or is it just a case of "various heuristics run, this is the code you get"?
2) Is there any way to make the compiler realise that my locally-declared and never-referenced const WrappedInt can be treated as a compile-time const value? I want this thing to be a straight replacement for int in templates.
3) Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising? The goal is that WrappedInt will be a policy-based template. But if a "do-nothing" policy results in essentially arbitrary 6x speed penalties over int, I'm better off special-casing that situation and using int directly.