+5  A: 

Not too hard: just use the same technique inside your class. Any halfway decent optimizer will inline the trivial wrapper.

class ThingImpl;
class Thing
{
    ThingImpl *impl;
    static int calc(ThingImpl*);
public:
    Thing();
    int calc() { return calc(impl); }
};
MSalters
That doesn't avoid the double-indirection: `this->impl->x`.
Marcelo Cantos
@Marcelo: It doesn't avoid it, but it should result in the generated assembly matching his `calc(OtherThing*)` routine, so the indirection is "hidden". There's no reason to clutter the class definition with a bunch of static wrapper functions, of course, since those can just be hidden in the implementation file.
Dennis Zickefoose
+1, `this->impl` indirection should basically disappear in your code. You don't really have to do this though, just enable link-time code generation.
avakar
I just confirmed that with MSalters' code, the compiler does get rid of the extra instruction.
Rob N
@avakar: How do I enable link-time code generation with GCC?
Rob N
@Rob N, `-flto`, but I think the feature didn't make it to a release version yet, so you'll have to wait a little.
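(Once it ships, the flag has to be passed at both the compile and the link step, e.g. something like `g++ -O2 -flto main.cc thing.cc`.)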
avakar
@Marcelo: It often does avoid it, because unlike `OtherThing` you can allocate `Thing` on the stack, and the compiler can also put an entire Thing in a register.
MSalters
The compiler doesn't get rid of the extra indirection; it simply relocates it from `calc()` to `main()`. Putting it on the stack makes no difference; you still have to load the address stored in `impl`, and then load `x` from that. I've just confirmed with `g++ ... -S main.cc` that the `movq` is executed immediately before calling `calc`, which then executes the `movl`.
Marcelo Cantos
@Marcelo: Thanks, I didn't realize that. gcc does just move the indirection to main.
Rob N
+4  A: 

One instruction is rarely worth spending much time worrying over. Firstly, the compiler may cache the pImpl in a more complex use case, thus amortising the cost in a real-world scenario. Secondly, pipelined architectures make it almost impossible to predict the real cost in clock cycles. You'll get a much more realistic idea of the cost if you run these operations in a loop and time the difference.
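
A minimal sketch of that kind of measurement, using `clock()` from `<ctime>` (the iteration count is arbitrary, and the `volatile` sink is only there so the loop isn't optimised away):

#include <ctime>
#include <cstdio>
#include "thing.hh"

int main()
{
    Thing t;
    volatile int sink = 0;                       // keeps the loop from being optimised out
    std::clock_t start = std::clock();
    for (long i = 0; i < 100000000L; ++i)
        sink += t.calc();
    std::clock_t end = std::clock();
    std::printf("%.3f seconds\n", double(end - start) / CLOCKS_PER_SEC);
}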

Marcelo Cantos
I'm not sure what the best function is to use for profiling. But I #include'd <ctime> and called clock(), and there appears to be no difference in the run times.
Rob N
Some careful coding and very long loops might show up some discrepancy, but your result doesn't surprise me.
Marcelo Cantos
+1  A: 

There's the nasty way, which is to replace the pointer to ThingImpl with a big-enough array of unsigned chars, construct the ThingImpl into it with placement new, access it through a reinterpret_cast, and destroy it with an explicit destructor call.
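
A minimal sketch of that approach (the buffer size, the C++11 `alignas`, and the member `x` are assumptions here; the buffer must be kept big and aligned enough for the real ThingImpl):

// thing.hh
#include <new>                                   // placement new

class ThingImpl;

class Thing
{
    alignas(8) unsigned char storage[32];        // assumed big/aligned enough for ThingImpl
    ThingImpl* impl() { return reinterpret_cast<ThingImpl*>(storage); }
public:
    Thing();
    ~Thing();
    int calc();
};

// thing.cc
class ThingImpl { public: int x; };

Thing::Thing()    { new (storage) ThingImpl(); }     // construct in place, no heap allocation
Thing::~Thing()   { impl()->~ThingImpl(); }          // explicit destructor call
int Thing::calc() { return impl()->x; }              // no separate allocation to chase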

Or you could just pass the Thing around by value, since it should be no larger than the pointer to the ThingImpl, though it may need a little more than that: reference counting of the ThingImpl would defeat the optimisation, so you need some way of flagging the 'owning' Thing, which might require extra space on some architectures.
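
For illustration only (the function names are made up), the difference shows up in how the parameter is reached: a by-value `Thing` carries the impl pointer directly, while a reference or pointer parameter adds one more load to get to it:

int use_by_value(Thing t)  { return t.calc(); }   // t itself holds the impl pointer
int use_by_ref(Thing &t)   { return t.calc(); }   // load t first, then t's impl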

Pete Kirkham
I had heard about this ugly reinterpret_cast technique. I just tried it and it does remove the extra instruction. But the second thing you said -- passing by value -- doesn't seem to help. Calling t.calc() on a stack-allocated Thing calls the same function with the extra instruction.
Rob N
You can use intrusive reference counting on ThingImpl, since it's private anyway. That probably won't defeat the optimization, because the reference counter goes in `ThingImpl` (or a public base class of it), not in `Thing`.
MSalters
@MSalters I suspect it would, as you would need to increment the count each time you passed the Thing to a function, and incrementing the count will probably take more operations than you save by not having an extra pointer indirection.
Pete Kirkham
@Rob N what happens if you compare the code for a pass-by-value function parameter and a reference or pointer parameter? The value should have just the one indirection, pointers and references two.
Pete Kirkham
Pete, I assume you're right, but for now I was only interested in this case, where the function parameter was the implicit 'this' pointer.
Rob N
A: 

I disagree with your usage: you are not comparing the same two things.

#include "thing.hh"
#include <cstdio>

int main()
{
    Thing *t = new Thing;                // 1
    printf("calc: %d\n", t->calc());

    OtherThing *t2 = make_other();       // 2
    printf("calc: %d\n", calc(t2));
}
  1. You have in fact two calls to new here: one explicit, and one implicit (done by the constructor of Thing).
  2. You have one new here, implicit (inside `make_other`).

You should allocate Thing on the stack, though that would probably not remove the double-dereferencing instruction... but it could change its cost (remove a cache miss).
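
For instance, the first case rewritten with automatic storage (same `Thing` as above):

Thing t;                               // lives on the stack, no explicit new/delete
printf("calc: %d\n", t.calc());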

However, the main point is that Thing manages its memory on its own, so you can't forget to delete the actual memory, whereas you definitely can with the C-style method.

I would argue that automatic memory handling is worth an extra memory instruction, especially since, as has been said, the dereferenced value will probably be cached if you access it more than once, so the cost amounts to almost nothing.

Correctness is more important than performance.

Matthieu M.
You missed the line in my question that says: "Note: I'm only looking at calling member functions (in this case, calc). I'm not looking at object allocation."
Rob N
And I was challenging the necessity of your question. Pimpl brings RAII as well as insulation, while the C-style approach only brings insulation. Hence my conclusion: I am ready to give up one CPU cycle per function call (for a value that any good compiler would cache in a tight loop) in exchange for RAII, and thus ensured correctness.
Matthieu M.
A: 

Let the compiler worry about it. It knows far more about what is actually faster or slower than we do. Especially on such a minute scale.

Having items in classes has far, far more benefits than just encapsulation. PIMPL's a great idea, if you've forgotten how to use the private keyword.

DeadMG