Alternate question title would be: How to explicitly have the compiler generate code for the compiler-generated constructors in a specific translation unit?

The problem we face is that for one code path the resulting -- thoroughly measured -- performance is better (by about 5%) if the copy-ctor calls of one object are not inlined, that is if this constructor is implemented manually. (We noticed this because during code-cleanup the superfluous explicitly implemented copy ctor of this class (17 members) was removed.)

Edit: Note that we have checked the generated assembly code and have made sure that the inlining and code generation is happening as I describe for the two different code versions.

We now face the choice of either dropping the manual copy-ctor code back in (it does exactly the same as the compiler-generated one) or finding some other means of not inlining the copy ctor of this class.

Is there any means (for Microsoft Visual C++) to explicitly instantiate the compiler generated class functions in a specific translation unit or will they always be inlined in each translation unit where they are used? (Comments for gcc or other compilers are also welcome to get a better picture of the situation.)


Since the first 2 answers show some misunderstanding: The compiler-generated class functions are only generated by the compiler itself if they are neither declared nor defined by the user. Therefore no modifiers whatsoever can be applied to them, since these functions do not exist in the source code.

struct A {
  std::string member;
};

A has a default ctor, a copy ctor, a dtor and a copy assignment operator. None of these functions can be modified via some __declspec because they do not exist in the code.

struct B {
  std::string member;
  B(B const& rhs);
};

B now has a user supplied copy ctor and the user has to implement it. The compiler will not generate code for it.
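A minimal sketch of what such a user-supplied implementation could look like for B, assuming a plain member-wise copy is all that is wanted. The header/source split is only indicated by comments, the default ctor is added because declaring a copy ctor suppresses the implicit one, and `copy_of` is a hypothetical helper that exists only to exercise the constructor.

```cpp
#include <string>

// B.hpp (sketch): the copy ctor is only declared in the class,
// so it is not implicitly inline.
struct B {
  std::string member;
  B() {}            // needed: a user-declared copy ctor suppresses the implicit default ctor
  B(B const& rhs);
};

// B.cpp (sketch): the single out-of-line definition. It copies member by
// member, which is what the implicit copy ctor would have done.
B::B(B const& rhs) : member(rhs.member) {}

// Hypothetical helper, only here to call the copy ctor.
B copy_of(B const& b) { return B(b); }
```

Without whole-program optimization, callers in other translation units only see the declaration and have to emit a call.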


Some more background for the doubters :-) ...

This code is compiled using MS Visual C++, but it is linked for an embedded(-like) (realtime) system. Performance was measured by taking timings on this system and I therefore think the guys who took the timings will have some decent numbers.

The test was performed by comparing two code versions where the only difference was the inline vs. the not-inline copy ctor of this one class. Timings with the inlined code were worse by about 5%.


Further checking has revealed that I was mistaken on one point: The compiler will generate separate functions for complex copy constructors. It does this at its own discretion, and it also depends on the optimization settings. So in our specific circumstances the compiler is making the wrong choice. From the answers so far it does not appear we can tell the compiler otherwise. :-(

A: 

http://msdn.microsoft.com/en-us/library/kxybs02x.aspx

BarsMonster
How can you use that for a compiler generated function?
Gorpik
Oops, shame on me :-))
BarsMonster
A: 

__declspec(noinline).

The documentation says that it applies only to member functions, but in fact it works with free functions as well.
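A hedged sketch of how the attribute is usually spelled, on a member function and on a free function. The macro name `NOINLINE`, the class `Widget` and the function `add_one` are made up for illustration; `__attribute__((noinline))` is the GCC/Clang counterpart. Note that this only works for functions that are actually declared in the source.

```cpp
#include <string>

// Hypothetical convenience macro selecting the vendor-specific spelling.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE /* no known spelling for this compiler */
#endif

struct Widget {                        // hypothetical class
  std::string name;
  Widget() {}
  NOINLINE Widget(Widget const& rhs);  // user-declared, so there is a declaration to attach to
};

// Member-wise copy, kept out of line.
Widget::Widget(Widget const& rhs) : name(rhs.name) {}

// The same attribute on a free function.
NOINLINE int add_one(int x) { return x + 1; }
```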

atzz
How can you use that for a compiler generated function?
Gorpik
@Gorpik - Oh well... Haste makes waste, and all that...
atzz
+4  A: 

$12.1/5- "An implicitly-declared default constructor is an inline public member of its class.".

So there is not much we can do. The implicit constructor has to be inline. Any other behavior in this regard would probably be an extension.

Having said that,

It is likely that your manual copy constructor (which you removed during code cleanup) was doing the right thing. As an example, if one of the 17 members in your class is a pointer, it is likely that the manual copy constructor took care of a deep copy (and hence took a performance hit).

So, unless you carefully review your manual copy constructor, don't even think of removing it and relying on the (potentially buggy) implicit copy constructor (in your context)

Chubsdad
I agree with the first part of this answer, but I cannot see what leads you to the other conclusion. First, if the explicit cc was doing additional things, it is quite unlikely that it was faster than the implicit one. Second, OP wrote that the explicit cc was "superfluous", and I can see no reason to distrust that.
Gorpik
@Gorpik: "performance is better (by about 5%) if the copy-ctor calls of one object are not inlined, that is if this constructor is implemented manually."
Chubsdad
Thanks for the std quote! And as I already stated in the question, the explicit copy ctor was doing exactly the right thing and the implicitly declared one is also exactly doing the right thing and there are no pointer problems -- the performance diff is really and actually due to the (multiply) inlined copy-ctor vs. the copy-ctor in a separate function.
Martin
@Chubsdad: Yes, that's what I said. If the explicit (manually implemented, not inlined) cctor is doing additional work, it should not perform better.
Gorpik
@Martin: How did you conclude that the performance hit is due to multiple inlined cc?
Chubsdad
@Martin: as Chubsdad says, you haven't really given us any reason to think inlining has anything to do with the perf difference. Can you elaborate?
jalf
Well, inlining might hurt performance if they are on the border of L1 code cache...
BarsMonster
I think the OP is about multiple inline functions in different translation units. As far as I know, a really industrial-strength linker will optimize out all redundant inline copies across translation units. But I am not sure.
Chubsdad
@Chubsdad: if you have multiple instantiations of a function, then it's not inlined. The entire point in inlining is to place the instantiation of the code at the call site, effectively copying the function. That's not redundant, and you can't optimize it out (at least without reverting the original inlining optimization)
jalf
@Chubsdad, others - see edits for additional info on the performance measurements.
Martin
Note that I have since found out that the standard statement does not imply that the compiler has to actually generate inlined code. See further edits.
Martin
A: 

it's often best to isolate it to a few core types which you know are problematic. example a:

class t_std_string {
    std::string d_string;
public:
    /* ... */

    /* defined explicitly, and out of line -- you know what to do here */
    t_std_string();
    t_std_string(const std::string& other);
    t_std_string(const t_std_string& other);
    ~t_std_string();

    inline std::string& get() { return this->d_string; }
    inline const std::string& get() const { return this->d_string; }
    /* ... */
};

struct B {
    t_std_string member;
    /* 16 more */
    /* ... */
};

or you can take some of it for free. example b:

/* B.hpp */

struct B {
private:

    /* class types */
    struct t_data {
        std::string member;

        /* 16 more ... */
    public:
        /* declare + implement the ctor B needs */

        /* since it is otherwise inaccessible, it will only hurt build times to make default ctor/dtor implicit (or by implementing them in the header, of course), so define these explicitly in the cpp file */
        t_data();
        ~t_data();

        /* allow implicit copy ctor and assign -- this could hurt your build times, however. it depends on the complexity/visibility of the implementation of the data and the number of TUs in which this interface is visible. since only one object needs this... it's wasteful in large systems */
    };
private:

    /* class data */
    t_data d_data;
public:
    /* you'll often want the next 4 out of line
       -- it depends on how this is created/copied/destroyed in the wild
     */
    B();
    B(const B& other);
    ~B();
    B& operator=(const B&);
};

/* B.cpp */

/* assuming these have been implemented properly for t_data */
B::B() : d_data() {
}

B::B(const B& other) : d_data(other.d_data) {
}

B::~B() {
}

B& B::operator=(const B& other) {
    /* assuming the default behaviour is correct...*/
    this->d_data = other.d_data;
    return *this;
}
/* continue to B::t_data definitions */
Justin
+3  A: 

I highly doubt inlining has anything to do with it. If the compiler inlines the compiler-generated copy ctor, why wouldn't it also inline the explicitly defined one? (It is also unusual that the compiler's optimization heuristics fail so badly as to make inlined code 5% slower)

Before jumping to conclusions,

  • check the generated assembly to verify that the two versions actually do the exact same thing (and in the same order, using the same assembly and so on, since otherwise that might be the source of your performance difference)
  • check that the compiler-generated one actually is being inlined, and the manually defined one is not.

If that is the case, could you update your question with this information?

There is no way in C++ to indicate if a compiler-generated function should or shouldn't be inlined. Not even vendor-specific extensions such as __declspec(noinline) will help you there, since you're explicitly handing over all responsibility for the function to the compiler. So the compiler chooses what to do with it, how to implement it and whether or not to inline it. You can't both say "please implement this function for me", and at the same time "please let me control how the function is implemented". If you want control over the function, you have to implement it. ;)

In C++0x, it may be possible (depending on how these vendor-specific extensions interact with functions declared as = default).
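A minimal sketch of the `= default` route, assuming a C++11 compiler. The class `C` and its members are hypothetical, and the header/source split is only indicated by comments. The copy ctor is declared in the class but defaulted at its out-of-line definition, so the compiler writes the member-wise body in that one translation unit and the function is not implicitly inline. An optimizer may of course still inline it where the definition is visible.

```cpp
#include <string>

// C.hpp (sketch): the copy ctor is declared, but NOT defaulted on its
// first declaration.
struct C {            // hypothetical class
  std::string member;
  int id = 0;
  C() = default;
  C(C const& rhs);
};

// C.cpp (sketch): the compiler generates the member-wise copy here.
// No hand-written member list has to be kept in sync with the class.
C::C(C const& rhs) = default;
```

Whether a vendor attribute such as `__declspec(noinline)` can additionally be put on the in-class declaration is something to verify against the compiler in use.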

But again, I'm not convinced that inlining is the issue. Most likely, the two functions just result in different assembly code being generated.

jalf
+1. The compiler might well be unable to inline the explicitly written cctor if it is in a source file (as opposed to a header file), as I assume. But I agree with the main reasoning and the rest of the answer.
Gorpik
@Gorpik: it might, but that depends on compiler flags. MSVC can inline across translation units and even .libs with the right flags. And I'd assume they use fairly aggressive optimization if they're concerned about a 5% reduction in performance.
jalf
I have edited the question to include a note that we *have* actually checked the assembly. I assure you, the only difference between the two versions where performance was measured is this one inlined vs. not-inlined copy ctor. (And yes, the manual copy ctor is obviously not inline as it is implemented in a cpp file and we do not use whole-prg-optimization.)
Martin
Interesting. In that case, I stand corrected. :) I guess you'll have to explicitly implement the function then (or try, as @GMan suggested in another comment, optimizing for size rather than speed)
jalf
A: 

You could use some sort of nested object. In this way the nested object's copy constructor can be left as the maintenance-free default, but you still have an explicitly created copy constructor that you can declare noinline.

class some_object_wrapper {
    original_object obj;
public:
    __declspec(noinline) some_object_wrapper(const some_object_wrapper& ref)
        : obj(ref.obj) {}
    // Other function accesses and such here
};

If you're desperate, you could compile the class in question separately in a .lib and link to it. Changing it to a different translation unit will not stop VC++ from inlining it. Also, I have to question as to whether they're actually doing the same thing. Why did you implement a manual copy constructor if it does the same as the default copy constructor?

DeadMG
"Why did you implement a manual copy constructor if it does the same as the default copy constructor?" ... I had to laugh at that .... have you ever waded through legacy code? The amount of cargo cult programming you can find is simply astounding :-)
Martin