views: 582

answers: 6

I have a C++ application that can be simplified to something like this:

class AbstractWidget {
 public:
  virtual ~AbstractWidget() {}
  virtual void foo() {}
  virtual void bar() {}
  // (other virtual methods)
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> widgets;

 public:
  void addWidget(AbstractWidget* widget) {
    widgets.push_back(widget);
  }

  void fooAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      widgets[i]->foo();
    }
  }

  void barAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      widgets[i]->bar();
    }
  }

  // (other *All() methods)
};

My application is performance-critical. There are typically thousands of widgets in the collection. The classes derived from AbstractWidget (of which there are dozens) typically leave many of the virtual functions not overridden. The ones that are overridden typically have very fast implementations.

Given this, I feel I can optimize my system with some clever meta-programming. The goal is to leverage function inlining and to avoid virtual function calls, while keeping the code manageable. I've looked into the Curiously Recurring Template Pattern (see here for a description). This seems to almost do what I want, but not quite.

Is there any way to make the CRTP work for me here? Or, is there any other clever solution anyone can think of?

+1  A: 

The problem that you will have here is with WidgetCollection::widgets. A vector can only contain items of one type, and using the CRTP requires that each AbstractWidget have a different type, templatized by the desired derived type. That is, your AbstractWidget would look something like this:

template< class Derived >
class AbstractWidget {
    ...
    void foo() {
        static_cast< Derived* >( this )->foo_impl();
    }        
    ...
};

This means that each AbstractWidget with a different Derived type would constitute a distinct type AbstractWidget< Derived >. Storing these all in a single vector won't work. So it looks like, in this case, virtual functions are the way to go.

Naaff
+1  A: 

Not if you need a vector of them. The STL containers are completely homogeneous, which means that if you need to store a widgetA and a widgetB in the same container, they must inherit from a common parent. And, if widgetA::bar() does something different from widgetB::bar(), you have to make the functions virtual.

Do all of the widgets need to be in the same container? You could do something like

vector<widgetA> widget_a_collection;
vector<widgetB> widget_b_collection;

And then the functions wouldn't need to be virtual.

rlbond
They don't *need* to be, but things get dangerous if they are not. The reason is that some of the *All() methods have order-synchronization requirements. For example, AbstractWidget has a string getName() method and a double getValue() method, the WidgetCollection has a printAllNames() method and a printAllValues() method, and the printed lines should align with each other.
I don't (yet) see the problem there. The order of iteration over the widgets can still be well defined (all the As, then all the Bs), so you'll get synched results provided you don't change the collection in the mean time. They just won't be listed in the order the widgets were added.
Steve Jessop
The danger is that if I'm careless, I might mistakenly have the A-for-loop before the B-for-loop in printAllValues(), but the B-for-loop before the A-for-loop in printAllNames(). But if this is the best solution, then it's something I'm willing to live with.
Good point. You could use templates to avoid that - have a template function to do the loops in a particular order. It takes another template as a template parameter, then calls f<widget_a> on the As, f<widget_b> on the Bs, etc. Finally fooAll defines a template that calls non-virtual foo (but of course that's a different foo for each instantiation of the template), and instantiates the looping function template with it. barAll does the same thing, except its template calls bar.
Steve Jessop
Note that foo and bar must have sufficiently similar signatures for this to work (same number of parameters, I think: the actual parameter types can be carried around as extra template parameters). If they don't, then I guess you still have to write multiple looping templates, but hopefully fewer than there are functions to call.
Steve Jessop
Oh yes, and I'd want to actually look at the object code to be sure everything really has been inlined, and that it hasn't introduced any extra unexpected overhead.
Steve Jessop
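
A minimal sketch of the idea above, under some assumptions: WidgetA and WidgetB are hypothetical concrete types with counters added for illustration, and a C++14 generic lambda stands in for the template-template-parameter machinery described in the comments. The point is that the iteration order is fixed in one place, and every call is non-virtual and inlinable:

```cpp
#include <vector>

// Hypothetical concrete widget types; the counters just make the effect observable.
struct WidgetA {
    int foos = 0, bars = 0;
    void foo() { ++foos; }   // non-virtual, trivially inlinable
    void bar() { ++bars; }
};
struct WidgetB {
    int foos = 0, bars = 0;
    void foo() { ++foos; }
    void bar() { ++bars; }
};

std::vector<WidgetA> widget_a_collection;
std::vector<WidgetB> widget_b_collection;

// One looping template fixes the iteration order (all As, then all Bs),
// so fooAll and barAll are synchronized by construction.
template <typename F>
void forEachWidget(F f) {
    for (auto& w : widget_a_collection) f(w);
    for (auto& w : widget_b_collection) f(w);
}

void fooAll() { forEachWidget([](auto& w) { w.foo(); }); }
void barAll() { forEachWidget([](auto& w) { w.bar(); }); }
```

Each instantiation of forEachWidget sees the concrete widget types, so the compiler can inline the foo/bar bodies into the loops.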
A: 

CRTP or compile-time polymorphism is for when you know all of your types at compile time. As long as you're using addWidget to collect a list of widgets at runtime, and as long as fooAll and barAll then have to handle members of that homogeneous list of widgets at runtime, you have to be able to handle different types at runtime. So for the problem you've presented, I think you're stuck using runtime polymorphism.

A standard answer, of course, is to verify that the performance of runtime polymorphism is a problem before you try to avoid it...

If you really need to avoid runtime polymorphism, then one of the following solutions may work.

Option 1: Use a compile-time collection of widgets

If your WidgetCollection's members are known at compile time, then you can very easily use templates.

// widgetA, widgetB, widgetC are members known at compile time
template<typename F>
void WidgetCollection(F functor)
{
  functor(widgetA);
  functor(widgetB);
  functor(widgetC);
}

// Make Foo a functor that's specialized as needed, then...

void FooAll()
{
  WidgetCollection(Foo);
}
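
To make Option 1 concrete, here is a self-contained sketch under stated assumptions: WidgetA/WidgetB are hypothetical types, and Foo is written as a functor whose call operator is itself a template, so each call resolves at compile time to the concrete type's non-virtual foo():

```cpp
// Hypothetical widget types; the counters just make the calls observable.
struct WidgetA { int fooCalls = 0; void foo() { ++fooCalls; } };
struct WidgetB { int fooCalls = 0; void foo() { ++fooCalls; } };

// Compile-time "collection": the members are known statically.
WidgetA widgetA;
WidgetB widgetB;

// Templated call operator: each invocation binds to the concrete
// type's non-virtual foo(), which the compiler can inline.
struct Foo {
    template <typename W>
    void operator()(W& w) const { w.foo(); }
};

template <typename F>
void WidgetCollection(F functor) {
    functor(widgetA);
    functor(widgetB);
}

void FooAll() { WidgetCollection(Foo()); }
```

No virtual dispatch occurs anywhere: the functor is resolved per widget type at instantiation time.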

Option 2: Replace runtime polymorphism with free functions

class AbstractWidget {
 public:
  virtual ~AbstractWidget() {}
  // (other virtual methods)
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> defaultFooableWidgets;
  vector<AbstractWidget*> customFooableWidgets1;
  vector<AbstractWidget*> customFooableWidgets2;      

 public:
  void addWidget(AbstractWidget* widget) {
    // decide which FooableWidgets list to push widget onto
  }

  void fooAll() {
    for (unsigned int i = 0; i < defaultFooableWidgets.size(); i++) {
      defaultFoo(defaultFooableWidgets[i]);
    }
    for (unsigned int i = 0; i < customFooableWidgets1.size(); i++) {
      customFoo1(customFooableWidgets1[i]);
    }
    for (unsigned int i = 0; i < customFooableWidgets2.size(); i++) {
      customFoo2(customFooableWidgets2[i]);
    }
  }
};

Ugly, and really not OO. Templates could help with this by reducing the need to list every special case; try something like the following (completely untested), but you're back to no inlining in this case.

class AbstractWidget {
 public:
  virtual ~AbstractWidget() {}
};

class WidgetCollection {
 private:
  map<void(*)(AbstractWidget*), vector<AbstractWidget*> > fooWidgets;

 public:
  template<typename T>
  void addWidget(T* widget) {
    fooWidgets[TemplateSpecializationFunctionGivingWhichFooToUse<T>()].push_back(widget);
  }

  void fooAll() {
    for (map<void(*)(AbstractWidget*), vector<AbstractWidget*> >::const_iterator i = fooWidgets.begin(); i != fooWidgets.end(); i++) {
      for (unsigned int j = 0; j < i->second.size(); j++) {
        (*i->first)(i->second[j]);
      }
    }
  }
};

Option 3: Eliminate OO

OO is useful because it helps manage complexity and because it helps maintain stability in the face of change. For the circumstances you seem to be describing - thousands of widgets, whose behavior generally doesn't change, and whose member methods are very simple - you may not have much complexity or change to manage. If that's the case, then you may not need OO.

This solution is the same as runtime polymorphism, except that it requires that you maintain a static list of "virtual" methods and known subclasses (which is not OO) and it lets you replace virtual function calls with a jump table to inlined functions.

class AbstractWidget {
 public:
  enum WidgetType { CONCRETE_1, CONCRETE_2 };
  WidgetType type;
};

class WidgetCollection {
 private:
  vector<AbstractWidget*> widgets;

 public:
  void addWidget(AbstractWidget* widget) {
    widgets.push_back(widget);
  }

  void fooAll() {
    for (unsigned int i = 0; i < widgets.size(); i++) {
      switch(widgets[i]->type) {
        // insert handling (such as calls to inline free functions) here
      }
    }
  }
};
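
Filling in the placeholder, a minimal sketch of Option 3 might look like this. The concrete handlers fooConcrete1/fooConcrete2 and the value field are illustrative assumptions, not part of the original design; the point is that the switch dispatches to inline free functions, so the per-widget call overhead disappears:

```cpp
#include <vector>

struct AbstractWidget {
    enum WidgetType { CONCRETE_1, CONCRETE_2 };
    WidgetType type;
    int value = 0;  // illustrative state so the handlers have an effect
};

// Inline free functions the switch dispatches to; bodies are illustrative.
inline void fooConcrete1(AbstractWidget* w) { w->value += 1; }
inline void fooConcrete2(AbstractWidget* w) { w->value += 2; }

class WidgetCollection {
    std::vector<AbstractWidget*> widgets;
public:
    void addWidget(AbstractWidget* w) { widgets.push_back(w); }

    void fooAll() {
        for (unsigned int i = 0; i < widgets.size(); i++) {
            // The jump table replaces the vtable lookup, and the
            // handlers can be inlined directly into the loop.
            switch (widgets[i]->type) {
                case AbstractWidget::CONCRETE_1: fooConcrete1(widgets[i]); break;
                case AbstractWidget::CONCRETE_2: fooConcrete2(widgets[i]); break;
            }
        }
    }
};
```

The cost is that the enum and the switch must be kept in sync with the set of concrete widget types by hand.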
Josh Kelley
Hi Josh. Unfortunately, the WidgetCollection's members are not known at compile time. I have a WidgetFactory that converts strings to AbstractWidgets, and the WidgetCollection is loaded from a text file. Your second suggestion is one that I started to implement, but there are reasons I dislike having multiple vectors (see my response to rlbond above).
Your response to rlbond gave me another idea.
Josh Kelley
I like your 3rd option, as it achieves inline-ability while keeping just a single vector of widgets. Thanks!
+5  A: 

Simulated dynamic binding (there are other uses of CRTP) is for when the base class thinks of itself as being polymorphic, but clients only actually care about one particular derived class. So for instance you might have classes representing an interface into some platform-specific functionality, and any given platform will only ever need one implementation. The point of the pattern is to templatize the base class, so that even though there are multiple derived classes, the base class knows at compile time which one is in use.

It doesn't help you when you genuinely need runtime polymorphism, such as for example when you have a container of AbstractWidget*, each element can be one of several derived classes, and you have to iterate over them. In CRTP (or any template code), base<derived1> and base<derived2> are unrelated classes. Hence so are derived1 and derived2. There's no dynamic polymorphism between them unless they have another common base class, but then you're back where you started with virtual calls.

You might get some speedup by replacing your vector with several vectors: one for each of the derived classes that you know about, and one generic one for when you add new derived classes later and don't update the container. Then addWidget does some (slow) typeid checking or a virtual call to the widget, to add the widget to the correct container, and maybe has some overloads for when the caller knows the runtime class. Be careful not to accidentally add a subclass of WidgetIKnowAbout to the WidgetIKnowAbout* vector. fooAll and barAll can loop over each container in turn making (fast) calls to non-virtual fooImpl and barImpl functions that will then be inlined. They then loop over the hopefully much smaller AbstractWidget* vector, calling the virtual foo or bar functions.
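
A sketch of that split, under some assumptions: WidgetIKnowAbout is a hypothetical known derived class, addWidget overloads stand in for the typeid/virtual-call sorting step, and a counter makes the calls observable. The known-type vector gets non-virtual fooImpl calls that can be inlined; the generic vector falls back to the virtual foo:

```cpp
#include <vector>

struct AbstractWidget {
    virtual ~AbstractWidget() {}
    virtual void foo() {}
};

// A known derived class with a non-virtual, inlinable implementation.
struct WidgetIKnowAbout : AbstractWidget {
    int fooCount = 0;
    void fooImpl() { ++fooCount; }           // non-virtual, inlinable
    virtual void foo() { fooImpl(); }        // virtual path still works
};

class WidgetCollection {
    std::vector<WidgetIKnowAbout*> known;    // fast path
    std::vector<AbstractWidget*> generic;    // fallback for unknown types
public:
    // Overload chosen at compile time when the caller knows the exact type.
    // Caveat from the answer: a pointer to a *subclass* of WidgetIKnowAbout
    // would also land here, and fooImpl would not dispatch to its override.
    void addWidget(WidgetIKnowAbout* w) { known.push_back(w); }
    void addWidget(AbstractWidget* w) { generic.push_back(w); }

    void fooAll() {
        for (unsigned int i = 0; i < known.size(); i++)
            known[i]->fooImpl();             // non-virtual call, can inline
        for (unsigned int i = 0; i < generic.size(); i++)
            generic[i]->foo();               // virtual call
    }
};
```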

It's a bit messy and not pure-OO, but if almost all your widgets belong to classes that your container knows about, then you might see a performance increase.

Note that if most widgets belong to classes that your container cannot possibly know about (because they're in different libraries, for example), then you can't possibly have inlining (unless your dynamic linker can inline. Mine can't). You could drop the virtual call overhead by messing about with member function pointers, but the gain would almost certainly be negligible or even negative. Most of the overhead of a virtual call is in the call itself, not the virtual lookup, and calls through function pointers will not be inlined.

Look at it another way: if the code is to be inlined, that means the actual machine code has to be different for the different types. This means you need either multiple loops, or a loop with a switch in it, because the machine code clearly can't change in ROM on each pass through the loop, according to the type of some pointer pulled out of a collection.

Well, I guess maybe the object could contain some asm code that the loop copies into RAM, marks executable, and jumps into. But that's not a C++ member function. And it can't be done portably. And it probably wouldn't even be fast, what with the copying and the icache invalidation. Which is why virtual calls exist...

Steve Jessop
+4  A: 

The short answer is no.

The long answer (or still short compared to some other answers :-)

You are trying to figure out dynamically, at runtime, which function to execute (that is what virtual functions are). If you have a vector (whose members cannot be determined at compile time) then you cannot inline the functions no matter what you try.

The only caveat to that is if the vector always contains the same elements (i.e. you could work out at compile time what is going to be executed at runtime). You could then re-work this, but it would require something other than a vector to hold the elements (probably a structure with all the elements as members).

Also, do you really think that virtual dispatch is a bottleneck?
Personally I highly doubt it.

Martin York
If the application genuinely does nothing but iterate over these containers, calling functions whose implementations are totally trivial, then I would not be surprised if the bottleneck were the call overhead. It's not virtual dispatch as such, but the out-of-line call. The other candidate of course is cache misses from accessing thousands of disparate objects.
Steve Jessop
Martin's right. You're suggesting adding significant complexity for speculative performance gains. And thousands of widgets isn't very many for an optimization problem -- I would doubt the vtable is significant. Always, always profile it first when optimizing.
"You're suggesting adding significant complexity for speculative performance gains." Nowt wrong with that, as long as you measure performance before and after (on all the platforms you care about), and don't check in the change if it hasn't improved. Some of us have programmed for systems which don't even *have* a profiler, difficult though that might be to imagine ;-p
Steve Jessop
I added a comment to T.E.D.'s comment further down detailing why I believe a significant performance gain can be made here. I don't think the vtable is significant either, but I think the inlining is.
+1  A: 

Odds are that after you go through all that effort, you will see no performance difference.

This is absolutely the wrong way to optimize. You wouldn't fix a logic bug by changing random lines of code would you? No, that's silly. You don't "fix" code until you first find which lines are actually causing your problem. So why would you treat performance bugs differently?

You need to profile your application and find where the real bottlenecks are. Then speed up that code and rerun the profiler. Repeat until the performance bug (too slow execution) is gone.

T.E.D.
Hi T.E.D. I did what you describe and concluded that a good metaprogramming solution should triple performance. It's a little hard to describe here but I'll try: my WidgetCollection used to be a much more complex class, with (effectively) a fixed set of AbstractWidgets, and with all the constituent data structures used by those AbstractWidgets as fields. My application took T seconds to run. I then made it more general by introducing AbstractWidgets. Afterwards, with the same set of AbstractWidgets, it took 3T seconds.
"You wouldn't fix a logic bug by changing random lines of code would you?" No, I'd change the lines I thought were most likely to be wrong, and see whether my guess was right. I know it's statistically unlikely, but it's fun to assume that the questioner has identified a genuine bottleneck, and that we're helping to free it. Otherwise every optimisation question on this site has exactly the same answer: "(1) find your bottleneck. (2) ... (3) Profit!".
Steve Jessop
To expound upon my comment, there are many confounding issues that complicate my assessment. Many of these widgets have interdependencies that demand an extra layer of caching that the old, less robust, WidgetCollection did not require. Profiling these other effects is not trivial - perhaps I should make efforts to analyze them better.