Just to get it out of the way...

Premature optimization is the root of all evil

Make use of OOP

etc.

I understand. Just looking for some advice regarding the speed of certain operations that I can store in my grey matter for future reference.

Say you have an Animation class. An animation can be looped (plays over and over) or not looped (plays once); it may have unique frame times or not; etc. Let's say there are 3 of these "either/or" attributes. Note that any given method of the Animation class will check at most one of these (i.e. this isn't a case of a giant if-else chain).

Here are some options.

1) Give it boolean members for the attributes given above, and use an if statement to check against them when playing the animation to perform the appropriate action.

  • Problem: Conditional checked every single time the animation is played.

2) Make a base animation class, and derive other animation classes from it, such as LoopedAnimation and AnimationUniqueFrames, etc.

  • Problem: Vtable check upon every call to play the animation given that you have something like a vector<Animation>. Also, making a separate class for all of the possible combinations seems code bloaty.

3) Use template specialization, and specialize those functions that depend on those attributes. Like template<bool looped, bool uniqueFrameTimes> class Animation.

  • Problem: The problem with this is that you couldn't just have a vector<Animation> for something's animations. Could also be bloaty.
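For concreteness, here is a minimal sketch of what options 1 and 2 might look like (the class and member names are only illustrative):

// Option 1: boolean attribute members, checked with an if inside play().
class Animation
{
public:
    bool looped;
    bool uniqueFrameTimes;

    void play()
    {
        if (looped)
        {
            // restart from the first frame when the end is reached
        }
        else
        {
            // stop on the last frame
        }
    }
};

// Option 2: a base class with a virtual play(), one subclass per behaviour.
class BaseAnimation
{
public:
    virtual ~BaseAnimation() {}
    virtual void play() = 0;
};

class LoopedAnimation : public BaseAnimation
{
public:
    void play() { /* looping behaviour */ }
};

// Note that a polymorphic container then has to hold pointers, e.g.
// std::vector<BaseAnimation*> animations;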

I'm wondering what kind of speed each of these options offers. I'm particularly interested in the 1st and 2nd, because the 3rd doesn't allow one to iterate through a general container of Animations.

In short, what is faster - a vtable fetch or a conditional?

+2  A: 

(1) Not that the size of the generated assembly matters anymore these days, but this is what it generates (approximately, assuming MSVC on x86):

mov eax, [ecx+12]   ; 'this' pointer stored in ecx, eax is scratch
cmp eax, 0          ; test for 0 
jz  .somewhereElse  ; jump if the bool isn't set

The optimizing compiler will intersperse other instructions there, making it more pipeline-friendly. The contents of your class will most likely be in your cache anyway, and if they're not, they'll be needed a few cycles later regardless. All told, that's maybe a few cycles, and for something that will be called at most a few times per frame, that's nothing.

(2) This is approximately the assembly that will be generated every time your play() method is called:

mov  eax, [ebp+4]    ; load the Animation* from somewhere on the stack, eax is scratch
mov  eax, [eax]      ; load the vtable pointer from the object
mov  eax, [eax+12]   ; fetch the play() entry from the vtable
call eax             ; call it

Then, you'll have some duplicate code or another function call inside your specialized play() function, since there'll definitely be some common stuff, so that incurs some overhead (in code size and/or execution speed). So, this is definitely slower.

Also, this makes it a lot harder to load generic animations. Your graphics department won't be happy.

(3) To use this effectively, you'll end up making a base class for your templated version anyway, with virtual functions (in that case, see (2)), OR you'll do it manually by checking types in places where you call your animation thing, in which case also see (2).

This also makes it MUCH harder to load generic animations. Your graphics department will be even less happy.

(4) What you need to worry about is not some micro-optimization for tiny things done at most a few times a frame. From reading your post, I actually identified another problem that's commonly overlooked. You're mentioning std::vector<Animation>. Nothing against the STL, but that's bad voodoo. A single memory allocation will cost you more cycles than all the boolean checks in your play() or update() methods for probably the entire time your application is running. Putting Animations in and out of std::vectors (especially if you're storing instances rather than pointers (smart or dumb) to instances) will cost you way more.

You need to look at different places to optimize. This is the kind of micro-optimization that will bring you no benefit, while making your code harder to generalize and your graphics department harder to please. What will matter, however, is worrying about memory allocation, and THEN, when you're done programming that part, starting a profiler and looking where the hot spots are.

If keeping your animations around is actually becoming a bottleneck, the std::vector (nice as it is) is where you might want to look. Have you looked at, say, an intrusive linked list? That will actually bring you more benefit than worrying about this.
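For illustration, a bare-bones intrusive list might look like this (the member names are invented; in practice you might use something like boost::intrusive rather than rolling your own):

// Each Animation carries its own link, so adding or removing it from the
// active list needs no memory allocation at all.
struct Animation
{
    Animation* next_active;   // intrusive link
    // ... the rest of the animation data ...
};

struct ActiveAnimationList
{
    Animation* head;

    ActiveAnimationList() : head(0) {}

    void push_front(Animation* a)
    {
        a->next_active = head;
        head = a;
    }

    template <typename Fn>
    void for_each(Fn fn)
    {
        for (Animation* a = head; a != 0; a = a->next_active)
            fn(*a);
    }
};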

arke
Vectors are fine as long as you reserve() ahead of time, in my experience.
leander
That's true, but that's assuming you know beforehand how many things you put in there. You rarely need random access to things like this, though, so the intrusive version will still be (somewhat) better in that you need not allocate the buffer for those pointers - the objects that are allocated anyway already have that.
arke
Oh, and a +1 for bothering to go out and disassemble it =) I love seeing what the compiler does, it's almost always enlightening.
leander
Well, that was actually typed from memory (I've been doing a lot of Final-build debugging lately), but thanks anyway :). Ditto on the compilers, especially for CPUs like the PPC with the optimizer turned on.
arke
+1  A: 

Vtable dispatch is very, very fast. So are simple conditionals. Both translate to a single-digit number of CPU instructions. Worrying about this kind of performance gets you into the murky waters of compiler optimisations, where you don't at all understand what the compiler is doing. Chances are, very subtle changes in your program can trump the minute differences between an if statement and a vtable.

I did a little test a while ago comparing RTTI multiple dispatch against plain vtable dispatch. In release mode, a dispatch between three objects (two vtable calls) done over two million iterations took 62 milliseconds. That is nowhere near worth worrying about.
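A minimal harness along those lines (not the original test; the names are illustrative and the numbers will vary by machine, compiler, and optimization settings) might look like:

#include <chrono>
#include <cstdio>

struct Base { virtual ~Base() {} virtual int play() = 0; };
struct Looped : Base { int play() { return 1; } };

struct Flagged { bool looped; int play() { return looped ? 1 : 2; } };

int main()
{
    const int iterations = 2000000;

    Looped looped_instance;
    Base* via_vtable = &looped_instance;   // a real test should pick the concrete
                                           // type at run time so the compiler
                                           // can't devirtualize the call
    Flagged via_flag = { true };

    int sink = 0;   // accumulate results so the loops aren't optimized away

    using clock = std::chrono::high_resolution_clock;

    clock::time_point t0 = clock::now();
    for (int i = 0; i < iterations; ++i) sink += via_vtable->play();
    clock::time_point t1 = clock::now();
    for (int i = 0; i < iterations; ++i) sink += via_flag.play();
    clock::time_point t2 = clock::now();

    std::printf("vtable: %lld us, flag: %lld us (sink %d)\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count(),
        sink);
    return 0;
}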

Igor Zevaka
A: 

Who says #3 makes it impossible to have a generic container of animations? There are several approaches one can use. They do all boil down to eventually making a polymorphic call but the options are there. Consider this:

#include <vector>
#include <boost/any.hpp>

void function(boost::any & a)
{
  // my_operation (described below) maps a type name to a functor that knows
  // how to handle whatever concrete type 'a' holds
  my_operation::execute(a.type().name(), a);
}

std::vector<boost::any> generic_container;
// ... fill generic_container with concrete animation objects ...
function(generic_container[0]);   // dispatched by the stored type at run time

my_operation just needs to have a way of registering and filtering operations by type name. It searches for a functor that operates on whatever a represents, and uses it. The functor then any_casts to the appropriate type and does the type-specific operation.
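A bare-bones version of such a registry could look like this (purely illustrative; the names and the choice of plain function pointers are assumptions, not part of the original answer):

#include <map>
#include <string>
#include <boost/any.hpp>

struct my_operation
{
    typedef void (*handler)(boost::any &);

    // type name -> functor that any_casts and handles that concrete type
    static std::map<std::string, handler> & table()
    {
        static std::map<std::string, handler> t;
        return t;
    }

    static void register_handler(const std::string & type_name, handler h)
    {
        table()[type_name] = h;
    }

    static void execute(const std::string & type_name, boost::any & a)
    {
        std::map<std::string, handler>::iterator it = table().find(type_name);
        if (it != table().end())
            it->second(a);   // handler does the any_cast and the real work
    }
};

// A handler for a (hypothetical) concrete type would then look like:
//   void play_looped(boost::any & a)
//   {
//       boost::any_cast<LoopedAnimation &>(a).play();
//   }
//   my_operation::register_handler(typeid(LoopedAnimation).name(), &play_looped);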

Or use a visitor framework. The above is sort of a variation of that but at too generic a level to really qualify.

And there are more possible methods. Instead of storing animations directly, you could store a wrapper type that hides the specifics and performs the correct playback when activated. One virtual call is still made, but it is only there to switch to the concrete types, which can then do the more complex operations on each other with full type information.
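A rough sketch of that kind of type-erasing wrapper (the names are invented, and the concrete animation types are assumed to provide a play() member):

class AnyAnimation
{
    struct concept_t
    {
        virtual ~concept_t() {}
        virtual void play() = 0;
    };

    template <typename T>
    struct model_t : concept_t
    {
        explicit model_t(const T & value) : data(value) {}
        void play() { data.play(); }
        T data;
    };

    concept_t * self;

    AnyAnimation(const AnyAnimation &);              // non-copyable, for brevity
    AnyAnimation & operator=(const AnyAnimation &);

public:
    template <typename T>
    explicit AnyAnimation(const T & value) : self(new model_t<T>(value)) {}

    ~AnyAnimation() { delete self; }

    void play() { self->play(); }   // one virtual call, concrete type hidden
};

// Usage (assuming e.g. Animation<true, false> from option 3 has a play()):
//   Animation<true, false> looping;
//   AnyAnimation a(looping);
//   a.play();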

There is no general answer to your question, in other words. Depending on what you need, you could reach all kinds of levels of complexity to make almost your entire program compile-time polymorphic as opposed to run-time.

Noah Roberts
+1  A: 

(Edited for brevity.)

The compiler, CPU, and OS can all change the answer here:

  • CPU: instruction/data cache size, architecture, and behavior, especially any intelligent prefetch
  • CPU: branch prediction and speculative execution behavior
  • CPU: the penalty for a mispredicted branch
  • compiler and CPU: the availability and relative cost of conditionally-executed instructions (helps with branch cases that only cover a few instructions)
  • compiler or linker: optimizations that may transform your code and remove branches

In short, as Blindy said in the comments: test it. =)

If you're writing for a modern desktop OS or OSes, enlist the help of a profiling tool (valgrind, shark, codeanalyst, vtune, etc) -- it may give you details you never even knew you could look for, such as cache misses, branch mispredicts, etc.

Even if you don't find a great answer, you'll learn something from applying the tool. I often find looking at the disassembly quite instructive, too (see some of the other answers in this thread).

Some slightly more speculative notes:

  • vtable tends to result in a load (this+0), offset, second load, and then branch on the contents of the register. You can see this in some of the other answers. Most CPUs that I'm familiar with are miserable at predicting branches from registers.
  • the bool may be near other data you're using and as such may already be cached. The branch target is also likely to be fixed and therefore a lot more friendly for prediction and/or speculative execution.
  • on some processors (rarer these days), it costs more to load a bool than an int.
  • on an ARM processor I work with, we occasionally tuck the vtables in "tightly coupled memory" on the processor core. Decreases the indirect load time considerably -- it's as if the vtable is always in-cache or better.

As you mentioned, the usual rule applies: do what fits requirements and is flexible/maintainable/readable first, then optimize.

Further reading / other patterns to pursue:

Both the "Data Oriented Design" and the "Component-Based Entity" paradigms are useful to keep in your brain for games, multimedia engines, and other things where you have a greater-than-average demand for performance and still want to keep your code somewhat organized. YMMV, of course. =)

leander
+1 from me for mentioning data-oriented design.
arke