I am looking to make a hefty shift towards templates in one of my OpenGL projects, mainly for fun and the learning experience. I plan on watching the size of the executable carefully as I do this, to see just how much of the notorious bloat happens. Currently, the size of my Release build is around 580 KB when I favor speed and 440 KB when I favor size.

Yes, it's a tiny project, and in fact even if my executable bloats to 10 times its size, it will still only be 5 MB or so, which hardly seems large by today's standards... or is it? This brings me to my question: is speed proportional to size, or are there leaps and plateaus at certain thresholds that I should be aiming to stay below? (And if so, what are those thresholds, specifically?)

+7  A: 

Speed is based on many factors. This is why it's good to have a sensible program design that follows good architecture principles.

But the actual size of your executable has little to do with its performance, if the application is designed properly. The performance improvements you get by profiling your application and fixing the slow parts are only going to affect a small (10% or less) part of your code.

At any given moment (unless the program is embarrassingly parallel, or the performance-critical sections of your code happen to be quite large), only a tiny bit of code is executing within the processor anyway.

This is especially true of the L1 cache. In principle a large program will execute more slowly than a small one, but in practice the hot code will stay in the L1 cache as long as you keep your performance-critical sections small enough.

Remember, a tight, high-performance loop only has to load itself into the L1 cache once, on the first pass through the loop.
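
As an illustration only (the function below is hypothetical), this is the kind of loop being described; after the first iteration its handful of instructions sit in the L1 i-cache and every later iteration runs entirely from cache:

    #include <cstddef>

    // Hypothetical hot path: a small, tight loop whose machine code easily
    // fits in the L1 instruction cache after the first pass through it.
    float dot(const float* a, const float* b, std::size_t n)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i] * b[i];  // the body compiles to only a few instructions
        return sum;
    }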

Robert Harvey
A: 

I have not noticed a strong correlation between the size of my OpenGL projects and performance; then again, I have never taken on a huge project either. Writing efficient code matters more. What are you trying to do, and how important is the extra performance for your application? Focus on writing good code and you should be fine. Do try out the templates as a learning experience, and make regular commits so you can always go back.

Serge
A: 

In practice, overall execution speed typically depends on the algorithms you use.

As far as executable size goes, speed is related to the CPU's instruction cache size and the length of a cache line. When you fall out of one cache level into the next, larger and slower one, you pay a noticeably higher latency. However, it is the compiler's job to optimize and arrange code so that you generally execute linearly (where possible).

You should not tune for specific CPUs unless you expect your software to run on only one machine. For maximum speed in the general case, good design, good algorithm selection, and good compiler tuning will do more than executable size.

Paul Nathan
A: 

Bloat has more to do with how much your program does at once than with how large the executable is.

A simple and direct example: say I have a program that returns the first 1,000,000,000 primes. I can compute that list with a function, or I can hard-code the list as a string and print that string. The latter program is a lot larger, but it uses far less CPU time to produce the list.
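
As a minimal sketch of those two approaches (the count is scaled way down and the names are invented for illustration):

    #include <cstdio>
    #include <vector>

    // Approach 1: small executable, more CPU work at run time.
    std::vector<int> first_primes(int count)
    {
        std::vector<int> primes;
        for (int candidate = 2; (int)primes.size() < count; ++candidate) {
            bool is_prime = true;
            for (int p : primes) {
                if (p * p > candidate) break;
                if (candidate % p == 0) { is_prime = false; break; }
            }
            if (is_prime) primes.push_back(candidate);
        }
        return primes;
    }

    // Approach 2: larger executable (the answers are baked into the binary),
    // almost no CPU work at run time.
    static const char kPrimeTable[] = "2 3 5 7 11 13 17 19 23 29";

    int main()
    {
        for (int p : first_primes(10)) std::printf("%d ", p);  // compute the list
        std::printf("\n%s\n", kPrimeTable);                    // or just print it
    }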

'Bloated' really means that a program takes resources away from other programs, either because it forces too many processes and threads into memory (which your OS then has to juggle) or because it 'computes all the time'. By that measure, a really small program that takes days to compute a very large list of primes is, in effect, bloat, because it computes all the time.

Lajla
-1. Code bloat is about code size, not about CPU time or number of threads executing.
Billy ONeal
+9  A: 

On most modern processors, locality is going to be more important than size. If you can keep all the currently executing code and a good portion of the data in your L1 cache, you're going to see big wins. If you're jumping all around, you may force code or data out of the cache and then need it again shortly thereafter.

"Data Oriented Design" helps with both code and data locality, in my experience. You may be interested in the Pitfalls of Object Oriented Programming slides, which do a good job of showing how to tackle things in such a way that you get both good data and code locality.

(Incidentally, this whole cache size and locality thing is one of the reasons why "optimize for size" can outperform "optimize for speed" in some cases.)

leander
POOP is an interesting study; good reference!
dash-tom-bang
POOP made me giggle.
JulianR
ditto, +1 for the reference
Kyle
+1  A: 

The size of your executable doesn't really matter. It is the size of the "active" code, the code which is actually executed frequently by the application, which matters. That is a lot harder to quantify, unfortunately. For a simple approximation, you could profile your application, take the procedures which account for 90% of the execution time, and add up their code sizes.

Most modern processors have L1 instruction caches on the order of 32 KB to 64 KB, so it helps to keep the active code below that size. The next threshold is the L2 cache, which can be a few megabytes.

Keith Randall
+4  A: 

Executable "bloat" has less affect on performance than the following two factors (which are related, but not the same)

  1. Code working set size.
  2. Data working set size.

The amount of code your program needs to run to perform a unit of work, e.g. render a frame, affects the instruction cache hit rate and the page table (TLB) hit rate. However, if 90% of the work is done in one small function that fits completely in the i-cache, then the code size of the program as a whole only accounts for the other 10%.

Likewise with data: if your program has to touch 100 MB of data every frame, it is going to perform much worse than a program whose working set fits within the L1, L2 or L3 cache.
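
As a rough sketch of that point (sizes and names invented): several separate passes over a buffer much larger than the cache stream the whole thing from main memory each time, while fusing the same work into one pass touches each cache line once:

    #include <vector>

    // Three separate passes over a buffer far larger than any cache level:
    // every pass re-fetches the entire working set from main memory.
    void update_three_passes(std::vector<float>& v, float a, float b)
    {
        for (float& x : v) x *= a;
        for (float& x : v) x += b;
        for (float& x : v) x = x * x;
    }

    // The same arithmetic fused into one pass: each cache line is loaded once.
    void update_one_pass(std::vector<float>& v, float a, float b)
    {
        for (float& x : v) {
            float t = x * a + b;
            x = t * t;
        }
    }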

So it's not really how big the executable is, but how much "stuff" is being used at any one time.

Justicle
Also important to consider is the locality of these working sets; small working sets distributed far and wide can be slower than one gigantic chunk of "thing" that can be processed in one swoop.
dash-tom-bang
Good point; the topology of the working set matters too.
Justicle
+3  A: 

It is my humble opinion that, by and large, "template bloat" is the boogeyman of C++; that is, it is a story told to scare children, but I have never seen evidence that it really exists on any noticeable scale (beyond compiler-specific bloat, of course). People will claim it exists due to unique code being generated for any set of template parameters, but they often fail to mention that without templates, you would be doing code duplication anyway (either by hand or with macros).
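
As a contrived but concrete sketch of that point: the template below generates one instantiation per type actually used, which is essentially the same machine code you would otherwise have written out by hand:

    // Hand-written duplication: two copies of the same logic in the source,
    // and two copies of it in the executable.
    int   max_int(int a, int b)       { return a > b ? a : b; }
    float max_float(float a, float b) { return a > b ? a : b; }

    // Template: one copy in the source; the compiler still emits exactly one
    // instantiation per type that is actually used, no more and no fewer.
    template <typename T>
    T max_of(T a, T b) { return a > b ? a : b; }

    int main()
    {
        int   i = max_of(2, 3);        // instantiates max_of<int>
        float f = max_of(2.0f, 3.0f);  // instantiates max_of<float>
        return i + static_cast<int>(f);
    }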

That said, templates CAN get out of control in other ways; for example, metaprogramming techniques can balloon compile times. But I think the benefits really outweigh the costs.

+2  A: 

If you're on Unix, you can run your program under Valgrind's Cachegrind tool to measure the effect of executable size and locality on your runtime directly, rather than trying to work backwards from the runtime numbers alone. Cachegrind also gives you a lot of information about data locality.

Invocation looks something like valgrind --tool=cachegrind ./your_program arguments.

There's also a nice Qt GUI for the valgrind suite called KCacheGrind.

Richard Barrell
+1  A: 

In my experience, code bloat and execution-time bloat go hand in hand, and it's all about how the software is designed, in particular, how the data structure is designed.

If one follows the approach where every mental concept becomes a class, and the notification-style pattern where simply setting a property or adding an item to a collection triggers a hidden ripple of actions propagating through a large, non-normalized network of data structures in an attempt to keep it consistent, then the result will be large source code and poor performance.

On the other hand, if one keeps the data structure minimal and normalized (as far as is reasonably possible), tolerates a reasonable degree of inconsistency that is repaired on a loosely-coupled basis, and uses code generation so that the program is not processing at run time information that hardly ever changes and could have been handled before compiling, then the source code will be smaller, easier to develop, and efficient.
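
A toy sketch of that contrast (the classes are invented for illustration): the first version stores a derived value redundantly and pushes every change through notifications to keep it consistent; the second keeps only the normalized data and derives the value on demand:

    #include <vector>

    // Notification style: 'total' is stored redundantly, and every mutation
    // does extra work to keep it (and any observers) consistent.
    class OrderNotifying {
    public:
        void add_item(int price)
        {
            items_.push_back(price);
            total_ += price;      // ripple: patch up the cached state...
            notify_observers();   // ...and fan the change out to listeners
        }
    private:
        void notify_observers() { /* imagine a cascade of further updates */ }
        std::vector<int> items_;
        int total_ = 0;
    };

    // Normalized style: only the items are stored; the total is derived
    // when, and only when, somebody actually asks for it.
    class OrderNormalized {
    public:
        void add_item(int price) { items_.push_back(price); }
        int total() const
        {
            int sum = 0;
            for (int p : items_) sum += p;
            return sum;
        }
    private:
        std::vector<int> items_;
    };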

Here's a small example where a reasonably-designed program, through a series of steps, was reduced in size by a factor of four, and made faster by a factor of 40, by eliminating data structure and using code generation.

Mike Dunlavey