I have heard about code bloat in the context of C++ templates. I know that modern C++ compilers largely avoid it, but I want to construct an example and convince myself.

Let's say we have a class

template< typename T, size_t N >
class Array {
  public:
    T * data();
  private:
    T elems_[ N ];
};

template< typename T, size_t N >
T * Array<T, N>::data() {
    return elems_;
}

Further, let's say types.h contains

typedef Array< int, 100 > MyArray;

x.cpp contains

MyArray ArrayX;

and y.cpp contains

MyArray ArrayY;

Now, how can I verify that the code space for MyArray::data() is the same for both ArrayX and ArrayY?

What else should I know and verify from this (or other similarly simple) examples? If there are any g++-specific tips, I am interested in those too.

PS: Regarding bloat, I am concerned about even the slightest bloat, since I come from an embedded context.


Addition: Does the situation change anyhow if the template classes are explicitly instantiated?

A: 

The generated code will be exactly the same, since the code in both files is exactly the same. You can disassemble the code to check it if you want.

iconiK
But will there be one copy or two?
BCS
+8  A: 

You're asking the wrong question - any "bloat" in your example has nothing to do with templates. (the answer to your question, btw, is to take the address of the member function in both modules and you'll see they're the same)

What you really want to ask is, for each template instantiation, does the resulting executable grow linearly? The answer is no, the linker/optimizer will do magic.

Compile an exe that creates one type:

Array< int, 100 > MyArray;

Note the resulting exe size. Now do it again:

Array< int, 100 > MyArray100;
Array< int, 99 > MyArray99;

And so on, for 30 or so different versions, charting the resulting exe sizes. If templates were as horrible as people think, the exe size would grow by a fixed amount for each unique template instantiation.

Terry Mahaffey
"take the address of the member function in both modules and you'll see they're the same" - this invokes Schroedingers compiler. By observing a thing, you may alter it. In particular taking the address of something restricts what a compiler can do with it. E.g. you can't put an int in a register if you need to have an int* to it.
MSalters
The same thing can be done (in theory) via the debug info, but don't ask me how.
BCS
@MSalters: kind of pointless, since C++ is defined around observable behavior. If you don't observe a C++ program, the compiler could just emit a single `nop`.
jalf
This answer is inaccurate. If your functions generate identical code regardless of N (eg. your data() function), then a smart compiler *may* prevent duplicate functions from being emitted. OTOH, if your function actually uses the template parameter (eg. "if(i < N) blah();") every compiler in the world will actually emit two copies, because it would harm efficiency not to do so! You can see this first-hand with my "nmsize" utility below, but beware of inlining in your analysis.
Josh Haberman
@Josh, it's not. It's just describing an experiment. With maximum optimizations (eg, comdat folding by the linker) growth will be non-linear and in fact flatten out. Your utility below doesn't take optimizations or the linker into account - and thus it's inaccurate :)
Terry Mahaffey
Your experiment cannot be conclusive unless it takes all factors into account. For example, if you are just declaring template instantiations that are not being used, the compiler will optimize them away, and your experiment does not reflect what would happen if the declarations were actually used. My utility can be run on executables in which case it is absolutely accurate about optimizations performed at any stage of the toolchain.
Josh Haberman
By the way, "comdat folding" is what I said a smart compiler (linker actually, as you mention) may do. But as I said before, if the body of the function actually depends on the template parameter, comdat folding cannot be applied because the functions are different. In this case the growth will indeed be linear.
Josh Haberman
@jalf: Application size is observable, in the sense that you can see it, and non-observable in ISO standard sense. The hoped-for optimization (function folding) produces the same ISO-standard observable application output, but causes a different application size.
MSalters
+5  A: 

In this specific case, you'll find that g++ will tend to inline the accessor if you have any sort of optimization on. That right there is some minor code bloat, though it's debatable if the overhead of the call would be any less.

However, one easy way to verify what's getting compiled is with the nm tool. If I compile your code with a simple main() to exercise ArrayX::data() and ArrayY::data() and then compile it with -O0 to turn off inlining, I can run nm -C to see the symbols in the executable:

% nm -C test
0804a040 B ArrayX
0804a1e0 B ArrayY
08049f08 d _DYNAMIC
08049ff4 d _GLOBAL_OFFSET_TABLE_
0804858c R _IO_stdin_used
         w _Jv_RegisterClasses
080484c4 W Array<int, 100u>::data()
08049ef8 d __CTOR_END__
08049ef4 d __CTOR_LIST__
08049f00 D __DTOR_END__
...

You'll see that the Array<int, 100u>::data() symbol occurs only once in the final executable, even though the object file for each of the two translation units contains its own copy. (The nm tool also works on object files. You can use it to check that x.o and y.o each have a copy of Array<int, 100u>::data().)

If nm doesn't provide enough detail, you might also have a look at the objdump tool. It's a lot like nm, but with debugging symbols turned on, it can even show you things like a disassembly of the output executable with intermixed source lines.

Boojum
+1: This is a nice way of looking at the problem. I verified that the final executable has a *single* copy of Array<int, 100u>
ArunSaha
+3  A: 

Templates have nothing to do with that.

Consider this small program:

a.h:

class a {
  public:
    int foo() { return 42; }
};

b.cpp:

#include "a.h"

void b() {
  a my_a;
  my_a.foo();
}

c.cpp:

#include "a.h"

void c() {
  a my_a;
  my_a.foo();
}

There are no templates, but you have the exact same problem. The same function is defined in multiple translation units. And the rule is the same: only one definition is allowed to exist in the final program; otherwise the compiler couldn't determine which one to call, and two function pointers to the same function might compare unequal.

The "problem" with template code bloat is something different: it is if you create a lot of different instantiations of the same template. For example, using your class, this program will risk some code bloat:

Array< int, 100 > i100;
Array< int, 99 > i99;
Array< long, 100 > l100;
Array< long, 99> l99;

i100.data();
i99.data();
l100.data();
l99.data();

Strictly speaking, the compiler is required to create 4 different instantiations of the data() function, one for each set of template parameters. In practice, some (but not all) compilers try to merge them back together, as long as the generated code is identical. (In this case, the assembly generated for Array< int, 100 > and Array< long, 100 > would be identical on many platforms, and the function doesn't depend on the array size either, so the 99 and 100 variants should also produce identical code, so a clever compiler will merge the instantiations back together.)

There is no magic to templates. They don't mysteriously "bloat" your code. They just give you a tool that lets you easily create a bajillion different types from the same template. If you actually use all of these types, it has to generate code for all of them. As always with C++, you pay for what you use. If you use both an Array<long, 100>, an Array<int, 100>, an Array<unsigned long, 100> and an Array<unsigned int, 100>, then you get four different classes, because four different classes was what you asked for. If you don't ask for four different classes, they don't cost you anything.

jalf
This analysis is not correct. Unlike your alternate example, a template instantiation generates non-inline methods that will end up in the object file. See: http://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html . The default option is "3. Do nothing. Pretend G++ does implement automatic instantiation management. Code written for the Borland model will work fine, but each translation unit will contain instances of each of the templates it uses. In a large program, this can lead to an unacceptable amount of code duplication."
Josh Haberman
@Josh: The translation unit will contain instances of the same code (and so will my non-template example), yes, but the final executable certainly won't. Your link even says this explicitly: "Somehow the compiler and linker have to make sure that each template instance occurs exactly once in the executable if it is needed, and not at all otherwise". If multiple instantiations are generated by the compiler (as in the "do nothing" model), the linker strips out all but one of them, just like it does with non-template symbols.
jalf
@jalf: you are correct, I was getting my templating issues mixed up. You can still end up with what are effectively redundant definitions in your executable if eg. Foo<1>::Bar() and Foo<2>::Bar() generate exactly the same code. Many compilers won't collapse these into a single copy even if the emitted code is identical. But you are correct in saying that this is a function of how many different instantiations you declare.
Josh Haberman
+1  A: 

One test would be to put a static variable in data(), and increment it on each call, and report it.

If MyArray::data() is occupying the same code space, then you should see it report 1 and then 2.

If not, you should just see 1.

I ran it, and got 1 then 2, indicating that it was running from the same set of code. To verify this was indeed true, I created another array with size parameter of 50, and it kicked out 1.

Full code (with a couple tweaks and fixes) is below:

Array.hpp:

#ifndef ARRAY_HPP
#define ARRAY_HPP
#include <cstdlib>
#include <iostream>

using std::size_t;

template< typename T, size_t N >
class Array {
  public:
    T * data();
  private:
    T elems_[ N ];
};

template< typename T, size_t N >
T * Array<T,N>::data() {
    static int i = 0;
    std::cout << ++i << std::endl;
    return elems_;
}

#endif

types.hpp:

#ifndef TYPES_HPP
#define TYPES_HPP

#include "Array.hpp"

typedef Array< int, 100 > MyArray;
typedef Array< int, 50 > MyArray2;

#endif

x.cpp:

#include "types.hpp"

void x()
{
    MyArray arrayX;
    arrayX.data();
}

y.cpp:

#include "types.hpp"

void y()
{
    MyArray arrayY;
    arrayY.data();
    MyArray2 arrayY2;
    arrayY2.data();
}

main.cpp:

void x();
void y();

int main()
{
    x();
    y();
    return 0;
}
JohnMcG
+1: This is an excellent experiment construction. I verified using the same lines and convinced myself.
ArunSaha
A: 

Here is a little utility script I have been using to get insight into just these issues. It shows you not only if a symbol is defined multiple times, but also how much code size each symbol is taking. I have found this extremely valuable for auditing code size issues.

For example, here is a sample invocation:

$ ~/nmsize src/upb_table.o 
 39.5%     488 upb::TableBase::DoInsert(upb::TableBase::Entry const&)
 57.9%     228 upb::TableBase::InsertBase(upb::TableBase::Entry const&)
 70.8%     159 upb::MurmurHash2(void const*, unsigned long, unsigned int)
 78.0%      89 upb::TableBase::GetEmptyBucket() const
 83.8%      72 vtable for upb::TableBase
 89.1%      65 upb::TableBase::TableBase(unsigned int)
 94.3%      65 upb::TableBase::TableBase(unsigned int)
 95.7%      17 typeinfo name for upb::TableBase
 97.0%      16 typeinfo for upb::TableBase
 98.0%      12 upb::TableBase::~TableBase()
 98.7%       9 upb::TableBase::Swap(upb::TableBase*)
 99.4%       8 upb::TableBase::~TableBase()
100.0%       8 upb::TableBase::~TableBase()
100.0%       0 
100.0%       0 __cxxabiv1::__class_type_info
100.0%       0 
100.0%    1236 TOTAL

In this case I have run it on a single .o file, but you can also run it on a .a file or on an executable. Here I can see that constructors and destructors were emitted twice or three times, which is a result of this bug.

Here is the script:

#!/usr/bin/env ruby

syms = []
total = 0
IO.popen("nm --demangle -S #{ARGV.join(' ')}").each_line { |line|
  addr, size, scope, name = line.split(' ', 4)
  next unless addr and size and scope and name
  name.chomp!
  addr = addr.to_i(16)
  size = size.to_i(16)
  total += size
  syms << [size, name]
}

syms.sort! { |a,b| b[0] <=> a[0] }

cumulative = 0.0
syms.each { |sym|
  size = sym[0]
  cumulative += size
  printf "%5.1f%%  %6s %s\n", cumulative / total * 100, size.to_s, sym[1]
}

printf "%5.1f%%  %6s %s\n", 100, total, "TOTAL"

If you run this on your own .a files or executable files, you should be able to convince yourself that you know exactly what is happening with your code size. I believe that recent versions of gcc may remove redundant or useless template instantiations at link time, so I recommend analyzing your actual executables.

Josh Haberman
A: 

A better illustration of code bloat with templates is using a template to generate code, not just variables. The typical panic is due to the compiler generating code for each instantiation of the template (stencil). This is similar to code bloat due to inline functions and methods. However, modern compilers and linkers can perform magic to reduce code size, depending on the optimization settings.

For example:

template <typename Any_Type>
void Print_Hello(const Any_Type& v)
{
    std::cout << "Hello, your value is:\n"
              << v
              << "\n";
    return;
}

The above code is best thought of as a stencil. The compiler will generate the code depending on the type passed to Print_Hello. The bloat here is that very little of the code actually depends on the type of the variable. (This can be reduced by factoring out the type-independent code and data.)

The fear is that the compiler will generate the code for each instantiation using the same variable type, thus building up repetitive code:

int main(void)
{
  int a = 5;
  int b = 6;
  Print_Hello(a); // Instantiation #1
  Print_Hello(b); // Instantiation #2
  return 0;
}

The fear could also be extended when the template (stencil) is instantiated in different translation units.

Modern compilers and linkers are smart. A smart compiler recognizes the template function call and converts it into a unique mangled name. The compiler then uses only one instantiation per distinct set of template arguments, similar to function overloading.

Even if the compiler was sloppy and generated multiple instances of the function (for the same type), the linker would recognize the duplicates and only put one instance into the executable.

When used recklessly, a function or method template can add extra code. Examples are large functions that differ by type in only a few places; they have a high ratio of type-independent code to type-dependent code.

An implementation of the above example with less bloat:

void Print_Prompt(void)
{
  std::cout << "Hello, your value is:\n";
  return;
}

template <typename Any_Type>
void Better_Print_Hello(const Any_Type& v)
{
  Print_Prompt();
  std::cout << v << "\n";
  return;
}

The primary difference is that the code that is not dependent on the variable type has been factored out into a new function. This may not seem worthwhile for this small example, but it illustrates the concept. And the concept is to refactor the function into pieces that are dependent on the variable type and those that are not. The pieces that are dependent are converted into templated functions.

Thomas Matthews