views:

379

answers:

5

Hi, I have heard from many people that using templates makes code slow. Is that really true? I'm currently building a library, and there are places where not using templates would create a code-management problem. As of now I can think of two solutions to this problem:

  • use #defines

  • Use templates, define all the supported types in the header file/library itself, but do not allow the end user to create their own template instances.

e.g. typedef Graph<int> GraphI32; etc.

Is there any way to restrict the user from creating template instances on their own?

Help on the above queries would be highly appreciated.

+6  A: 

Templates make compilation slow, but most of the time they make the program faster.

fabrizioM
+13  A: 

Since template instantiation happens at compile time, there is no run-time cost to using templates (as a matter of fact, templates are sometimes used to perform certain computations at compile time to make the program run faster). Heavy use of templates can, however, lead to long compile times.

sepp2k
Mostly right, but increased code size due to multiple instantiations of a template function can increase instruction cache misses, and slow down your program. There are cases where a non-template function could really service multiple types (e.g. int, short, and long) and do so with less executable code.
John Zwinck
@John: The compiler only instantiates the classes that are actually used. If non-templated code were used, the same situation would exist, since the dev would have to write all the classes manually. So while this is technically true, it has nothing to do with templates; it is a design issue and occurs whether templates are used or not.
Martin York
@Martin York: for example, I want to sort an array of int, an array of float, and an array of double. I could call three different instantiations of `std::sort`, or I could call `qsort` three times. Is it conceivable that the latter would produce smaller code? You can of course argue that programmers *might* proliferate multiple near-identical functions without the use of templates, but in practice they usually don't, if only because the typing would be horrendous.
Steve Jessop
Of course in that example, the smaller non-template code would probably be slower, because of the function pointer. But there seem to be people saying that templates cannot conceivably lead to larger object code. I don't think that corresponds with reality unless you compare "use of templates" with "a lot of copy-and-paste", when IMO you should be comparing "use of templates" with "a plausible non-template-using alternative design". Templates *do* in fact change how you design your code - this was a deliberate goal when they were invented, so I don't think it's "nothing to do with" them.
Steve Jessop
@steve: Here you are making a deliberate trade-off. You are deliberately trading speed for size (qsort is slower because of the embedded function call needed to do the comparison, due to its generic nature). So in this situation you have actively decided to make the trade. If, on the other hand, you wanted an optimal way to do the sort without using templates, you would need to write three versions of sort; how is this different from using a single templated version?
Martin York
You said, "if non templated code were used the same situation would exist as the dev would have to generate all the classes manually". My point is that this is a straw man. Typically if non-templated code were used, the dev would not generate all the classes manually, and the same situation would not exist. In fact, the dev would find some other way to write somewhat generic code, and it is this against which any sane analysis is done of whether using templates is a good idea or not. Of course it is a good idea, just not for the reason that if you didn't use it, you'd necessarily c'n'p.
Steve Jessop
For comparison: Q. "will I develop my application faster in Java than in assembler?". A. "No. If you wrote it in Java the same way as in asm, you'd just have a massive array of integers representing a memory map, and you would perform addition, multiplication and so on, using indices into this array instead of pointers in your assembler code. This isn't any easier. Of course you *might* decide to use variables, objects, and GC in Java. That would probably speed up development, but this is a deliberate trade-off and a design decision, and nothing to do with Java" Or A. "Probably"? ;-)
Steve Jessop
Similarly, non-template code will (I suspect, although possibly I'm wrong: that's another thread) come out somewhat smaller than template code, with less inlining, if you assign the same task to the same programmer. If it's a task that templates handle well, the non-template code would also presumably come out slower. The mistake (if the questioner's informant is even thinking along these lines) is to think that the larger template code will be slower. It's not a mistake AFAIK to think it will be larger. But we can both come up with plausible examples either way, of course.
Steve Jessop
+11  A: 

No, they don't. Whenever you find that you have "heard" something and cannot name the source, you can almost certainly guarantee that what you have heard is wrong. In fact, templates tend to speed code up.

Rather than depending on hearing things, it's a good idea to read an authoritative book on the subject - I recommend C++ Templates - The Complete Guide.

anon
+4  A: 

They do make object code bigger, because C++ generates code for every type you use. But I don't believe this will slow execution speed. I have no numbers to suggest that it would.

It certainly does improve your lot in life during code development, reading and maintenance. I would not let coding urban myths discourage you from using a language feature that's clearly useful.

duffymo
Bigger object code can lead to decreased i-cache efficiency.
John Zwinck
Enough to eschew using them? I say no.
duffymo
+6  A: 

As others have already noted, templates don't have a direct run-time penalty -- i.e. all their tricks happen at compile time. Indirectly, however, they can slow things down under a few circumstances. In particular, each instantiation of a template (normally) produces code that's separate and unique from other instantiations. Under a few circumstances, this can lead to slow execution, by simply producing enough object code that it no longer fits in the cache well.

Templates have positive effects on speed far more often than negative.

Jerry Coffin
Thanks mate. I just wanted to know the corner cases.
Cache exhaustion has nothing to do with templates. If the compiler has to generate a lot of code, then without templates you would have had to generate that code manually, and the same situation would exist. So this is a total red herring.
Martin York
"In particular, each instantiation of a template (normally) produces code that's separate and unique from other instantiations." - some evidence for this, please. Most implementations create multiple instances and then discard all but one at link-time, or do you think that every use of vector <int>, for example, generates duplicate code in the executable? If that were the case, no-one would ever use templates at all.
anon
The typical comparison is between C++-style templated containers, against Java-style type erasure. It's possible to implement C++ standard containers such that `vector<int>` and `vector<unsigned int>` call common binary code, but your compiler might not do that. Java ensures all code is shared between different kinds of vectors. Of course this is a trade-off, but it reduces the tendency Jerry highlights for C++ vectors over different types to generate separate code. If it all gets inlined either way then I guess it makes little difference whether the code is "the same" or "separate".
Steve Jessop
In practice, every modern compiler is able to fold together common code across different template instantiations - and indeed, fold together common generated code in general, even for seemingly unrelated sections of code. It's a very basic optimization. The compiler and linker are smarter than you.
Terry Mahaffey
@Terry: so if I use `vector<int>` in one T.U and `vector<float>` in another, do I have your personal guarantee that the code to perform a reallocation will only appear once in the linked executable, and calls to `push_back` on either kind of vector will call it? Assuming `sizeof(int) == sizeof(float)` in my implementation, that is. Assuming also that it's not inlined, of course, since if it was then it won't be common no matter what.
Steve Jessop
@Steve: The standard allocator has no requirement to know the type at the point where memory management is done: 'pointer allocate(size_type cnt, typename std::allocator<void>::const_pointer = 0)' So assuming a good compiler yes.
Martin York
Which compilers are good? Bearing in mind that a "reallocate" of a vector doesn't just call the allocator, it also copies the data. Since `int` and `float` are both POD, I expect this to be done in common code too for the win.
Steve Jessop
@Steve: You don't think there is a template specialization for all POD types that does a memcpy()?
Martin York
I don't know, what I want to know is whether I could safely assume that there is. Although remember this is just one example, the real issue is whether a typical compiler does, as Terry says, common up code even that's unrelated. My experience has mostly been the reverse, my compiler un-commons code even that's the *same function*, by inlining. But that may just be because I always use -O3, and that -Os has powers to merge identical-looking function bodies. Does it?
Steve Jessop
Just checked, and the answer is no. I defined two identical POD structs, create a vector of each, then loop doing `push_back` on both vectors 100 times. Compiled with g++ 4.3.2 with -Os, I can see two copies of the `push_back` code, one each for Foo and Bar (e.g. calls to two different versions of _M_insert_aux). And this is with everything in a single TU. It's possible that "good compilers" exist, but if GCC is not one of them then in practice Jerry's assertion is correct, "each instantiation of a template (normally) produces code that's separate and unique from other instantiations".
Steve Jessop
Wow, _M_insert_aux is a pretty big lump of code actually (well two near-identical lumps, one each for Foo and Bar). A lot more than just a wrapper to a call to common code to reallocate, followed by type-specific code to initialize the vector element. It does, however, contain a call to `memmove`, so it has less duplication than the absolute maximum possible.
Steve Jessop
Your test isn't conclusive, Steve, and a little overly simplistic - the compiler and linker are very smart and these are very complex topics. Just because they have a tool in the toolbox, it doesn't mean they use it every time. Try charting the resulting executable growth as you instantiate more and more different T's for a vector. You'll see it's decidedly non-linear, and in fact at some point it'll almost completely flatten out. (Also, I'm more of a VC guy than a gcc one - read about LTCG and COMDAT folding in the VC compiler/linker if you're really interested.)
Terry Mahaffey
@Terry, Steve, Martin: The compiler can't perform the optimization because every instantiated function has to have a unique address so that function pointer comparisons will work. vector<int>::push_back and vector<unsigned int>::push_back will always have distinct addresses, so they never share the same binary code. This is the same reason non-inline versions of inline functions are always generated. A clever whole program optimizing compiler could track if the address is never taken of templated functions and *then* allow them to merge, but I don't know if any compilers actually do that.
Joseph Garvin
@Joseph: Nope, that only applies if someone actually takes the address of those functions. If no one does, then they can be combined. "as if" rule in action. Also, non-inline versions of inline functions are not always generated - don't know where you heard that from.
Terry Mahaffey
@Terry: A dummy program that makes a std::vector<int> and std::vector<unsigned int> and pushes one number onto both of them, then prints them, running "nm" on the resulting executable, even with optimizations, shows symbols for both (only 1 redundant symbol compared to many with debug mode, but I assume that's because of inlining, not the optimization you describe). (Using GCC 4.2 and -O2)
Joseph Garvin
@Terry: And about inline functions -- the standard mandates that an inline function have a consistent address across the whole app. The only way for all users of the library within an app to see the same address is if the library includes a non-inline version of the function that's used for address taking. I shouldn't have said strictly *always*, because you could throw them away in an app, but not in a library. A smart linker may do it for libraries if you're annotating functions to indicate whether they're exported and the inline function is not.
Joseph Garvin