Hi, I have been working on a project and trying to track down the source of a large slowdown in execution time. I narrowed it down to a single method, which I managed to optimise out of the logic. The problem is that my solution involves using a reference, which makes another section of the code run quite slowly. The question I'd like answered is: why does the inner loop take so much longer to evaluate when the map is a reference rather than a local variable?
Here's the old way prior to optimisation:
// old method: create an empty map, populate it,
// and then assign it back to the path object later
map<int, float> screenline_usage;
for (int i = 0; i < numCandidates; ++i)
{
    // timing starts here
    map<int, float>& my_screenline_usage =
        path->get_combined_screenline_usage(legnr, stop_id);
    map<int, float>::iterator it = my_screenline_usage.begin();
    for (; it != my_screenline_usage.end(); ++it)
        screenline_usage[it->first] += usage * it->second;
    // timing ends here; this block is evaluated 4 million times
    // for an overall execution time of ~12 seconds
}
// This function call is evaluated 400k times for an overall execution time of ~126 seconds
path->set_zone_screenline_usage(access_mode, zone_id, screenline_usage);
// TOTAL EXECUTION TIME: 138 seconds.
Here's the new way after optimisation:
// new method: get a reference to the path object's internal map
// and populate it in place
map<int, float>& screenline_usage =
    path->get_zone_screenline_usage(access_mode, zone_id);
screenline_usage.clear();
for (int i = 0; i < numCandidates; ++i)
{
    // timing starts here
    map<int, float>& my_screenline_usage =
        path->get_combined_screenline_usage(legnr, stop_id);
    map<int, float>::iterator it = my_screenline_usage.begin();
    for (; it != my_screenline_usage.end(); ++it)
        screenline_usage[it->first] += usage * it->second;
    // timing ends here; this block is evaluated 4 million times
    // for an overall execution time of ~76 seconds
}
// New method... no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: 76 seconds (62 second time saving) ... but should be able to do it in just 12 seconds if the use of reference didn't add so much time :(
Here are the pertinent subroutines called from that code:
// This is the really slow routine, due to the copy assignment used.
// (The map is only read here, so it should be taken by const reference.)
void set_zone_screenline_usage(int access_mode, int zone_id,
                               const map<int, float>& screenline_usage)
{
    m_container[access_mode][zone_id] = screenline_usage;
}

map<int, float>& get_zone_screenline_usage(int access_mode, int zone_id)
{
    return m_container[access_mode][zone_id];
}
NOTES: The timing information is for a single run in which the above code is evaluated approximately 400k times. The timing is done using some classes I built around the RDTSC instruction (the CPU's time stamp counter). The average value of numCandidates is 10, and the average number of elements put into the screenline_usage map is 25.
UPDATE: Firstly, thanks to everyone who has gotten involved here. I think that in the end this had nothing to do with C++ references at all and much more to do with cache locality. I have replaced the optimised code above with a vector<float>& and a lookup table (mapping link numbers to screenline numbers) implemented as a member-variable map:
// newest method: get a reference to the internal path mapping (as a vector)
// and populate it in place.
// map<int, int> m_linkNum_to_SlNum is declared in the header and populated
// in the constructor.
vector<float>& screenline_usage =
    path->get_zone_screenline_usage(access_mode, zone_id);
// NOTE: the vector must be pre-sized and reset to zero before this
// accumulation (the map version did this with screenline_usage.clear()).
for (int i = 0; i < numCandidates; ++i)
{
    // timing starts here
    map<int, float>& my_screenline_usage =
        path->get_combined_screenline_usage(legnr, stop_id);
    map<int, float>::iterator it = my_screenline_usage.begin();
    for (; it != my_screenline_usage.end(); ++it)
        screenline_usage[m_linkNum_to_SlNum[it->first]] += usage * it->second;
    // timing ends here; this block is evaluated 4 million times
    // for an overall execution time of ~9 seconds
}
// Newest method... again no need to assign back to the path object
// (0 seconds execution :)
// TOTAL EXECUTION TIME: just 9 seconds (a 129 second saving) ... even better
// than the locally constructed map, which took 12 seconds in the inner loop :)
It seems to me that, given that the vector is a contiguous block of memory (even though it isn't local) and that the lookup table (m_linkNum_to_SlNum) is a member variable, this approach produces code and data that fit in cache without having to go out to main memory, resulting in the significant speed-up. Any other conclusions drawn from these findings would be greatly appreciated.