ansaurus

Question

Efficient passing of std::vector

Answer 1

+1 A:

Pass by value unless you're certain passing by reference improves performances.

When you pass by value, copy elision may occur which will result in similar if not better performances.

Dave wrote about it here:

http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

Edouard A. 2009-11-20 15:44:04

Sebastian 2009-11-20 15:54:29

If it doesn't alter the vector, passing by value is perfectly fine too. +1 from me.

jalf 2009-11-20 15:59:04

Interesting. I'll have to read that more closely. Despite the title, though, it looks like he's focus on returns, not parameter passing, so I'm not sure it applies to a situation like this.

Adrian McCarthy 2009-11-20 15:59:09

Checked the article. As far as I understand copy elision only applies when the compiler can be sure that you don't need the original object anymore. This is not the case here. The caller of sum2 could just as well continue to use the original vector for other calculations, so it can not be "stealed" into this function.

shojtsy 2009-11-20 16:00:16

It appears it recommends passing by value if you actually want to make a copy, rather than passing a reference and making a copy from that.

UncleBens 2009-11-20 17:25:11

Answer 2

+7 A:

I believe that this code results in double dereferencing when the vector elements are accessed

Not necessarily. Compilers are pretty smart and should be able to eliminate common subexpressions. They can see that the operator [] doesn't change the 'pointer to the first element', so they have no need make the CPU reload it from memory for every loop iteration.

int3 2009-11-20 15:44:15

Excellent point.

Adrian McCarthy 2009-11-20 15:56:46

OK, so compilers can theoretically decrease the number of double derefences to 1 for each call of this method. Wouldn't an ensured 0 be better?

shojtsy 2009-11-20 16:04:36

Calculate your costs in terms of memory accesses, not the number of dereferences. In your proposed 'shallow copy' you'd be doing the memory access in the caller instead of the callee, so the net result would be (at best) the same. In fact, if the compiler doesn't inline the function, it'd probably be worse, since you're passing it a larger argument.

int3 2009-11-20 16:45:40

Answer 3

+5 A:

What is flawed with this idea?

Simple: It's premature optimization. Alternatives: Accept a vector<int> const& and use iterators or pass iterators directly to the function.

sellibitze 2009-11-20 15:44:30

Is it premature optimization when the code is simpler then the naive version? Why use iterators when we can create a more natural syntax with the same performace as iterators?

shojtsy 2009-11-20 15:52:03

Because you could have just passed the original iterator, by value or by const reference, and trusted the compiler to optimize your code. In many cases, the compiler is able to eliminate the double indirection. And in the cases where it can't do that, the performance difference almost certainly won't actually *matter*.

jalf 2009-11-20 15:58:21

A pair of iterators *is* already the "shallow copy you want". Also, it's premature optimization because you're likely looking in the wrong place to improve performance. Measure first, then take actions.

sellibitze 2009-11-20 16:01:28

@shojtsy: You could check whether there's any performance difference between `v[i]` and `int* p = ... p[i]` and report back. :)

sellibitze 2009-11-20 16:07:44

Answer 4

+2 A:

There is no double dereferencing because the compiler will probably pass the real pointer to the vector as the argument and not a pointer to a pointer. You can simply try this out and check the disassembly view of your IDE for what is actually going on behind the scenes:

void Method(std::vector<int> const& vec) {
 int i = vec.back();
}


void SomeOtherMethod() {
  std::vector<int> vec;
  vec.push_back(1);
  Method(vec);
}

What happens here? The vector is allocated on the stack. The first push back is translated to:

push        eax  // this is the constant one that has been stored in eax
lea         ecx,[ebp-24h] // ecx is the pointer to vec on the stack
call        std::vector<int,std::allocator<int> >::push_back

Now we call Method(), passing the vector const&:

lea         ecx,[ebp-24h] 
push        ecx  
call        Method (8274DC0h)

Unsurprisingly, the pointer to the vector is passed as references are nothing but permanently dereferenced pointers. Now inside Method(), the vector is accessed again:

mov         ecx,dword ptr [ebp+8] 
call        std::vector<int,std::allocator<int> >::back (8276100h)

The vector pointer is taken directly from the stack and written to ecx.

Sebastian 2009-11-20 15:48:03

To do that, the compiler would have to change the function signature, which seems unlikely. Changing the signature would break callers of the functions. It's possible if you have advanced link-time code generation, but I'm not sure if there's an implementation out there that's that advanced.

Adrian McCarthy 2009-11-20 15:55:39

I checked it out. No double dereferencing.

Sebastian 2009-11-20 16:45:09

The vector contains a pointer to the actual data, so passing a pointer to the vector is indeed passing a pointer to the pointer. The second dereference happens in the call to back(). If you passed an iterator instead, then (in most implementations you'd be passing a pointer directly to the data contained within the vector, thus eliminating a level of indirection.

Adrian McCarthy 2009-11-20 17:12:29

Now I finally understand what you meant by double dereferencing. I didn't think the poster meant that because this kind of double dereferencing always occurs even when accessing the vector on the stack.

Sebastian 2009-11-20 17:19:36

Sebastian, in your last code section, the first dereference is the mov, and the second is inside the back method when dereferencing the pointer stored as a field of the vector object, isn't it?

shojtsy 2009-11-20 17:28:34

@shojtsy: You're absolutely right. The original poster asked how to pass the vector as a method argument most efficiently, thus I thought he implied that by passing it as a reference, he would introduce an additional dereferencing.

Sebastian 2009-11-20 17:58:19

@Sebastian: I am the original poster. I was trying to describe that passing a reference to a vector introduces an extra dereferencing as compared to passing a random iterator pair by value (which is same as a pointer pair). The random iterator pair could then be encapsulated in a class having vector-like syntax without introducing a dereferencing level. I already start from the point that the idea is flawed, otherwise it would be widely used, I am just trying to find where the flaw is.

shojtsy 2009-11-20 18:18:52

Answer 5

+1 A:

You're right that there's an extra indirection here. It's conceivable (though it would be surprising) if the compiler (with the help of link-time code generation) optimized it away.

What you've proposed is sometimes called slicing, and it's used extensively in some situations. Though, in general, I'm not sure it's worth the dangers. You have to be very careful about invaliding your slice (or someone else's).

Note that if you used iterators for the loop instead of indexing, then you'd deref the reference only a couple times (to call begin() and end()) rather than n times (to index into the vector).

int sum(const vector<int> &v)
{
   int s = 0;
   for (auto it = v.begin(); it != v.end(); ++it) {
       s += fn(*it);
   }
   return s;
}

(I'm assuming the optimizer will hoist the begin() and end() calls out of the loop. You could do it explicitly to be certain.)

Passing a pair of iterators instead of the container itself seems like the STL idiom. That would give you more generality, as the type of container can vary, but so can the number of dereferences needed.

Adrian McCarthy 2009-11-20 15:53:26

Answer 6

+5 A:

What's wrong with your idea is that you already have two perfectly good solutions:

Pass the vector as is, either by value (where the compiler will often eliminate the copy), or by (const) reference, and trust the compiler to eliminate the double indirection, or
Pass an iterator pair.

Of course you can argue that the iterator pair is "less natural syntax", but I disagree. It is perfectly natural to anyone who's used to the STL. It is efficient, and gives you exactly what you need to work with the range, using std algorithms or your own functions.

Iterator pairs are a common C++ idiom, and a C++ programmer reading your code will understand them without a problem, whereas they're going to be surprised at your home-brewed vector wrappers.

If you're really paranoid about performance, pass the pair of iterators. If the syntax really bothers you, pass the vector and trust the compiler.

jalf 2009-11-20 16:02:10

ansaurus

tags:

views:

answers:

Efficient passing of std::vector

related questions