I've come across some behavior in STL queue push that I don't quite understand.

Basically, I have two structs:

struct structA {
  string a;
};

struct structB {
  char b[256];
};

structA st1;
structB st2;

...assign a 256-character string to both st1.a and st2.b...


queue<structA> q1;
queue<structB> q2;

for (int i = 0; i < 10000; i++) {
  q1.push(st1);
}

for (int i = 0; i < 10000; i++) {
  q2.push(st2);
}

What I notice is that pushing the char-array struct takes much longer (around 5X) than pushing the string struct. Examining the individual pushes, I also see that the char-struct push times show quite a number of spikes (ranging from 2X to 10X) here and there. What is the reason for that?
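For reference, a self-contained sketch of the kind of timing harness I'm using (the std::chrono timing and the 'x' fill are just for illustration):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <queue>
#include <string>

struct structA { std::string a; };
struct structB { char b[256]; };

int main() {
  structA st1;
  structB st2;
  st1.a.assign(256, 'x');                              // 256-character string
  std::fill(std::begin(st2.b), std::end(st2.b), 'x');  // fill the raw array

  std::queue<structA> q1;
  std::queue<structB> q2;

  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < 10000; i++) q1.push(st1);
  auto t1 = std::chrono::steady_clock::now();
  for (int i = 0; i < 10000; i++) q2.push(st2);
  auto t2 = std::chrono::steady_clock::now();

  std::cout << "string struct: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
            << " ms\n";
  std::cout << "char[] struct: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
            << " ms\n";
}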

Thanks.

+1  A: 

Your C++ standard library probably uses a copy-on-write (COW) string implementation, meaning that copying a string doesn't really copy the character data (the copy just references the original's buffer); the data is only copied "for real" when one of the strings is written to.

To test whether this is the case, put this inside the loop, after the q1.push(st1) line:

++st1.a[0];

then time again.

Character arrays, by contrast, have no copy-on-write behaviour and are copied "for real" every time you ask for a copy.
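That is, the timing loop becomes something like this (same st1 and q1 as in the question):

for (int i = 0; i < 10000; i++) {
  q1.push(st1);
  ++st1.a[0];   // writing to the string forces it to detach (a real copy) if COW is in play
}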

Chris Jester-Young
A: 

The character-array struct is larger than a struct holding an empty string would be - the spikes may relate to the allocations needed as the underlying container grows to hold the larger amount of memory it uses.

If the strings aren't empty, then copy-on-write kicks in anyway, so you're trading some locking / reference-counter incrementing against the extra memory use: which is faster is system dependent.

Tony
From what I've read, COW is faster in single-threaded scenarios, where there's no real locking contention, etc. This certainly corresponds with the OP's timing harness; I see no threads mentioned in the question.
Chris Jester-Young
Thanks for the reply. Actually, I have another thread accessing the queue. How does it work out in that case?
Steveng
You're welcome. To safely `push()` with other reader threads, all those threads need to use a lock (a mutex or read/write lock). Holding that lock won't stop std::string from doing its own, possibly unnecessary, locking while handling the copy / reference counter: the code in std::string knows nothing about the locks in your code, or whether there are other threads that don't take locks but might still read one of the strings. Not sure if that answers your question...? Anyway, coding safely with locks and then profiling is the best approach.
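For instance, a minimal sketch of guarding the queue with a mutex (the names, and the use of std::mutex, are just illustrative):

#include <mutex>
#include <queue>
#include <string>

struct structA { std::string a; };

std::queue<structA> q1;
std::mutex q1_mutex;   // guards every access to q1

void producer_push(const structA& st) {
  std::lock_guard<std::mutex> lock(q1_mutex);
  q1.push(st);         // the copy happens while the lock is held
}

bool consumer_try_pop(structA& out) {
  std::lock_guard<std::mutex> lock(q1_mutex);
  if (q1.empty()) return false;
  out = q1.front();
  q1.pop();
  return true;
}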
Tony
A: 

This is most likely due to:

1) The dynamic allocation of memory to hold the character data inside each string.
2) Possibly, but far less likely, the resizing of the deque page buffer that backs the queue (a way to test this is sketched below).
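To check how much (2) contributes, you could back the queue with a different container and compare timings. A rough sketch, reusing structB from the question (std::list avoids the deque's block growth at the cost of a per-node allocation):

#include <list>
#include <queue>

struct structB { char b[256]; };   // as in the question

int main() {
  structB st2 = {};
  // std::queue's second template parameter selects the underlying container;
  // any container with front(), back(), push_back() and pop_front() works.
  std::queue<structB, std::list<structB> > q2;
  for (int i = 0; i < 10000; i++)
    q2.push(st2);
}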

Michael Goldshteyn
+1  A: 

Every time you push st1 or st2 onto a queue, you're actually pushing a copy of it (not a reference or pointer). The cost difference is in the copying of the data. With structB you have to copy the full 256 bytes every time. With structA you only copy the string object, which most likely has copy-on-write semantics, so until one of them is modified they all share the same underlying string data.
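A quick way to check whether your library's std::string is copy-on-write (older GNU libstdc++ is, for example) is to compare the buffer addresses of a string and a fresh copy of it. A sketch, not something to rely on in real code:

#include <iostream>
#include <string>

int main() {
  std::string s(256, 'x');
  std::string copy = s;
  // Under COW the copy initially shares the original's buffer, so the data
  // pointers compare equal until one of the strings is written to.
  if (s.data() == copy.data())
    std::cout << "shared buffer (looks like COW)\n";
  else
    std::cout << "separate buffers (deep copy)\n";
}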

OJ
Copy On Write? http://stackoverflow.com/questions/1116040/memory-efficient-c-strings-interning-ropes-copy-on-write-etc/1116059#1116059 apparently gcc still implements it, though it's usually thought of as a deprecated "optimization".
Matthieu M.
Ask a general question, get a general answer.
OJ
A: 

std::queue is an adapter over another container (one that implements front, back, push_back, and pop_front); unless you specify which container to adapt, it will use std::deque. Deque does some block-allocation magic in the background that provides growth similar to vector but performs better, since it manages multiple non-contiguous blocks and doesn't have to copy everything each time it grows. Anyway, it's a guess, but I'd say that's the cause.

The byte-array struct sees the hits more frequently because of all the room needed for those arrays. I'd bet that over a longer run the string struct would also generate spikes; they're just smaller for now, since string likely keeps references to the underlying character storage until something changes it.

Now's your chance to get familiar with your profiler of choice and find out for sure! Fire up valgrind (--tool=callgrind) or whatever profiler your platform supports and see exactly which calls are using the time, and where.

Jagerkin
COW? Very unlikely.
sbi