ansaurus

Question

Answer 1

+4 A:

Generally, if adding elements to a vector is a bottleneck, you should use std::vector<T>::reserve to reserve some space in advance. This should reduce the likelihood that a call to push_back will trigger a memory reallocation.
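
As a rough illustration of the effect of reserve, the sketch below counts how often the capacity changes while strings are appended. The helper name count_reallocations is made up for this example and is not part of the original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends `count` strings and returns how many times
// the vector's capacity changed during the appends, i.e. how many
// reallocations push_back triggered.
std::size_t count_reallocations(std::size_t count, bool reserve_first) {
    std::vector<std::string> words;
    if (reserve_first) {
        words.reserve(count);  // a single allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = words.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        words.push_back("word");
        if (words.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = words.capacity();
        }
    }
    return reallocations;
}
```

Calling count_reallocations(1000, true) should report no reallocations during the loop, while count_reallocations(1000, false) reports several; the exact number depends on the library's growth factor.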

That said, string processing in general can be pretty CPU intensive, and reallocating a vector of string objects requires a lot of copying. Every time the vector reallocates memory, each string object needs to be copied to another location in memory. (Fortunately, this will be mitigated substantially once C++0x move constructors are in place.)

Also, the fact that you are clearing the vector each time doesn't change the fact that every call to push_back copies a string object into the vector, which is probably the cause of all the heap allocations you're seeing. Don't forget that a std::string generally needs to allocate memory on the heap to store its characters (implementations with a small-string optimization avoid this only for short strings).
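
With a compiler that supports move semantics (C++11 or later, which is an assumption about your toolchain), the copy made by push_back can be avoided by moving the string into the vector. A minimal sketch, using a made-up helper named append_word:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a word from the character range [first, last)
// and hands its buffer to the vector instead of copying it.
void append_word(std::vector<std::string>& words,
                 const char* first, const char* last) {
    std::string word(first, last);     // one construction (may allocate)
    words.push_back(std::move(word));  // move: the characters are not copied again
}
```

After the call, the local word is left in a valid but unspecified state, which is fine here because it is destroyed immediately.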

Charles Salvia 2010-10-23 23:59:54

Answer 2

A:

A vector is the best choice if you know the number of resulting strings in advance. If you don't, a deque or list will do better. But maybe you can check what the vector's capacity is at the beginning and what its size is at the end.

DaVinci 2010-10-24 00:02:11

Uhm, why do you think so, and do you have any evidence?

Alf P. Steinbach 2010-10-24 00:21:08

A linked list is one of the worst-performing containers, if not the worst.

GMan 2010-10-24 05:03:45

Answer 3

A:

You could switch to a vector that indirectly holds the strings. Then the strings aren't copied on every resize of the storage, only the "handles" are copied. So instead of std::vector<std::string> &words, something more like std::vector< counted_ptr<std::string> > &words. Then see this Dr. Dobb's article for more about counted_ptr<>.
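
As a sketch of the same idea, the example below uses std::shared_ptr (C++11) in place of the article's counted_ptr<>. That substitution is an assumption made for illustration, and the names WordHandles and add_word are made up.

```cpp
#include <memory>
#include <string>
#include <vector>

// The vector stores reference-counted handles, not the strings themselves.
using WordHandles = std::vector<std::shared_ptr<std::string>>;

// Hypothetical helper: stores a handle to a heap-allocated copy of `word`.
void add_word(WordHandles& words, const std::string& word) {
    words.push_back(std::make_shared<std::string>(word));
}
```

When the vector grows, only the handles are relocated; each std::string object stays at the same address. The cost is one extra heap allocation per word and an extra indirection on every access.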

Also, to avoid a potential Heisenbug chase, auto_ptr<> is not what you want to use for this sort of thing in an STL container.

Eric Towers 2010-10-24 00:26:41

Answer 4

A:

Firstly, you should consider passing an output iterator instead of a vector&. This would result in a cleaner and more flexible design.

The definition of clear() makes no guarantees about memory utilisation. The implementation is perfectly entitled to free all used memory when you call clear. It could quite reasonably be implemented like so:

void clear() { vector tmp; swap(tmp); }

You might get lucky calling resize(0) instead of clear(), but even that isn't required to preserve the vector's capacity.
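
A quick way to see what a particular standard library actually does is to measure it. The sketch below records the capacity before and after clear(); the helper name capacity_around_clear is made up, and the result only describes the implementation it is run on.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the vector's capacity before and after clear().
std::pair<std::size_t, std::size_t> capacity_around_clear(std::size_t count) {
    std::vector<std::string> words(count, "word");
    const std::size_t before = words.capacity();
    words.clear();  // size becomes 0; the second value shows what happened to the storage
    return std::make_pair(before, words.capacity());
}
```

If the two values are equal, that implementation keeps the storage, and refilling the vector will not reallocate until the old capacity is exceeded.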

If you really want to squash all those memory allocations:

Define the function as a template function with an output iterator, as I suggest above, also passing in a count limit.
Pass in a plain-old C-array big enough to hold the maximum number of words you expect to see.
Use std::pair<const char*, const char*> instead of std::string to hold the words found.
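
The three steps above can be sketched roughly as follows. The original function is not shown here, so this example assumes a simple split on spaces; split_words and WordRange are made-up names.

```cpp
#include <cstddef>
#include <utility>

// A word is a [begin, end) range pointing into the original text.
typedef std::pair<const char*, const char*> WordRange;

// Hypothetical function: writes up to `limit` space-separated words from
// [first, last) to `out` and returns the iterator past the last word written.
template <typename OutputIt>
OutputIt split_words(const char* first, const char* last,
                     OutputIt out, std::size_t limit) {
    std::size_t written = 0;
    while (first != last && written < limit) {
        while (first != last && *first == ' ') ++first;  // skip separators
        if (first == last) break;
        const char* word_end = first;
        while (word_end != last && *word_end != ' ') ++word_end;
        *out++ = WordRange(first, word_end);  // no string copy, no heap allocation
        ++written;
        first = word_end;
    }
    return out;
}
```

A caller can pass a plain array such as WordRange words[64] as the output and use the returned pointer to find out how many words were written. The ranges stay valid only as long as the original text does.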

Marcelo Cantos 2010-10-24 00:27:33

Answer 5

A:

The code looks like it works well but the devil is always in the details when it comes to performance. Here are a few thoughts:

Consider changing the vector declaration:

from: std::vector< std::string > &words
to:   std::vector< std::string* > &words

This stores a pointer to each string rather than copying the contents of each string into the vector.
Try to use vector::reserve to pre-allocate the memory needed to process the string. A rough estimate might be text.length() / maxWidth.
Pay close attention to the string operations that are being used. It's very possible that a lot of temporary strings are being generated and immediately thrown away. The best way to find out whether this is happening is to step through your string manipulation lines and see if extra string constructors and copy constructors are being called.
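
As a rough sketch of the last two points, the function below reserves space using the estimate above and constructs each string directly inside the vector instead of pushing back a temporary from substr(). It assumes C++11 for emplace_back and uses fixed-width chunks as a stand-in for the real wrapping logic, which is not shown here; chunk_text is a made-up name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical function: cuts `text` into chunks of at most `maxWidth`
// characters and stores them in `words`.
void chunk_text(const std::string& text, std::size_t maxWidth,
                std::vector<std::string>& words) {
    words.clear();
    if (maxWidth == 0) return;  // avoid dividing by zero below
    // Rough estimate from the answer, plus one for a trailing partial chunk.
    words.reserve(text.length() / maxWidth + 1);
    for (std::size_t pos = 0; pos < text.length(); pos += maxWidth) {
        // Constructs the string in place from (text, pos, count); the count
        // is clamped to the characters remaining, so no temporary is needed.
        words.emplace_back(text, pos, maxWidth);
    }
}
```

For example, chunk_text("abcdefgh", 3, words) leaves "abc", "def" and "gh" in words.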

skimobear 2010-10-24 00:28:23

Most string implementations are reference-counted, so using `string*` will actually slow things down: it won't reduce copying, but it will require an extra heap object compared to `string`. Plus there's the extra work of managing raw pointers to heap objects.

Marcelo Cantos 2010-10-24 00:40:13

@Marcelo - Thanks for the follow-up. After reviewing again, I think a string pointer wouldn't work well with the existing code. string::substr() returns a new string object that has to be copied before it is discarded. A vector of char pointers could be an efficient way to go, but that wouldn't be an improvement to the existing code; it would be more of a rewrite. Thanks for keeping me honest :)

skimobear 2010-10-24 00:55:31

std::vector push_back is bottleneck
