tags:

views:

134

answers:

6
  1. When a vector is created it has a default allocation size (probably this is not the right term to use, maybe step size?). When the number of elements reaches this size, the vector is resized. Is this size compiler specific? Can I control it? Is this a good idea?
  2. Do repeated calls to vector::size() recount the number of elements (O(n) calculation) or is this value stored somewhere (O(1) lookup). For example, in the code below

    // Split given string on whitespace
    vector<string> split( const string& s )
    {
        vector<string> tokens;
        string::size_type i, j;
        i = 0;
        while ( i != s.size() ) {
            // ignore leading blanks
            while ( isspace(s[i]) && i != s.size() ) {
                i++;
            }
            // found a word, now find its end
            j = i;
            while ( !isspace(s[j]) && j != s.size() ) {
                j++;
            }
            // if we found a word, add it to the vector
            if ( i != j ) { 
                tokens.push_back( s.substr(i, j-i) );
                i = j;
            }
        }
        return tokens;
    }
    

assuming s can be very large, should I call s.size() only once and store the result?

Thanks!

+1  A: 
  1. It's library-specific. You might be able to control the incremental allocation, but you might not.

  2. The size is stored, so it is very fast (constant time) to retrieve. How else could it work? C has no way of knowing in general whether a memory location is "real data" or not.

Ned Batchelder
Thanks for the answers. Can you point me to a source where I can learn more about how `vector` stores it size, its data structure, etc (of course, I can always look at the header but I was wondering if there's something easier to read)
recipriversexclusion
@reci: There is no "the" vector, every implementation is different. (Though you will find they agree on lot's of things.) Here's a good reference: http://www.cplusplus.com/reference/stl/vector/ though it skips out on an implementation. The only way to see how `vector` can be implemented is to open a header up and look.
GMan
Any book on data structures with a focus on C++ will cover vectors.
Dennis Zickefoose
+1  A: 
  1. The resizing mechanism is usually fixed. (Most compilers double the size of the vector when it reaches the limit.) The C++ standard specifies no way to control this behaviour.

  2. The size is internally updated whenever you insert/remove elements and when you call size(), it's returned immediately. So yes, it's O(1).

casablanca
+1  A: 

When the number of elements reaches this size, the vector is resized. Is this size compiler specific? Can I control it? Is this a good idea?

In general, this is a library-specific behavior, but you may be able to influence this behavior by specifying a custom allocator, which is non-trivial work.

Do repeated calls to vector::size() recount the number of elements (O(n) calculation) or is this value stored somewhere (O(1) lookup).

Most implementations store the size as a member. It's a single memory read.

greyfade
The allocator just hands out memory, it doesn't know anything about how much or why.
GMan
Could you not write an heuristic allocator that over-allocates and tries to keep the `vector`'s data in one spot when it grows, basing its allocations on the expected behavior of the `vector`?
greyfade
Theoretically, maybe, but if you need that level of control over the allocation, it would probably be easier to just wrap the vector in a new class that uses reserve as needed to give the implementation hints as to your needs. Or even just write your own.
Dennis Zickefoose
+4  A: 

The actual size of the capacity increment is implementation-dependent, but it has to be (roughly) exponential to support the container's complexity requirements. As an example, the Visual C++ standard library will allocate exactly the space required for the first few elements (five, if I recall correctly), then increases the size exponentially after that.

The size has to be stored somehow in the vector, otherwise it doesn't know where the end of the sequence is! However, it may not necessarily be stored as an integer. The Visual C++ implementation (again, as an example) stores three pointers:

  1. a pointer to the beginning of the underlying array,
  2. a pointer to the current end of the sequence, and
  3. a pointer to the end of the underlying array.

The size can be computed from (1) and (2); the capacity can be computed from (1) and (3).

Other implementations might store the information differently.

James McNellis
Beat me by a second. +1 for noticing the standard mandates exponential increase in capacity.
GMan
Do you think the calculation of size from pointers (1) and (2) is done when an element is added/deleted or when `size()` is called. If the latter, isn't it better not to call `size()` unnecessarily, i.e. store the result locally? Thanks.
recipriversexclusion
@recipriversexclusion: The pointers are updated whenever the size or capacity of the vector changes, then whenever `size()` or `capacity()` is called, subtraction is performed to get the result. From a performance standpoint, don't worry about it: subtraction is _fast_, and in 99.9999999999999999% of all cases, you'll never notice the difference (note that the optimizer might even optimize it, in which case you'd get performance similar to storing off the result).
James McNellis
@reci: If the former, where is the result stored? The point is that VC++ doesn't store it anywhere, it implements size as `return p2 - p1;`. And no, just use `size()` and stop micro-optimizing. It's easier to read and you won't save anything by worrying about it. Get your code working, *use a profiler*, and optimize that way. You'll never see `size()` as a slow part of your code.
GMan
Its done when you call size if that's the method used. But you're talking about a single subtraction. Don't go out of your way to avoid it, especially if there's any chance the size might change.
Dennis Zickefoose
+5  A: 

In most cases, you should leave the allocation alone unless you know the number of items ahead of time, so you can reserve the correct amount of space.

At least in every case of which I'm aware, std::vector::size() just returns a stored value, so it has constant complexity. In theory, the C++ standard allows it to do otherwise. There are reasons to allow otherwise for some other containers, primarily std::list, and rather than make a special case for those, they simply recommend constant time for all containers instead of requiring it for any. I can't quite imagine a vector::size that counted elements though -- I'm pretty no such thing has ever existed.

P.S., an easier way to do what your code above does, is something like this:

std::vector<string> split(std::string const &input) {
    vector<string> ret;
    istringstream buffer(input);

    copy(istream_iterator<string>(input),
         istream_iterator<string>(),
         back_inserter(ret));

    return ret;
}

Edit: IMO, The C++ Standard Library, by Nicolai Josuttis is an excellent reference on such things.

Jerry Coffin
Wow, thanks for the code, haven't thought about that one. Can I modify what characters istream skips, i.e. if I want to split the input string on commas instead of spaces?
recipriversexclusion
@recipriversexclusion: yes, but it's somewhat non-trivial. See: http://stackoverflow.com/questions/1894886/parsing-a-comma-delimited-stdstring/1895584#1895584
Jerry Coffin
I am pretty sure constant time `size` is required for `std::vector`, not just recommended. Recommended complexities are kind of worthless.
Dennis Zickefoose
@Dennis: I agree that it's pretty useless. Unfortunately, it's true -- see Table 65 of the standard. The exact wording they use is: "Those entries marked ‘‘(Note A)’’ **should** have constant complexity." [emphasis added]. (Oh, and yes, `a.size()` is one of the entries marked "(Note A)".
Jerry Coffin
I even double checked after posting that, and the subtlty of the wording still didn't sink in.
Dennis Zickefoose
+1  A: 

Unrelated to your actual questions, but here's a more "STL" way of doing what you're doing:

vector<string> split(const string& s)
{
    istringstream stream(s);
    istream_iterator<string> iter(stream), eos;
    vector<string> tokens;
    copy(iter, eos, back_inserter(tokens));
    return tokens;
}
tzaman
Thanks! I know that my code is still heavily C oriented, I call it C+ :-)
recipriversexclusion
@recip: C+, eh? That's amusing, I'll remember that for future use. :D
tzaman