I was playing around today with some timing code and discovered that when assigning a string literal to std::string, it was around 10% faster (with a short 12-char string, so likely an even bigger difference for large strings) to do so with a literal of known length (using the sizeof operator) than without. (Only tested with the VC9 compiler, so I guess other compilers may do it better.)

std::string a("Hello World!");
std::string b("Hello World!", sizeof("Hello World!") - 1); // 10% faster in my tests; the - 1 drops the terminating NUL

Now the reason, I suspect, is that for a it has to call strlen to get the string length (VC9 drops into assembly, which isn't a strong point of mine, so I can't be 100% sure), and then do the same as the second case does anyway.

Given how long std::string has been around, and how common the first case is in real-world programs (especially if you include the +, =, += etc. operators and equivalent methods), how come it doesn't optimise the first case into the second? It seems a really simple one as well: if it's a std::basic_string object and a literal, compile it as if it was written like b.

+4  A: 

The first can't be optimised into the second. In the first, the length of the string is unknown and so has to be calculated, in the second you tell it how long it is, so no calculation is needed.

And using sizeof() makes no difference - that is calculated at compile time too. The constructor that the first case uses is:

 string( const char * s );

there is no way of this constructor detecting it is being given a string literal, much less calculating its length at compile time.
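
To make the point concrete, here is a minimal sketch of what the one-argument constructor has to do. It is not how any particular standard library is implemented; the helper names `from_pointer` and `from_pointer_and_length` are made up for illustration, and the only assumption is that a constructor handed a bare pointer must find the terminating NUL at run time (for example via `std::strlen`).

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: mirrors std::string(const char*).
// All it receives is a pointer, so the length has to be found at run time.
std::string from_pointer(const char* s) {
    return std::string(s, std::strlen(s));
}

// Hypothetical helper: mirrors std::string(const char*, size_t).
// The caller supplies the length, so no scan for the terminator is needed.
std::string from_pointer_and_length(const char* s, std::size_t n) {
    return std::string(s, n);
}
```

For a literal, the length without the terminator is `sizeof("Hello World!") - 1`, so `from_pointer("Hello World!")` and `from_pointer_and_length("Hello World!", sizeof("Hello World!") - 1)` should produce equal strings.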

Also, constructing strings from C-style string literals happens relatively rarely in real code - it simply isn't worth optimising. And if you do need to optimise it, simply re-write:

while( BIGLOOP ) {
   string s( "foobar" );
   ...
}

as:

string f( "foobar" );
while( BIGLOOP ) {
   string s( f );
   ...
}
anon
The compiler should be able to figure out the length of the string at compile time in the same way it can figure out sizeof(...) since the string is constant
Greg
I think his point is valid, in the first case, the compiler _does_ know how long the string literal is. If you don't already know how std::string and the compiler work, there is no way to know that the compiler can't make use of its knowledge about the size of the literal.
John Knoeller
I know that it can't be optimised automatically using C++ syntax; that's why I'm asking why the compiler can't, since the compiler knows it's a constant string, knows the length of constant strings (or can at least find out at compile time), and could be told about various cases in the C++ standard library where it could substitute one thing for another.
Fire Lancer
Well, the compiler could. But it would take a lot of effort on the part of the compiler writers and would bind the compiler tightly to the library - I suppose the compiler writers didn't think it was worth it, correctly IMHO.
anon
The compiler certainly could, it would just have to extend the standard to do so. **String constants can't be templates**, so you can't just make a templated constructor. **Default arguments can't rely on other arguments** so you can't sneak a call to `strlen()` in. In C++0x you could perhaps have a literal class containing the string length, but there would be no way to construct it from a normal string literal. This is genuinely a failure at the language level, although not a serious one.
Potatoswatter
Disagree, because string constants can't be template *parameters*. They certainly can cause Template Argument Deduction, because that merely requires a type, not a value. `template <size_t N> std::string::string(char const (&arg)[N])` is a valid way to make a templated constructor.
MSalters
@MSalters Yes, but it wouldn't help in this case because you can't assume that the NTS completely fills the array being passed, so you would have to call strlen() on it.
anon
This extension will indeed cause "incorrect" behavior for `std::string("a\0b")`.
MSalters
+2  A: 

The compiler undoubtedly could do something like this, and actually you could do this yourself:

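#include <cstddef>
#include <iostream>
#include <string>

template<std::size_t SIZE>
std::string f(const char(&c)[SIZE]) {
    return std::string(c, SIZE - 1);  // SIZE counts the terminating NUL, so drop it
}

int main() {
    std::string s = f("Hello");
    std::cout << s;
}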

or even with a custom derived type (though there is no reason std::string couldn't have this constructor):

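#include <cstddef>
#include <iostream>
#include <string>

class mystring : public std::string {
public:
    // SIZE counts the terminating NUL, so drop it
    template<std::size_t SIZE>
    mystring(const char(&c)[SIZE]) : std::string(c, SIZE - 1) {}
};

int main() {
    mystring s("Hello");
    std::cout << s;
}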

One large drawback is that a version of the function/constructor is generated for every different string size, and the whole class could even be duplicated if the compiler doesn't handle template hoisting very well... These could be deal-breakers in some situations.
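
Two further caveats, both raised in the comments, are worth checking before relying on this: the array argument need not be a literal (it may be a large, mostly empty buffer), and a literal may contain an embedded NUL. A small sketch, where `from_array` is a hypothetical helper rather than anything in the standard library, and which takes `SIZE - 1` characters to drop the terminator:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: takes the length from the array type itself,
// minus one for the terminating NUL that every string literal carries.
template <std::size_t SIZE>
std::string from_array(const char (&c)[SIZE]) {
    return std::string(c, SIZE - 1);
}
```

With this, `from_array("a\0b")` has length 3 while `std::string("a\0b")` has length 1, and passing a `const char buf[100] = ""` gives a 99-character string of NULs rather than an empty one. Whether that difference is acceptable depends on how the function is going to be called.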

joshperry
`SIZE` should be `size_t`, not `int`. We can't have a string of -1 length.
Chris Lutz
If you insist! I didn't realize I had -pedantic turned on :)
joshperry
@joshperry - I always have -pedantic turned on. :P
Chris Lutz
D'oh! I forgot that string literals are arrays. §2.13.4/1: "string literal has type “array of n const char”" Erasing my answer…
Potatoswatter
Looks unlikely that the compiler will generate additional code for this. Remember that functions declared inside `class{}` are inline, and you're not declaring a new specialization of `string` so `string` methods *will not* be duplicated. This should be zero-overhead on most platforms.
Potatoswatter
You have to take into account that the array parameter may not be filled - for example, it may be an array of 10000 characters that contains the empty string - in which case using sizeof rather than strlen becomes a pessimisation.
anon
@Niel: But that problem doesn't exist for string literals... And doing something like `char* c = "Hello";` will throw a deprecation warning on newer compilers. See http://codepad.org/3OQMvLZH for example.
joshperry