views:

167

answers:

5

Hi all.

I'm especially interested in Windows with MinGW.

Thanks.

Update: First, I assumed everyone was familiar with string interning: http://en.wikipedia.org/wiki/String_interning

Second, my problem in detail: I knocked up a string class for practice. Nothing fancy, I just store the size and a char * in a class.

I use memcpy for the assignment.

When I do this to measure the assignment speed of std::string and my string class:

string test1 = "  65 kb text ", test2;
for (int i = 0; i < 1000000; i++)
   {
   test2 = test1;
   }

mystring test3 = "65 kb text", test4;
for (int i = 0; i < 1000000; i++)
   {
   test4 = test3;
   }

std::string wins by a large margin. The assignment operator in my class does nothing but copy with memcpy. I do not even allocate a new array with the "new" operator, because I check for size equality and only allocate when needed. How come?

For small strings, there is no problem. I can't see how std::string can assign values faster than memcpy. I bet it uses memcpy in the background too, or something similar, so that's why I asked about interning.

Update2: By adding a single-character assignment to the loops, like this: test2[15] = 78, I defeated the copy-on-write behaviour of std::string. Now both versions take almost exactly the same time (okay, there is a 1-2% difference, but that is negligible). So if I am not entirely mistaken, the MinGW std::string must use COW.
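The measurement described in Update2 could be sketched roughly as below. This is only an illustration: the helper name `time_assignments` and its parameters are made up here, and the timings depend heavily on the compiler, optimisation flags and standard library version. With a copy-on-write std::string the untouched loop should be much faster than the touched one; with a library that does not use COW (C++11 and later rule it out for std::string) both loops should cost about the same.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical helper: assigns src to dst `iterations` times and returns
// the elapsed wall-clock time in seconds.
// If `touch` is true, one character is written after every assignment,
// which forces a copy-on-write implementation to make a private copy.
double time_assignments(const std::string& src, std::string& dst,
                        int iterations, bool touch) {
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        dst = src;
        if (touch && dst.size() > 15) {
            dst[15] = 'N';  // same as the question's test2[15] = 78 in ASCII
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling it once with `touch == false` and once with `touch == true` on the same 65 kb source string, and comparing the two results, reproduces the experiment.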

Thank you all for your help.

A: 

No, there is no string interning in the STL. It doesn't fit the C++ design philosophy to have such a feature.

DeadMG
Mate, then how come I assign the same 65 kb to a char array (with memcpy) and to a string and the string assignment is MUCH MUCH MUCH faster?
Mike Shinola
@Mike: look at the disassembly, or ask another question...
Steve Jessop
@Mike Shinola: If you don't show code, how the hell am I supposed to know?
DeadMG
@Mike Shinola: If this is your question, then just ask it in the first place!
Sven Marnach
@Mike: Some `std::string` implementations use copy-on-write to make assignments fast. That doesn't mean that the data is being interned. Also remember effects of processor caching on your test cases -- make sure you test both assigning `std::string` before the `memcpy`, and after. Oh, and one more thing: If you're using `strlen` to get the length of the string before `memcpy`ing the string, that's most likely the cause, not the assignment.
Billy ONeal
I did not want to ask the aforementioned question because I wanted to investigate the issue further before posting it. But it really bugs me, so I think I'll post it in an update.
Mike Shinola
@Mike Shinola: Does your 65kb char array have some zeros in it?
Alan
@Billy ONeal Aaaah, that copy-on-write thing could be the reason. Unfortunately, I cannot really test it because I can't store that many unique long strings in memory.
Mike Shinola
@Mike Shinola: Copy-on-write is not overly common. What compiler are you using?
Alan
@Alan No, I just typed in "asdasdasd" and copied it some thousand times =). More explanation coming.
Mike Shinola
@Mike: Post the code you're using, the results you expect, and the results you're getting, and the compiler you're using, in your question. Don't make us fish for the information you really want.
Billy ONeal
@Billy ONeal Sorry man, I will do my best.
Mike Shinola
+6  A: 

Simply put, no. String interning is not feasible with mutable strings, such as all std::string-objects.
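To make the point concrete, here is a minimal sketch of what run-time interning could look like if you built it yourself. The class `InternPool` and the function `intern_pool_demo` are hypothetical names, not part of any standard library. Note that the pool can only hand out pointers to const strings: if callers could modify a pooled string, every other holder of the same pointer would see the change.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical illustration: a pool that keeps one copy of each distinct
// string and hands out pointers to the immutable pooled copy.
class InternPool {
public:
    // Returns a pointer to the single pooled copy of `text`.
    // Pointers to unordered_set elements stay valid when the set grows.
    const std::string* intern(const std::string& text) {
        return &*pool_.insert(text).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Usage: equal text yields the same pointer, so comparing interned
// strings becomes a pointer comparison.
inline void intern_pool_demo() {
    InternPool pool;
    const std::string* a = pool.intern("asdasdasd");
    const std::string* b = pool.intern("asdasdasd");
    assert(a == b);
    assert(pool.size() == 1);
}
```

The cost is a hash lookup on every `intern` call, which is the price paid for sharing storage.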

eq-
+1 Since the OP may not quite get this: interned strings are stored in read-only memory. std::string is mutable, which means it can be modified. You cannot modify read-only memory, thus you cannot store interned string addresses in a std::string.
Ed Swangren
+1 Actually, I would venture that it is not ever feasible. Any gain by copies saved would be lost in the time it would take to compare each string to every other string all the time.
Billy ONeal
@Ed Swangren Read-only memory? Whoa. How? I don't get it. You don't mean ROM, do you? :P
Mike Shinola
@Mike: Read-only status of memory is controlled by the MMU. On x86 architectures specifically, there are access control bits in each TLB entry which control whether that page can be read, written, and/or executed. But read-only memory permissions are not required for string interning; immutable string objects are.
Ben Voigt
@Ben Voigt Thanks for the explanation, you seem to know a lot about architecture.
Mike Shinola
@Ed: There's no absolute requirement that interned strings *must* be stored in read-only memory. They could be placed anywhere. It just makes sense to mark the memory pages containing interned strings as read-only since it's all known in advance and can be generated at compile-time. But as @eq- points out, that's not feasible when every string is a mutable object.
jalf
+2  A: 

Not so much, since std::string is modifiable.

Implementations have been known to attempt the use of copy-on-write, but that causes such problems in multi-threaded code that I think it's out of fashion. It's also very hard to implement correctly - perhaps impossible? If someone takes a pointer to a character in the string, and then modifies another character, I'm not sure that this is permitted to invalidate the first pointer. If it's not allowed, then COW is out of the question too, I think, but I can't remember how it works out.
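For readers who have not seen copy-on-write, here is a minimal single-threaded sketch of the idea. `CowString` is a hypothetical class written for this illustration and is not how any particular standard library implements it; it leans on `std::shared_ptr` for the reference count. Assignment only copies a pointer, and the real character copy is deferred until the first write to a shared buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical illustration of copy-on-write; not thread-safe.
class CowString {
public:
    CowString() : buf_(std::make_shared<std::string>()) {}
    CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // Copy construction and assignment are compiler-generated: they copy
    // the shared_ptr, which bumps a reference count instead of copying
    // the characters.

    std::size_t size() const { return buf_->size(); }

    char get(std::size_t i) const {
        assert(i < size());
        return (*buf_)[i];
    }

    // A write first detaches from any other owners of the buffer.
    void set(std::size_t i, char c) {
        assert(i < size());
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // the deferred deep copy
        }
        (*buf_)[i] = c;
    }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::string> buf_;
};
```

The sketch sidesteps the pointer problem described above by never handing out a pointer or reference into the buffer. std::string's `operator[]` and `data()` do hand them out, which is where the trouble starts.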

Steve Jessop
+1 because some prick downvoted without commenting, and because the answer is completely correct.
Billy ONeal
@Billy: to be fair, it may have been incorrect at the time of the downvote. As I say, I can't remember whether COW works out to be legal (I think it was *intended* to be, but the standard arguably forbade it by accident, or something like that). I went through a couple of versions of "could be COW" before reaching my current text. I was hoping for a comment to explain the downvote, though.
Steve Jessop
+5  A: 

String interning may be done by the compiler only for string literals appearing in the code. If you initialise std::string objects with string literals, and some of the literals occur multiple times, the compiler may store only one copy of each such literal in your binary. There is no string interning at run time. MinGW supports this kind of compile-time string interning.
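A small sketch of the difference, with made-up function names. Whether the two identical literals share one address is unspecified and depends on the compiler and its flags, so the code reports it instead of assuming it. The std::string objects built from those literals, however, always own separate copies of the characters.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Two functions that return what looks like the same literal.
inline const char* first_literal() { return "interned?"; }
inline const char* second_literal() { return "interned?"; }

// True if the compiler merged the two literals into one stored copy.
// The result is unspecified by the standard; either answer is valid.
inline bool literals_share_storage() {
    return first_literal() == second_literal();
}

// Each std::string copies the characters into its own buffer, so two
// live, separately constructed strings never share storage.
inline bool strings_have_own_buffers() {
    const std::string a = first_literal();
    const std::string b = second_literal();
    assert(a == b);  // same contents
    return a.data() != b.data();
}
```

Printing the result of `literals_share_storage()` under your own build settings shows what your compiler does for that particular build.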

Sven Marnach
Replace "is done" with "may be done" and you'll have +1. The standard does not require that behavior.
Billy ONeal
Well, mingw does. (Updated the post anyway :)
Sven Marnach
@Sven: err.. so? MinGW is not the standard.
Billy ONeal
The OP asked specifically about mingw.
Sven Marnach
A: 

Two ideas:

  • Is myclass a template class? The std::string class is a typedef of the template class basic_string. This means that the complete source of basic_string instead of just the header is accessible to the compiler when your test function is compiled. This additional information enables more optimisations in exchange for higher compilation time.

  • Most C++ standard library implementations are highly optimised (and sadly almost unreadable).

josefx
"Myclass" is not a template class. For more information please look at the update.
Mike Shinola