I've recently been reading about immutable strings, here and here, as well as some material on why D chose immutable strings. There seem to be many advantages.

  • trivially thread safe
  • more secure
  • more memory efficient in most use cases.
  • cheap substrings (tokenizing and slicing)

Not to mention most new languages have immutable strings, D2.0, Java, C#, Python, Ruby, etc.

Would C++ benefit from immutable strings?

Is it possible to implement an immutable string class in C++ (or C++0x) that would have all of these advantages?


update:

There are two attempts at immutable strings: const_string and fix_str. Neither has been updated in half a decade. Are they even used? Why didn't const_string ever make it into Boost?

+3  A: 
const std::string

There you go. A string literal is also immutable, unless you want to get into undefined behavior.

Edit: Of course that's only half the story. A const string variable isn't useful because you can't make it reference a new string. A reference to a const string would do it, except that C++ won't allow you to reassign a reference as in other languages like Python. The closest thing would be a smart pointer to a dynamically allocated string.
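
A minimal sketch of that last idea, assuming C++11's `std::shared_ptr`. The alias `StringRef` and the helpers `make_string` and `rebinding_keeps_original` are hypothetical names used only for illustration:

```cpp
#include <memory>
#include <string>

// Hypothetical alias: a rebindable handle to a string that can never be modified.
using StringRef = std::shared_ptr<const std::string>;

// Hypothetical helper that allocates the immutable string once.
inline StringRef make_string(const char* text) {
    return std::make_shared<const std::string>(text);
}

// Rebinding one handle leaves the old string untouched for other holders.
inline bool rebinding_keeps_original() {
    StringRef a = make_string("hello");
    StringRef b = a;             // shares the same string, no characters copied
    a = make_string("world");    // 'a' now refers to a different string
    return *b == "hello" && *a == "world";
}
```

The handle can be reassigned, but nothing reachable through it can be changed without a cast.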

Mark Ransom
You'd need more than that, e.g. you'd want `std::string::replace` to return a modified copy rather than cause a compile error.
peterchen
@peterchen -> const std::string orig; const std::string copy = std::string(orig).replace(...); - what would an immutable string do that's better?
Noah Roberts
IMHO assigning a new string is mutating a string, and from what I remember of APIs that had such a construct, this is how they treated it too. What you want really does sound more like an assignable reference, and it seems to me that something like a smart pointer would be a better answer to that than making a const string that's assignable. I also find const std::string variables useful from time to time, so I'd have to beg to differ there.
Noah Roberts
It's not the correct interface for an immutable object: it's two statements instead of one, and it's an implementation detail leaking to the calling code. --- An object should make the right thing easy, the wrong thing hard (or impossible). Do I need to put a "don't show this string to other threads" comment between the copy and the replace, and afterwards a "now you can"? --- I agree that `const std::string` is a close approximation, but without some of the benefits.
peterchen
@Peter: It might be nice if the language supported two types of `replace`: The current one and `replaced`, where the latter operates on a const reference and returns a copy that has the replacements. The latter might be able to avoid copying everything twice. However, so long as we lack such a function, we're stuck with Noah's work-around, which is a reasonable alternative. The better answer would be full support for an immutable variant of std::string.
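
As a rough sketch of what such a function could look like: `replaced` is a hypothetical free function, not part of the standard library, and here it is assumed to mirror the `(pos, len, replacement)` overload of `std::string::replace`. It builds the result directly, so the input is not copied and then edited:

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical non-mutating counterpart to std::string::replace(pos, len, str):
// returns a new string and leaves 'input' untouched.
inline std::string replaced(const std::string& input,
                            std::size_t pos,
                            std::size_t len,
                            const std::string& replacement) {
    if (pos > input.size()) {
        throw std::out_of_range("replaced: pos is past the end of the input");
    }
    len = std::min(len, input.size() - pos);   // clamp like std::string::replace does

    std::string result;
    result.reserve(input.size() - len + replacement.size());
    result.append(input, 0, pos);                        // text before the replaced range
    result += replacement;
    result.append(input, pos + len, std::string::npos);  // text after the replaced range
    return result;
}
```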
Steven Sudit
+4  A: 

As an opinion:

  • Yes, I'd quite like an immutable string library for C++.
  • No, I would not like std::string to be immutable.

Is it really worth doing (as a standard library feature)? I would say not. The use of const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.

anon
The closest I've come to immutable strings in C++ was a "span" class that has two const pointers, one for the begin and one for the end. It did not manage memory, but did support the usual utility functions (find, etc). As a result, it turned out to be very useful for parsing.
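
A minimal sketch of such a class, under the assumption that it only needs `find` and comparison. The name `Span` and its members are hypothetical, not taken from any particular library:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical non-owning view over characters that live elsewhere.
// The caller must keep the underlying characters alive while the span is used.
class Span {
public:
    Span(const char* begin, const char* end) : begin_(begin), end_(end) {}
    explicit Span(const char* cstr)
        : begin_(cstr), end_(cstr + std::strlen(cstr)) {}

    const char* begin() const { return begin_; }
    const char* end() const { return end_; }
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }

    // Returns a pointer to the first occurrence of c, or end() if absent.
    const char* find(char c) const { return std::find(begin_, end_, c); }

    // Compares the viewed characters with a null-terminated string.
    bool equals(const char* cstr) const {
        return std::strlen(cstr) == size() && std::equal(begin_, end_, cstr);
    }

private:
    const char* const begin_;
    const char* const end_;
};
```

Splitting `"key=value"` then needs only `find('=')` and two new `Span` objects built from the resulting pointers; no characters are copied.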
Steven Sudit
+1  A: 

You're certainly not the only person who thought that. In fact, there is the const_string library by Maxim Yegorushkin, which seems to have been written with inclusion in Boost in mind. And here's a slightly newer library, fix_str by Roland Pibinger. I'm not sure how tricky full string interning at run time would be, but most of the advantages are achievable when necessary.

Cubbi
A: 

Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance it really buys you with decent compilers.

Martin Beckett
I would not call copy-on-write strings immutable. Immutable strings are a subset of COW strings. That is, everything an immutable string can do, a COW string can do as well, but the reverse is not true. It's these extra abilities that make COW strings suck in concurrent environments.
caspin
And the advantage to thread safety is completely gone once you throw COW in the mix (you need to lock, either explicitly or inside the library itself) whenever you are performing a write to ensure thread safety.
David Rodríguez - dribeas
@David: Qt uses thread-safe COW; it does its own locking, with atomic integers for the reference count.
iconiK
@Caspin - true but if you are going to have immutable strings you might as well make efficient use of them with COW
Martin Beckett
@iconiK: That is the reason for the comment '(... or inside the library itself)'. The thing is that locking is required, and it can be a costly operation. The fact that it is hidden from the user means that there are fewer chances of doing it wrong in user code, but it does not take away the costs. Compare that with Java's immutable strings: you can copy references and know they will never be changed, and you can create modifications at almost no cost at all (allocations in a generational GC are *fast* --10 cpu instructions).
David Rodríguez - dribeas
... on the other hand, in C++ allocations are slow, and moreover locking is also slow. If you read the literature about `std::string` you will find that the standard permits COW, and that some standard library implementations used it, but they are moving away from it because the advantage it offers (a cheaper copy unless there is a write) is smaller (in CPU time) than the cost it imposes in a multithreaded environment.
David Rodríguez - dribeas
@Martin Beckett: *efficiency* is a term that depends on your usage pattern. COW in multithreaded environments requires locking operations, and those are costly, often more expensive than the copy itself unless there are many copied strings that are never modified.
David Rodríguez - dribeas
Copy-on-write, as such, doesn't actually require locks; it just means that actions that appear to modify an instance actually point it to a new buffer, leaving the original alone. Replacing a pointer is almost always atomic. The hidden cost is in managing the lifespan of the original, which is usually done by reference counting. Even with interlocked operations, this counting is expensive, which is why std::string implementations have indeed moved away from it. In GC'd languages, like C#, this is a non-issue, so we have immutable strings, though without COW semantics.
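
To make the mechanism concrete, here is a single-threaded sketch of copy-on-write built on `std::shared_ptr`. The class `CowString` and its members are hypothetical and do not reflect how Qt or any standard library implements it. Note that `shared_ptr` maintains its reference count atomically, which is the kind of per-copy cost discussed above, and that the `use_count()` check is not a safe way to synchronize between threads:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write wrapper; a sketch for single-threaded use only.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // Reads never copy.
    const std::string& str() const { return *data_; }

    // Writes detach first if the buffer is shared, leaving other holders alone.
    void append(const std::string& tail) {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);  // private copy
        }
        data_->append(tail);
    }

    // True while both objects still point at the same buffer.
    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string> data_;
};
```

Copying a `CowString` only bumps the reference count; the characters are duplicated on the first write to a shared buffer.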
Steven Sudit
A: 

constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...

FredOverflow
Consider that C# has constant strings with value semantics.
Steven Sudit
@Steven Maybe we are talking about different things when we say "value semantics". C# strings are always handled through a transparent level of indirection (reference semantics), whereas C++ strings are not (value semantics).
FredOverflow
Maybe. In C#, actual value types (such as int) inherit from System.ValueType and are passed as copies, while reference types are passed by reference and (normally) compared by reference. While C# strings are references, they have value semantics in that they're immutable and are compared by content, not address. In C++, a std::string is a value, but it contains a reference (pointer, actually) to a mutable buffer. Therefore, passing a copy of a C++ string invokes the copy constructor to duplicate the buffer, whereas passing a const reference avoids the overhead. I hope that's clearer.
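
The difference between the two ways of passing can be checked directly. This sketch assumes a C++11 or later standard library, where copies of a `std::string` may not share a buffer; the function names are hypothetical:

```cpp
#include <string>

// Pass by value: the copy constructor gives 'copy' its own characters.
inline bool same_buffer_by_value(std::string copy, const std::string& original) {
    return copy.data() == original.data();
}

// Pass by const reference: 'ref' is just another name for the caller's object.
inline bool same_buffer_by_reference(const std::string& ref,
                                     const std::string& original) {
    return ref.data() == original.data();
}
```

Called with the same string for both arguments, the first function reports distinct buffers and the second reports the same one.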
Steven Sudit
+3  A: 

I don't think there's a definitive answer here. It's subjective: if not because of personal taste, then at least because of the type of code one most often deals with. (Still, a valuable question.)

Immutable strings are great when memory is cheap - this wasn't true when C++ was developed, and it isn't the case on all platforms targeted by C++. (OTOH on more limited platforms C seems much more common than C++, so that argument is weak.)

You can create an immutable string class in C++, and you can make it largely compatible with std::string - but you will still lose when comparing to a built-in string class with dedicated optimizations and language features.
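
As a rough sketch of what such a class might look like, assuming a shared, never-modified buffer plus an offset and length so that substrings are O(1). The name `ImmutableString` and its interface are hypothetical, and a real class would need far more of the `std::string` interface:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string: a shared const buffer plus a window into it.
class ImmutableString {
public:
    ImmutableString(std::string text)
        : buffer_(std::make_shared<const std::string>(std::move(text))),
          offset_(0),
          length_(buffer_->size()) {}

    std::size_t size() const { return length_; }

    // Points into the shared buffer; not null-terminated at size().
    const char* data() const { return buffer_->data() + offset_; }

    // O(1): shares the buffer and narrows the window, copying no characters.
    ImmutableString substr(std::size_t pos, std::size_t len) const {
        pos = std::min(pos, length_);
        len = std::min(len, length_ - pos);
        return ImmutableString(buffer_, offset_ + pos, len);
    }

    // Converting back to std::string copies the visible characters.
    std::string str() const { return std::string(data(), length_); }

private:
    ImmutableString(std::shared_ptr<const std::string> buffer,
                    std::size_t offset, std::size_t length)
        : buffer_(std::move(buffer)), offset_(offset), length_(length) {}

    std::shared_ptr<const std::string> buffer_;
    std::size_t offset_;
    std::size_t length_;
};
```

One trade-off of this design is that a small substring keeps the whole original buffer alive.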

std::string is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string has too many drawbacks from my point of view.

peterchen
A: 

We have immutable strings.

char const* const string = "Plop";

Let's compare:

* trivially thread safe
    YES:  
* more secure
    YES:
    Most compilers will put this in a read-only segment (check your compiler documentation for details).
* more memory efficient in most use cases.
    YES.
* cheap substrings (tokenizing and slicing)
    With a tiny bit of work.
Martin York
Ignoring the fact that you can cast the `const` away, you're technically right. The problem is that this isn't a string `class`, so it lacks the things we'd expect from one. As I mentioned elsewhere, I've had some success with a string segment class that has two non-owning, const pointers to the begin and end.
Steven Sudit
@Steven Sudit: Let's ignore casting away const, because you can do that with anything anyway, and doing so leads to undefined behavior (so it's not a good idea).
Martin York
@Steven Sudit: It's not a string class. OK, I'll give you that. But doing std::string(myString) gives you a thread-safe copy to play with. Problem solved.
Martin York
Sure, making a private copy does solve the problem, at the cost of the allocation and copy. There's still the issue of thread safety, in that the original might be in the process of being modified as the copying happens.
Steven Sudit
@Steven Sudit: You can't modify the original; it's a const* const, which means it is non-modifiable.
Martin York
Martin, I don't see a declaration for `myString`. If it's a `char const* const`, then you're correct. I was thinking it was a `const std::string`.
Steven Sudit
(Or, to be clear, that it was a std::string that we were accessing through a const reference.)
Steven Sudit
A: 

Some people just don't use unbounded mutable or immutable strings. That simplifies the problem pretty well.

MSN