I was curious how the StringBuilder class is implemented internally, so I decided to check out Mono's source code and compare it with Reflector's disassembly of Microsoft's implementation. Essentially, Microsoft's implementation uses a char[] to store the string internally, and a bunch of unsafe methods to manipulate it. This is straightforward and did not raise any questions. But I was confused when I found that Mono uses a string inside StringBuilder:

private int _length;
private string _str;

My first thought was: "What a senseless StringBuilder". But then I figured out that it is possible to mutate a string using pointers:

public StringBuilder Append (string value)
{
    // ...
    String.CharCopy (_str, _length, value, 0, value.Length);
}

internal static unsafe void CharCopy (char *dest, char *src, int count)
{
    // ...
    ((short*)dest) [0] = ((short*)src) [0]; dest++; src++;
}

I used to program in C/C++ a little, so I can't say that this code confused me much, but I thought that strings were completely immutable (i.e., that there is absolutely no way to mutate them). So the actual questions are:

  • Can I create a completely immutable type?
  • Is there any reason to use such code apart from performance concerns? (unsafe code to change immutable types)
  • Are strings then inherently thread-safe or not?
+1  A: 

Can I create a completely immutable type?

Yes. Have a constructor that sets private fields, get-only properties, and no methods that modify those fields.
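A minimal sketch of that recipe, in case it helps. The type name Point2D and the WithX helper are made up for illustration (neither comes from the question), and WithX is an addition to the recipe: it does not modify anything, it just builds a new instance.

```csharp
using System;

// Hypothetical example type: all state is set once and never changed afterwards.
public sealed class Point2D
{
    private readonly int _x;
    private readonly int _y;

    // The constructor is the only place where the fields are assigned.
    public Point2D(int x, int y)
    {
        _x = x;
        _y = y;
    }

    // Get-only properties: there is no setter for outside code to call.
    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // "Changing" a value means returning a new instance; this one is untouched.
    public Point2D WithX(int newX)
    {
        return new Point2D(newX, _y);
    }
}

public static class ImmutableDemo
{
    public static void Main()
    {
        var p = new Point2D(1, 2);
        var q = p.WithX(10);
        Console.WriteLine(p.X + "," + p.Y); // p still holds its original values
        Console.WriteLine(q.X + "," + q.Y); // q is a separate object
    }
}
```

This only protects against ordinary code; it says nothing about reflection or unsafe code.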

Is there any reason to use such code apart from performance concerns?

One example: such types don't require locks to be safely used from multiple concurrent threads. This makes correct code easier to write (no locks to get wrong).

Additional: it is always possible for sufficiently privileged code to bypass .NET protections: either reflection to read and write to private fields, or unsafe code to directly manipulate an object's memory.

This is true outside of .NET as well: a privileged process (i.e. one with a process or thread token that has one of the "God" privileges, e.g. Take Ownership, enabled) can break into any other process, load DLLs, inject threads running arbitrary code, and read or write memory (including overriding execute prevention etc.). The integrity of the system is only as strong as the cooperation of the owner of the system.

Richard
Thanks, I didn't think about locks :)
n535
Such a type is not immutable in unsafe code. *Nothing* is immutable in unsafe code; you can write to every single byte of memory in unsafe code.
Eric Lippert
So does it mean I still have to use locks with immutable types, Eric?
n535
Just having private fields doesn’t make anything immutable even in non-unsafe code: you can still use Reflection...
Timwi
And even without reflection, if any of your readonly properties return something mutable, that property can still be changed. For example, if you expose a list as readonly, others can still add and remove items.
Bryce Wagner
@Bryce Wagner: If your immutable type has a mutable property, it's not an immutable type.
Guffa
@Timwi: You can always break things using reflection or unsafe code. The purpose of immutable types is that they can't be changed by mistakes in normal code.
Guffa
@Eric, @Timwi: with unsafe code and reflection all bets are off anyway (since such code could break locks anyway, e.g. replace the object being used with a Monitor).
Richard
@Richard: But that is exactly the scenario the OP is asking about. So your answer of "Yes" should really be "No."
Dan Tao
@DanTao: true (lost track of that aspect of the question). But then not true: if operating at that level one could create a type with memory protected at a hardware level and a busy thread to ensure it stays that way (`VirtualProtect` with `PAGE_READONLY`). It is always possible to bypass consistency with lower level code. That said I'm going to update the answer...
Richard
+2  A: 

If you go unsafe, it is possible to mutate strings in C# too (IIRC).
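For what it's worth, here is a rough sketch of what that could look like. The class and method names are invented for the example, and it has to be compiled with unsafe code enabled (e.g. the /unsafe compiler switch). The string is built at run time so that the demo does not touch a literal shared with other code.

```csharp
using System;

public static class UnsafeStringDemo
{
    // Hypothetical helper: overwrites the first character of an existing string in place.
    public static unsafe void OverwriteFirstChar(string s, char c)
    {
        if (string.IsNullOrEmpty(s)) return;

        // Pin the string and obtain a pointer to its character data.
        fixed (char* p = s)
        {
            p[0] = c; // writes directly into the string object's memory
        }
    }

    public static void Main()
    {
        // Built from a char array at run time, so this is not an interned literal.
        string s = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        string alias = s; // second reference to the same object

        OverwriteFirstChar(s, 'j');

        // Every reference to the object observes the change.
        Console.WriteLine(alias);
    }
}
```

Doing this to a string that other code relies on (a literal, a dictionary key, a path that was already validated) is asking for trouble.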

leppie
Yes, it is. However, as strings are interned, you should really know a lot about how strings work before attempting to do that.
Guffa
+1  A: 

There is no black magic at work here. The string class is immutable simply because it doesn't have any public fields, properties or methods that allow you to modify the internal string. Any method that appears to mutate a string returns a new string instance. You of course can do this as well with your own classes.

Hans Passant
Yeah, I understand that it is not something extraordinary, but I forgot about thread safety completely. I always thought that I could use immutable types without locks; now I don't think so. So now I am confused: I was either wrong from the beginning or I am wrong now (or even both).
n535
Well, sure, you don't have to protect an object of an immutable class. Nobody can alter it. Not that it is very practical, you essentially always work with stale data.
Hans Passant
+3  A: 

There is no completely immutable type. A class is immutable because it doesn't allow any outside code to alter it. Using reflection or unsafe code you can still change its values.

You can use the readonly keyword to create an immutable variable, but that works only for value types. If you use it on a reference type, it's only the reference that is protected, not the object that it points to.
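A small sketch of that last point, assuming a made-up Basket class (not from the question). The readonly field can never be pointed at a different list, but the list itself stays fully mutable unless you hand out a read-only view of it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example type.
public sealed class Basket
{
    // readonly protects the reference stored in the field, not the list it refers to.
    private readonly List<string> _items = new List<string>();

    // Exposes the mutable list itself: callers can add and remove items.
    public List<string> Items { get { return _items; } }

    // Exposes a read-only wrapper: attempts to modify it throw NotSupportedException.
    public IList<string> ReadOnlyItems { get { return _items.AsReadOnly(); } }
}

public static class ReadonlyDemo
{
    public static void Main()
    {
        var basket = new Basket();

        basket.Items.Add("apple"); // compiles and succeeds despite the readonly field
        Console.WriteLine(basket.Items.Count);

        try
        {
            basket.ReadOnlyItems.Add("pear");
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("The read-only view rejected the change.");
        }
    }
}
```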

There are several reasons for immutable types, like performance and robustness.

The fact that strings are known to be immutable (outside the StringBuilder) means that the compiler can make optimisations based on that. The compiler never has to produce code to copy a string to protect it from being changed when it's passed as a parameter.

Objects created from immutable types can also be safely passed between threads. As they can't be changed, there is no risk of different threads changing them at the same time, so there is no need to synchronise access to them.

Immutable types can be used to avoid coding errors. If you know that a value should not be changed, it's generally a good idea to make sure that it can't be changed by mistake.

Guffa
+32  A: 

Can I create a completely immutable type?

You can create a type where the CLR enforces immutability on it. You can then use "unsafe" to turn off the CLR enforcement mechanisms. That's why "unsafe" is called "unsafe" - because it turns off the safety system. In unsafe code every single byte of memory in the process can be writable if you try hard enough, including both the immutable bytes and the code in the CLR which enforces immutability.

You can also use Reflection to break immutability. Both Reflection and unsafe code require an extremely high level of trust to be granted.
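To make the Reflection point concrete, here is a minimal sketch. The Token class and the Overwrite helper are invented for illustration, and the code assumes it runs with enough trust to use private reflection. It overwrites a private readonly instance field from outside the type.

```csharp
using System;
using System.Reflection;

// Hypothetical "immutable" type: private readonly field, get-only property.
public sealed class Token
{
    private readonly string _value;

    public Token(string value) { _value = value; }

    public string Value { get { return _value; } }
}

public static class ReflectionDemo
{
    // Uses private reflection to replace the contents of the readonly field.
    public static void Overwrite(Token token, string newValue)
    {
        FieldInfo field = typeof(Token).GetField(
            "_value", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(token, newValue);
    }

    public static void Main()
    {
        var token = new Token("original");
        Overwrite(token, "tampered");
        Console.WriteLine(token.Value); // the "immutable" object now reports the new value
    }
}
```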

Is there any reason to use such code apart from performance concerns?

Sure, there are lots of reasons to use immutable data structures. Immutable data structures rock. Some good reasons to use immutable data structures:

  • immutable data structures are easier to reason about than mutable data structures. When you ask "is this list empty?" and you get an answer then you know that answer is correct not just now, but forever. With mutable data structures you cannot actually ask "is this list empty?" All you can ask is "is this list empty right now?" and then the answer logically answers the question "was this list empty at some point in the past?"

The fact that the answer to a question about an immutable type stays true forever has security implications. Suppose you have code like this:

void Frob(Bar bar)
{
    if (!IsSafe(bar)) throw new ArgumentException("bar is not safe");
    DoSomethingDangerous(bar);
}

If Bar is a mutable type then there is a race condition here; bar could be made unsafe on another thread after the check but before something dangerous happens. If Bar is an immutable type then the answer to the question stays the same throughout, which is much safer. (Imagine if you could mutate a string containing a path after the security check but before the file was opened, for example.)

  • methods which take immutable data structures as their arguments and return them as their results and perform no side effects are called "pure methods". Pure methods can be memoized, which trades increased memory use for increased speed, often enormously increased speed.

  • immutable data structures can often be used on multiple threads simultaneously without locking. Locking is there to prevent creation of inconsistent state of an object in the face of a mutation, but immutable objects don't have mutations. (Some so-called immutable data structures are logically immutable but actually do mutations inside themselves; imagine for example a lookup table which does not change its contents, but does reorganize its internal structure if it can deduce what the next query is likely to be. Such a data structure would not be automatically threadsafe.)

  • immutable data structures that efficiently re-use their internal parts when a new structure is built from an old one make it easy to "take a snapshot" of the state of a program without wasting lots of memory. That makes undo-redo operations trivial to implement. It makes it easier to write debugging tools that can show you how you got to a particular program state.

  • and so on.
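As a rough illustration of the "snapshot" and "is this empty?" points in the list above, here is a sketch of an immutable stack that shares its tail with the stack it was built from. The name PersistentStack is made up for this example; it is not a framework type.

```csharp
using System;

// Hypothetical immutable stack: Push and Pop return stacks, nothing is modified.
public sealed class PersistentStack<T>
{
    // The single empty stack; it is the only node whose tail is null.
    public static readonly PersistentStack<T> Empty =
        new PersistentStack<T>(default(T), null);

    private readonly T _head;
    private readonly PersistentStack<T> _tail;

    private PersistentStack(T head, PersistentStack<T> tail)
    {
        _head = head;
        _tail = tail;
    }

    // The answer never changes for a given instance.
    public bool IsEmpty { get { return _tail == null; } }

    // The new stack reuses this one as its tail; no copying takes place.
    public PersistentStack<T> Push(T value)
    {
        return new PersistentStack<T>(value, this);
    }

    public T Peek()
    {
        if (IsEmpty) throw new InvalidOperationException("Stack is empty.");
        return _head;
    }

    public PersistentStack<T> Pop()
    {
        if (IsEmpty) throw new InvalidOperationException("Stack is empty.");
        return _tail;
    }
}

public static class SnapshotDemo
{
    public static void Main()
    {
        var snapshot = PersistentStack<int>.Empty.Push(1).Push(2);
        var next = snapshot.Push(3);

        Console.WriteLine(snapshot.Peek()); // the snapshot still sees its own top element
        Console.WriteLine(next.Peek());
        Console.WriteLine(object.ReferenceEquals(next.Pop(), snapshot)); // tail is shared
    }
}
```

Keeping an old reference around is all it takes to keep a snapshot, which is what makes undo-redo cheap with this kind of structure.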

Are strings then inherently thread-safe or not?

If everyone plays by the rules, they are. If someone uses unsafe code or private reflection then there is no rule enforcement anymore. You have to trust that if someone is using high-privilege code then they are doing so correctly and not mutating a string. Use your power to run unsafe code only for good; with great power comes great responsibility.

So do I need to use locks or not?

That is a strange question. Remember, locks are co-operative. Locks only work if everyone accessing a particular object agrees upon the locking strategy that must be used.

You have to use locks if the agreed-upon locking strategy for accessing a particular object in a particular storage location is to use locks. If that isn't the agreed-upon locking strategy then using locks is pointless; you're carefully locking and unlocking the front door while someone else is walking in the open back door.

If you have a string which you know is being mutated by unsafe code, and you don't want to see inconsistent partial mutations, and the code which is doing the unsafe mutation documents that it takes out a particular lock during that mutation, then yes, you need to use locks when accessing that string. But this situation is very rare; ideally no one would use unsafe code to manipulate a string accessible by other code on another thread, because doing so is an incredibly bad idea. That's why we require that code that does so is fully trusted. And that's why we require that the C# source code for such a function wave a big red flag that says "this code is unsafe, review it carefully!"

Eric Lippert
I swear it would take me 3 hours to type up something like this.
ChaosPandion
Thanks, this helped a lot.
n535
@Eric: Be nasty with less effort: a few random TerminateThread calls :-)
Richard