I have a String which I would like to modify in some way. For example: reverse it or upcase it.

I have discovered that the fastest way to do this is by using an unsafe block and pointers.

For example:

        unsafe 
        {
            fixed (char* str = text)
            {
                *str = 'X';
            }
        }

Are there any reasons why I should never ever do this?

+9  A: 

The .NET Framework requires strings to be immutable. Because of this requirement, it is able to optimise all sorts of operations.

String interning is one great example of how this requirement is leveraged heavily. To speed up some string comparisons (and reduce memory consumption), the .NET Framework maintains a dictionary of pointers to strings. All pre-defined (literal) strings live in this dictionary, as do any strings you pass to the String.Intern method. When the IL instruction ldstr is executed, it checks the intern dictionary and avoids a memory allocation if the string is already allocated. Note: String.Concat does not check for interned strings.
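A minimal sketch of the interning behaviour described above, using reference comparisons. The class name InternDemo and the helper BuildAtRuntime are made up for illustration; the last comparison assumes literals are interned as described, which is what a typical JIT-compiled runtime does.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: builds "hello" at run time, so the result
    // is a fresh allocation and not the interned literal.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'h', 'e', 'l', 'l', 'o' });
    }

    public static void Main()
    {
        string a = "hello";
        string b = "hello";
        string c = BuildAtRuntime();

        // Both literals resolve to the same instance.
        Console.WriteLine(object.ReferenceEquals(a, b)); // True

        // Same characters, but a separate allocation.
        Console.WriteLine(object.ReferenceEquals(a, c)); // False

        // String.Intern hands back the pooled instance; this is expected
        // to be the literal's instance when literals are interned.
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c)));
    }
}
```

Because a and b are one object, writing through a pointer to either of them changes both.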

This property of the .NET Framework means that if you start mucking around directly with strings, you can corrupt your intern table and, in turn, corrupt other references to the same string.

For example:

        // these strings get interned
        string hello = "hello";
        string hello2 = "hello";

        string helloworld, helloworld2;

        helloworld = hello;
        helloworld += " world";

        helloworld2 = hello;
        helloworld2 += " world"; 

        unsafe
        {
            // very bad, this changes an interned string which affects 
            // all app domains.
            fixed (char* str = hello2)
            {
                *str = 'X';
            }

            fixed (char* str = helloworld2)
            {
                *str = 'X';
            }

        }

        Console.WriteLine("hello = {0} , hello2 = {1}", hello, hello2);
        // output: hello = Xello , hello2 = Xello  


        Console.WriteLine("helloworld = {0} , helloworld2 = {1}", helloworld, helloworld2);
        // output : helloworld = hello world , helloworld2 = Xello world
Sam Saffron
+11  A: 

Are there any reasons why I should never ever do this?

Yes, very simple: because .NET relies on the fact that strings are immutable. Some operations (e.g. s.Substring(0, s.Length)) actually return a reference to the original string. If that string is then modified in place, every other reference to it sees the change as well.
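A small sketch of the Substring case mentioned above. Returning the original instance is an implementation detail of the runtime rather than something to rely on, and the names SubstringDemo and WholeSubstringIsSameInstance are made up for illustration.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: true when Substring over the whole string
    // returned the original instance instead of a copy.
    public static bool WholeSubstringIsSameInstance(string s)
    {
        return object.ReferenceEquals(s, s.Substring(0, s.Length));
    }

    public static void Main()
    {
        string s = new string('a', 3); // "aaa", built at run time

        // The full-length substring is the very same object.
        Console.WriteLine(WholeSubstringIsSameInstance(s)); // True

        // A shorter substring is a new string.
        Console.WriteLine(object.ReferenceEquals(s, s.Substring(0, 2))); // False
    }
}
```

Code that received the "copy" would therefore see any in-place change made to the original.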

It is better to use a StringBuilder to modify a string, since that is the standard way to do it.
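As a rough sketch, the question's "replace the first character" example could be written with a StringBuilder like this. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Hypothetical helper: returns a copy of text with its first
    // character replaced. The original string is never touched.
    public static string ReplaceFirst(string text, char replacement)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text); // copies the characters
        sb[0] = replacement;              // mutate the builder's buffer
        return sb.ToString();             // produce a new string
    }

    public static void Main()
    {
        string original = "hello";
        string changed = ReplaceFirst(original, 'X');

        Console.WriteLine(original); // hello
        Console.WriteLine(changed);  // Xello
    }
}
```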

Konrad Rudolph
A: 

Oh dear lord yes.

1) Because that class is not designed to be tampered with.

2) Because strings are designed and expected throughout the framework to be immutable. That means that code that everyone else writes (including MSFT) is expecting a string's underlying value never to change.

3) Because this is premature optimization and that is E V I L.

Dave Markle
+1  A: 

Put it this way: how would you feel if another programmer decided to replace 0 with 1 everywhere in your code, at execution time? It would play hell with all your assumptions. The same is true with strings. Everyone expects them to be immutable, and writes code with that assumption in mind. If you violate that, you are likely to introduce bugs - and they'll be really hard to trace.

Jon Skeet
I can just imagine them being used in a Dictionary, oh man that would play havoc!
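A sketch of what that could look like, assuming the project is compiled with unsafe code enabled (for example the /unsafe compiler option). The names are made up, and the key is built at run time so that no interned literal gets modified.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryKeyDemo
{
    // Hypothetical helper: stores a key, mutates it in place, then
    // reports whether the original text can still be looked up.
    public static unsafe bool OriginalKeyStillFound()
    {
        string key = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        var lookup = new Dictionary<string, int>();
        lookup[key] = 42; // stored under the hash of "hello"

        fixed (char* p = key)
        {
            *p = 'X'; // the stored key now reads "Xello"
        }

        // The hash of "hello" still finds the entry, but the key's
        // characters no longer match, so the lookup fails.
        return lookup.ContainsKey("hello");
    }

    public static void Main()
    {
        Console.WriteLine(OriginalKeyStillFound()); // False
    }
}
```

The entry is still counted by the dictionary, yet it can no longer be found under its old text, and a lookup for "Xello" hashes to a different place.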
leppie
A: 

Agreed about StringBuilder, or just convert your string to an array of chars/bytes and work there. Also, you gave the example of "upcasing" -- the String class has a ToUpper method, and if that's not at least as fast as your unsafe "upcasing", I'll eat my hat.
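For illustration, a minimal sketch of the char array approach for reversing, next to the built-in ToUpper. The names are hypothetical, and the reversal works on UTF-16 code units, so it assumes text without surrogate pairs or combining characters.

```csharp
using System;

public static class CharArrayDemo
{
    // Hypothetical helper: reverses a string via a private char[] copy.
    public static string Reverse(string text)
    {
        char[] chars = text.ToCharArray(); // copy; the original is untouched
        Array.Reverse(chars);
        return new string(chars);
    }

    public static void Main()
    {
        string text = "hello";

        Console.WriteLine(Reverse(text));   // olleh
        Console.WriteLine(text.ToUpper());  // HELLO
        Console.WriteLine(text);            // hello
    }
}
```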

Coderer