I am trying to understand string interning and why it doesn't seem to work in my example. The point of the example is to show that Example 1 uses a lot less memory, since it should only have 10 distinct strings in memory. However, in the code below both examples use roughly the same amount of memory (virtual size and working set).

Please advise: why isn't Example 1 using a lot less memory? Thanks

Example 1:

        IList<string> list = new List<string>(10000);

        for (int i = 0; i < 10000; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(string.Intern(k.ToString()));
            }

        }

        Console.WriteLine("intern Done");
        Console.ReadLine();

Example 2:

        IList<string> list = new List<string>(10000);

        for (int i = 0; i < 10000; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(k.ToString());
            }

        }

        Console.WriteLine("intern Done");
        Console.ReadLine();
+6  A: 

The problem is that ToString() will still allocate a new string, and then intern it. If the garbage collector doesn't run to collect those "temporary" strings, then the memory usage will be the same.

Also, your strings are very short. The loops add 100,000 strings that are each only one character long, so the difference is on the order of a couple of megabytes at most, which you're probably not going to notice in process-level counters. Try using longer strings (or a lot more of them) and doing a garbage collection before you check the memory usage.

Here is an example that does show a difference:

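using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        int n = 100000;

        // Pass "1" as the first argument to run the interned version.
        if (args.Length > 0 && args[0] == "1")
            WithIntern(n);
        else
            WithoutIntern(n);
    }

    static void WithIntern(int n)
    {
        var list = new List<string>(n);

        for (int i = 0; i < n; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                // Only the interned instance is stored; duplicate temporaries become garbage.
                list.Add(string.Intern(new string('x', k * 1000)));
            }
        }

        GC.Collect();
        Console.WriteLine("Done.");
        Console.ReadLine();
        GC.KeepAlive(list); // keep the list reachable while memory usage is inspected
    }

    static void WithoutIntern(int n)
    {
        var list = new List<string>(n);

        for (int i = 0; i < n; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                // Every string stays alive because the list references each one.
                list.Add(new string('x', k * 1000));
            }
        }

        GC.Collect();
        Console.WriteLine("Done.");
        Console.ReadLine();
        GC.KeepAlive(list); // keep the list reachable while memory usage is inspected
    }
}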
Dean Harding
Typical micro-optimization that simply does not show what it is supposed to do.
TomTom
Same result with the GC.Collect and using larger strings .... hrmmmm
CodingThunder
@ TomTom: What?
CodingThunder
I've edited my answer to show an example that *does* show a difference. `WithIntern` uses about 14MB of memory (according to Task Manager). `WithoutIntern` throws an OutOfMemoryException after a second or so. You're simply not going to see any difference unless you have *tens* or *hundreds* of megabytes of strings allocated.
Dean Harding
+1  A: 

From MSDN: "Second, to intern a string, you must first create the string. The memory used by the String object must still be allocated, even though the memory will eventually be garbage collected."
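A minimal sketch of what that means in practice. The class name `InternIdentityDemo` is made up for illustration, and `new string('x', 5)` is used instead of `ToString()` only to guarantee a fresh allocation on every call (some runtimes may hand back cached strings for small numbers). Two separately created strings are distinct objects until they are passed through `string.Intern`, which returns one shared reference for both:

```csharp
using System;

public static class InternIdentityDemo
{
    public static void Main()
    {
        // Each call allocates its own String object on the managed heap.
        string a = new string('x', 5);
        string b = new string('x', 5);

        // Same contents, but two different objects.
        Console.WriteLine(object.ReferenceEquals(a, b)); // False

        // Intern returns the single pooled instance for this value.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(ia, ib)); // True
    }
}
```

The second temporary (`b`) was still allocated; it only stops costing memory once nothing references it and the garbage collector has run.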

Cornelius
+1  A: 

Remember, the CLR manages memory on behalf of your process, so it is really hard to figure out the managed memory footprint from looking at virtual size and working set. The CLR will generally allocate and free memory in chunks. The size of these chunks varies with implementation details, and because of this chunking it is next to impossible to measure managed heap usage based on memory counters for the process.

However, if you look at the managed heap itself for each example (here with the SOS debugger commands !dumpheap and !eeheap), you'll see a difference.

Example 1

0:005> !dumpheap -stat
...
00b6911c      137         4500 System.String
0016be60        8       480188      Free
00b684c4       14       649184 System.Object[]
Total 316 objects
0:005> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x01592dcc
generation 1 starts at 0x01592dc0
generation 2 starts at 0x01591000
ephemeral segment allocation context: none
 segment    begin allocated     size
01590000 01591000  01594dd8 0x00003dd8(15832)
Large object heap starts at 0x02591000
 segment    begin allocated     size
02590000 02591000  026a49a0 0x001139a0(1128864)
Total Size  0x117778(1144696)
------------------------------
GC Heap Size  0x117778(1144696)

Example 2

0:006> !dumpheap -stat
...
00b684c4       14       649184 System.Object[]
00b6911c   100137      2004500 System.String
Total 100350 objects
0:006> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0179967c
generation 1 starts at 0x01791038
generation 2 starts at 0x01591000
ephemeral segment allocation context: none
 segment    begin allocated     size
01590000 01591000  0179b688 0x0020a688(2139784)
Large object heap starts at 0x02591000
 segment    begin allocated     size
02590000 02591000  026a49a0 0x001139a0(1128864)
Total Size  0x31e028(3268648)
------------------------------
GC Heap Size  0x31e028(3268648)

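As you can see from the output above, the second example does use more memory on the managed heap: 100,137 strings totalling 2,004,500 bytes versus 137 strings totalling 4,500 bytes, and a GC heap of 3,268,648 bytes versus 1,144,696 bytes.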
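If attaching a debugger is not convenient, a rough in-process alternative is to ask the runtime for the managed heap size with `GC.GetTotalMemory(true)`, which forces a collection before reporting. The sketch below is only an illustration under assumptions: the class `ManagedHeapDemo`, the helper `Measure`, and the choice of 100-character strings are invented for this example and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class ManagedHeapDemo
{
    // Fills a list with 'count' strings of 100 characters (10 distinct values),
    // optionally interned, and returns the approximate managed heap growth in bytes.
    public static long Measure(bool intern, int count)
    {
        long before = GC.GetTotalMemory(true); // true = collect first

        var list = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            string s = new string((char)('a' + i % 10), 100);
            list.Add(intern ? string.Intern(s) : s);
        }

        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(list); // the list must stay reachable until after the measurement
        return after - before;
    }

    public static void Main()
    {
        Console.WriteLine("interned:     {0:N0} bytes", Measure(true, 100000));
        Console.WriteLine("not interned: {0:N0} bytes", Measure(false, 100000));
    }
}
```

The exact numbers depend on the runtime and bitness, but the interned run should report little more than the list's backing array, while the other run also carries every individual string.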

Brian Rasmussen