I used reflection to look at the internal fields of System.String and I found three fields:

m_arrayLength

m_stringLength

m_firstChar
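For anyone who wants to reproduce the list above, here is a minimal sketch of the reflection call, assuming a runtime where non-public reflection over System.String is allowed. The class and method names (StringFieldLister, ListPrivateInstanceFields) are made up for illustration. The names and number of fields depend on the runtime version: newer runtimes, for example, name the fields `_stringLength` and `_firstChar`, so the output may not match the three names listed here.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of any library.
static class StringFieldLister
{
    // Returns "FieldType Name" for each non-public instance field
    // declared on System.String in the current runtime.
    public static string[] ListPrivateInstanceFields()
    {
        return typeof(string)
            .GetFields(BindingFlags.Instance | BindingFlags.NonPublic)
            .Select(f => f.FieldType.Name + " " + f.Name)
            .ToArray();
    }
}
```

Printing each entry of `StringFieldLister.ListPrivateInstanceFields()` with `Console.WriteLine` shows whatever the current runtime declares.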

I don't understand how this works.

m_arrayLength is the length of some array. Where is this array? It's apparently not a member field of the string class.

m_stringLength makes sense. It's the length of the string.

m_firstChar is the first character in the string.

So my question is where are the rest of the characters for the string? Where are the contents of the string stored if not in the string class?

+3  A: 

Much of the implementation of System.String is in native code (C/C++) and not in managed code (C#). If you take a look at the decompiled code you'll see that most of the "interesting" or "core" methods are decorated with this attribute:

[MethodImpl(MethodImplOptions.InternalCall)]

Only some of the helper/convenience APIs are implemented in C#.
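If you would rather check this with reflection than with a decompiler, the sketch below filters a type's methods and constructors by the InternalCall implementation flag. InternalCallFinder and FindInternalCalls are hypothetical names used only for this example, and which members come back (if any) depends on the runtime you run it on, so treat the result as a rough guide rather than a definitive list.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of any library.
static class InternalCallFinder
{
    // Returns the signatures of methods and constructors declared on `type`
    // whose implementation flags include InternalCall.
    public static string[] FindInternalCalls(Type type)
    {
        const BindingFlags All = BindingFlags.Instance | BindingFlags.Static |
                                 BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.DeclaredOnly;

        return type.GetMethods(All).Cast<MethodBase>()
            .Concat(type.GetConstructors(All))
            .Where(m => (m.GetMethodImplementationFlags()
                         & MethodImplAttributes.InternalCall) != 0)
            .Select(m => m.ToString())
            .ToArray();
    }
}
```

Calling `InternalCallFinder.FindInternalCalls(typeof(string))` and printing the entries lists the members the runtime implements natively.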

So where are the characters for the string stored? It's top secret! Deep down inside the CLR's core native code implementation.

Eilon
+2  A: 

I'd be thinking immediately that m_firstChar is not the first character, rather a pointer to the first character. That would make much more sense (although, since I'm not privy to the source, I can't be certain).

It makes little sense to store the first character of a string unless you want a blindingly fast s.Substring(0, 1) operation :-) There's a good chance the characters themselves (that the three fields allude to) will be allocated separately from the actual object.

paxdiablo
+3  A: 

The first char provides access (via &m_firstChar) to the memory address of the first character in the buffer. m_stringLength records how many characters are in the string, which makes .Length efficient (better than scanning for a nul char). Note that strings can be oversized (especially if created with StringBuilder, and in a few other scenarios), so the buffer is sometimes longer than the string; m_arrayLength tracks that capacity. StringBuilder, for example, actually mutates a string within its buffer, so it needs to know how much it can add before having to create a larger buffer (see AppendInPlace, for example).
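To make the "address of the first character, then index from there" idea concrete, here is a rough sketch. It assumes a modern runtime (.NET Core 2.1 or later) where `MemoryMarshal.GetReference` and `Unsafe.Add` are available; on older frameworks a `fixed (char* p = s)` block in unsafe code plays the same role. StringBufferDemo, CharAt and BuilderHasSpareRoom are names invented for this example, and the StringBuilder part only shows that capacity and length are tracked separately, not how any particular runtime version stores its buffer.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical demo class, not part of any library.
static class StringBufferDemo
{
    // Reads s[index] by taking a reference to the first character
    // and stepping `index` characters past it.
    public static char CharAt(string s, int index)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if ((uint)index >= (uint)s.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        // Reference to the first character of the string's buffer.
        ref char first = ref MemoryMarshal.GetReference(s.AsSpan());

        // Offset from that reference; bounds were checked above.
        return Unsafe.Add(ref first, index);
    }

    // Shows that a builder's buffer can be larger than its contents.
    public static bool BuilderHasSpareRoom()
    {
        var sb = new StringBuilder(64); // request room for 64 chars
        sb.Append("abc");               // only 3 chars are in use
        return sb.Capacity > sb.Length;
    }
}
```

The bounds check matters here: `Unsafe.Add` does no checking of its own, so the stored length is the only thing that says where the string ends.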

Marc Gravell
paxdiablo
@paxdiablo if you look at the decompiled code you'll see that the first char is the actual first char of the in-memory string. Thus, taking its address and indexing past that initial address gives you the rest of the characters.
Eilon
So m_firstChar is actually the array (which would degrade to a pointer in C)?
paxdiablo
@paxdiablo I deliberately used the word "buffer", not string or array - as it is neither ;-p `string` is one of the two types in .NET (*along* with arrays) with indeterminate size. The buffer starts at the address of field m_firstChar. You could say that string is implemented *similarly* to an array, rather than *encapsulating* an array (which it doesn't; there *is* no `char[]` here).
Marc Gravell