I just want to be sure:

string X="";

char Char=X[0];  // throws IndexOutOfRangeException: "Index was outside the bounds of the array"

This means that the string is really treated as an array of chars, right? At least internally.

+10  A: 

The C# language spec makes no guarantee about the internal representation of a string. However, it implements the index operator to provide a char for each character in the string.

Edit: To clarify since a few people have commented, yes, the internal representation of System.String in the CLR is an array. However, the language specification doesn't say anything about internal representation, so this could (but is unlikely to) change. It says that a string has to work as a sequence of chars. The only bit about this in the language spec is under section 1.3:

Character and string processing in C# uses Unicode encoding. The char type represents a UTF-16 code unit, and the string type represents a sequence of UTF-16 code units.
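To make the "sequence of UTF-16 code units" wording concrete, here is a minimal sketch. The class and member names (CodeUnitDemo, Emoji, StartsWithSurrogatePair) are made up for illustration; only string.Length, the indexer, and char.IsSurrogatePair are real framework members.

```csharp
using System;

// Hypothetical demo type; not part of the framework.
public static class CodeUnitDemo
{
    // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
    // needs two code units (a surrogate pair) to encode it.
    public const string Emoji = "\U0001F600";

    // True when the first two chars of s form a surrogate pair,
    // i.e. two code units that together encode one code point.
    public static bool StartsWithSurrogatePair(string s)
    {
        return s.Length >= 2 && char.IsSurrogatePair(s[0], s[1]);
    }
}
```

CodeUnitDemo.Emoji.Length is 2 even though the string holds a single Unicode character, so the indexer hands back code units rather than whole characters.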

Additionally, MSDN states:

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable (that is, it is read-only).

So in this case, we're now talking about the CLR (System.String) and not the language. However, even there they don't guarantee an array, only a sequential collection.

A string implemented with a linked list and an indexer that moved n spaces forward in the list would be sufficient to satisfy the language requirements. IList<char> would also satisfy the requirements, and IList doesn't have to be array-backed.
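A rough sketch of that idea, purely as a thought experiment: LinkedString is a hypothetical type, and this is not how System.String is actually implemented. It only shows that an indexer over a linked list can present the same outward behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only.
public sealed class LinkedString
{
    private readonly LinkedList<char> _chars;

    public LinkedString(string text)
    {
        // string implements IEnumerable<char>, so it can seed the list directly.
        _chars = new LinkedList<char>(text);
    }

    public int Length => _chars.Count;

    // O(n) indexer: walks forward from the head until it reaches `index`.
    public char this[int index]
    {
        get
        {
            int i = 0;
            foreach (char c in _chars)
            {
                if (i++ == index) return c;
            }
            // Reached for negative indexes and for index >= Length.
            throw new IndexOutOfRangeException();
        }
    }
}
```

With this, new LinkedString("abc")[1] yields 'b', and indexing an empty LinkedString throws just as the code in the question does, without any array behind it.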

David Pfeffer
-1 The MSDN docs state: A String object is a sequential collection of System.Char objects that represent a string.
Bear Monkey
@Bear Monkey - 'sequential collection' is not the same as 'array'. You should retract your downvote I think.
Steve Townsend
The only restriction in the normative ECMA document talks of a character buffer, not specifically a System.Array instance: Implementations of System.String are required to contain a variable-length character buffer positioned a fixed number of bytes after the beginning of the String object.
VirtualBlackFox
He doesn't say array; he says it makes no guarantees on internal representation. But this isn't true.
Bear Monkey
Guys, MSDN is right in that a string is a sequential collection. However, this is an outward facing constraint. For all that reference cares, you could implement string using a linked list with numeric indexer, which is also a sequential collection.
David Pfeffer
I beg to differ. Have you ever used pointers on a string before? It's an array and will not change.
Bear Monkey
Whether or not it's an array isn't the question -- obviously it's an array. The question is whether or not it *must* be an array, and I see nothing that states that anywhere in the documentation.
David Pfeffer
+1 for moral support :-)
Steve Townsend
If string supports pointer manipulation it must always be an array. Damned if I can find that written anywhere though.
Bear Monkey
@Bear Monkey - pointer manipulation is not a technique that is used in C# (managed) code.
Steve Townsend
@Chris: I've modified that first sentence to clarify the meaning, which is in fact true. I'll have to respectfully disagree as to the question's intent, based on the fact that he asks about the internal treatment of the data.
David Pfeffer
@Steve I cannot see anywhere that the question was limited to managed code. In fact the question asks about internal representations and could be beyond just managed code.
Bear Monkey
@Bear Monkey - nothing to suggest managed code apart from the C# tag, applied by OP
Steve Townsend
FYI - Removed down vote due to updated answer. Although I feel the first sentence is still misleading.
Chris Lively
+1  A: 

You might find this MSDN doc helpful.

In a nutshell, a string is "stored as a sequential read-only collection of Char objects"

And, yes, it can be accessed just like a char array. So, if X contained a value other than String.Empty, then the char Char=X[0]; code would have returned the first character of the string.
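If the string might be empty, one way to avoid the exception is to check before indexing. This is only a sketch; StringIndexing and FirstCharOrDefault are hypothetical names, not framework members.

```csharp
using System;

// Hypothetical helper type for illustration.
public static class StringIndexing
{
    // Returns the first char of s, or `fallback` when s is null or empty,
    // so the indexer is never called on a string without characters.
    public static char FirstCharOrDefault(string s, char fallback = '\0')
    {
        return string.IsNullOrEmpty(s) ? fallback : s[0];
    }
}
```

StringIndexing.FirstCharOrDefault("") returns the fallback instead of throwing, while StringIndexing.FirstCharOrDefault("abc") returns 'a'.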

Chris Lively
+1  A: 

As far as I know, that's correct. Btw here's a page with everything you ever wanted to know about Strings:

Jeen Broekstra
+2  A: 

Per @JaredPar elsewhere on this site:

The underlying string you create will also need a contiguous block of memory because it is represented as an array of chars (arrays require contiguous memory).

I am sure you should not rely on this as it's not part of the interface, but the implementation is an array if this statement is correct. That makes sense to me given what we know about char-strings and Microsoft's need to support efficient interop between managed and native languages.

MSDN says only this, which does not guarantee that the storage is an array.

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable (that is, it is read-only).

Steve Townsend
+1 for a clear, documented, answer.
Chris Lively
+1  A: 

C# is just the language. The string keyword is an alias for System.String in the BCL of the .NET Framework. It is pretty safe to assume that internally String is an array of chars. From MSDN:

A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string.

Bear Monkey
A: 

It depends on what you mean by "array".

If you mean the general computing concept of a random-access, fixed-length, integer-indexable collection of objects, then yes, a string can be considered precisely like that. (The general computing concept often includes being contiguous in memory, but barring a few cases, such as using pointers in unsafe code, that's not very meaningful in terms of C#).

If you mean the language-defined C# implementation of this concept, char[], then not really; the two are different things.

In practice, System.String is indeed implemented as an array of chars, but it doesn't have to have been.

Language nit-picks aside, the practical bit:

If you want to do the same operations on a string as you would on a char[], this will often work (notably, though, string is read-only) and will very often be the most efficient way of doing so, as well as conceptually quite simple. In particular, using foreach and using an index that moves between 0 and str.Length - 1 both work well. Similarly, a lot of operations one can do on char[] can be done on string, such as CopyTo() and casting to IEnumerable<char>.

If you want to actually have an array of chars then you need to call ToCharArray().
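A small sketch of those operations side by side. The class and method names (StringAsSequence, CountChar, Prefix, ReplaceFirst) are hypothetical; the calls they wrap (the indexer, Length, CopyTo, ToCharArray) are the real String members mentioned above.

```csharp
using System;

// Hypothetical helper type for illustration.
public static class StringAsSequence
{
    // Index-based loop from 0 to Length - 1, exactly as with a char[].
    public static int CountChar(string s, char target)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == target) count++;
        }
        return count;
    }

    // CopyTo copies a range of the string into an existing char[].
    public static char[] Prefix(string s, int length)
    {
        char[] dest = new char[length];
        s.CopyTo(0, dest, 0, length);
        return dest;
    }

    // string is read-only (s[0] = 'x' does not compile), so mutation
    // has to go through the independent copy ToCharArray() returns.
    public static string ReplaceFirst(string s, char replacement)
    {
        char[] chars = s.ToCharArray();
        if (chars.Length > 0) chars[0] = replacement;
        return new string(chars);
    }
}
```

For example, StringAsSequence.ReplaceFirst("cat", 'b') gives "bat" and leaves the original string untouched, since ToCharArray() hands back a copy rather than the string's own storage.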

Jon Hanna