Hi. Since the following code is possible in C#, I am interested in whether a string is actually an array of chars:

string a = "TEST";
char C = a[0]; // will be 'T'
+11  A: 

No, it's not an array. But it does have an indexer. Best of both worlds.
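
To make "it has an indexer" concrete, here is a minimal sketch of a class that supports the same bracket syntax without being an array. The type `Word` and the class `IndexerDemo` are hypothetical names made up for illustration; this is not how System.String is implemented.

```csharp
using System;

// Hypothetical type for illustration only; not how System.String is implemented.
public sealed class Word
{
    private readonly char[] letters;

    public Word(string text)
    {
        letters = text.ToCharArray();
    }

    public int Length { get { return letters.Length; } }

    // A get-only indexer: enables word[i] even though Word is not an array.
    public char this[int index]
    {
        get { return letters[index]; }
    }
}

public static class IndexerDemo
{
    public static void Main()
    {
        Word w = new Word("TEST");
        char first = w[0];           // calls the indexer declared above
        char viaString = "TEST"[0];  // calls String's own indexer (the Chars property)
        Console.WriteLine(first == viaString); // True
    }
}
```

Because the indexer has no setter, `w[0] = 'X'` would not compile, which mirrors the read-only behaviour of a string.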

Developer Art
stretchy? What does that mean?
recursive
Stretchy, like an elastic band.</tongueincheek>
BoltClock
I meant you don't have to declare it with the fixed size.
Developer Art
They do have fixed sizes though. It's not possible to resize a string, so they don't seem all that stretchy to me.
recursive
Okay, removed this confusing passage.
Developer Art
"string", strictly speaking, is syntactic sugar for a character "vector". It is an object that stores the reference to the start of a null-terminated set of characters. An instance of a string, meaning the set of characters in memory, is immutable; what looks like you changing the string (appending, capitalization, replacement, etc) actually results in a new instance of the string being created and its reference put into the string, and the old string is GCed. This is why constructs like StringBuilder, which keep the data in a more mutable state while building a string, are good practice.
KeithS
@KeithS good comment, but I can't help this: strictly speaking, C# strings aren't a vector (at least in the C++ sense), nor are they null-terminated.
Conrad Frix
+1  A: 

A string is not an array of chars until you convert it to one. The notation is simply used to access characters at different positions (indices) in a string.

BoltClock
+1  A: 

A string is not a char[], although it does have a .ToCharArray(). Also it does have an indexer, which allows you to access characters individually, like you've shown. It is likely that it was implemented with an array internally, but that's an implementation detail.

recursive
+3  A: 

No, String is a class in .NET. It may be backed by an array, but it is not an array. Classes can have indexers, and that is what String is doing.

See comments for elaboration on this statement: From what I understand, all strings are stored in a common blob. Because of this, "foo" and "foo" point to the same point in that blob... one of the reasons strings are immutable in C#.
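
A small sketch of the interning behaviour discussed in the comments below, under the assumption that everything lives in one assembly. `InternDemo` and `Build` are hypothetical names; `string.Intern` and `object.ReferenceEquals` are the real framework methods.

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Built at run time, so the result is a fresh object rather than a compile-time literal.
    public static string Build()
    {
        return new StringBuilder().Append("wx").Append("yz").ToString();
    }

    public static void Main()
    {
        string literal = "wxyz";
        string built = Build();

        // Value equality: compares characters.
        Console.WriteLine(literal == built);

        // Reference identity: are these the same object?
        Console.WriteLine(object.ReferenceEquals(literal, built));

        // string.Intern returns one canonical instance per distinct value.
        Console.WriteLine(object.ReferenceEquals(string.Intern(literal), string.Intern(built)));
    }
}
```

The first and third lines print True. The second is the interesting one: the string produced by StringBuilder is a separate object from the literal, so it should print False, which is why reference identity should not be relied on for string comparison.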

Brian Genisio
You shouldn't rely on the fact that equivalent strings are the same reference though, as I don't think it's guaranteed in general.
recursive
It's called string interning.
Developer Art
Agreed. Never rely on internal implementation details. They always reserve the right to change. I added that note because the OP seemed to be interested in knowing how `String` works.
Brian Genisio
What is meant by common blob? Can you cite a source for this?
Conrad Frix
@Conrad Frix: As @Developer Art stated, it is called string interning. Here is more info: http://msdn.microsoft.com/en-us/library/system.string.intern.aspx
Brian Genisio
@Brian. My understanding was that not all strings are interned. If they were, string.IsInterned() probably wouldn't exist. So perhaps your answer should read "some strings are stored in a common blob. Because of this, "foo" and "foo" might point to the same point in that blob..."
Conrad Frix
@Conrad Frix: According to this article, C# and VB.NET intern their strings by default, but it is not a .Net default... other languages could choose to do it differently, which is why the "interned" methods exist on string. http://csharpindepth.com/Articles/General/Strings.aspx
Brian Genisio
@Brian the article says that literals are interned. Not all strings are literals. For example: new StringBuilder().Append("wx").Append("yz").ToString();
Conrad Frix
@Conrad Frix: Awesome. I love SO. Thanks! :)
Brian Genisio
+1  A: 

Using Reflector, we can see that string does implement IEnumerable<char>. So, it is not a character array, but in essence can be used like one.

public sealed class String : IComparable, ICloneable, IConvertible, IComparable<string>, IEnumerable<char>, IEnumerable, IEquatable<string>

EDIT:

Implementing IEnumerable<char> does not mean that the type will be indexed. I didn't mean to convey that. It means that you can enumerate over it and use it like a collection. A better way of wording what I meant to say is that a string isn't a character array, but is a collection of characters. Thanks for the comment.
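
As a rough illustration of the difference between enumerating and indexing, the sketch below treats a string purely as an IEnumerable<char>. `EnumerableDemo` and `CountUpper` are hypothetical names; `ElementAt` is the LINQ extension method from System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableDemo
{
    // Works for any IEnumerable<char>, including string; no indexer is required.
    public static int CountUpper(IEnumerable<char> chars)
    {
        int count = 0;
        foreach (char c in chars)
        {
            if (char.IsUpper(c)) count++;
        }
        return count;
    }

    public static void Main()
    {
        string s = "Hello World";
        IEnumerable<char> seq = s; // allowed because String implements IEnumerable<char>

        Console.WriteLine(CountUpper(seq));  // 2
        Console.WriteLine(seq.ElementAt(1)); // e (LINQ; IEnumerable<char> itself has no indexer)
        Console.WriteLine(s[1]);             // e (String's own indexer)
    }
}
```

Writing `seq[1]` would not compile, since the indexer belongs to String and not to the interface.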

Aaron Daniels
This is somewhat inaccurate. An `IEnumerable<char>` isn't what allows the indexer to be used on strings. If you needed to access an element from any `IEnumerable<T>` you would have to use the `ElementAt` method; an indexer wouldn't be available to other object types (`T`) simply by implementing `IEnumerable<T>`.
Ahmad Mageed
+1  A: 

A string object contains a contiguous block of characters, just like an array of characters, but the string object neither is nor contains an array object.

The compiler knows that the string is immutable, so it can do certain optimisations when you access a string, in the same manner that it does optimisations when you access an array. So, when you access a string by index, it's likely that the code ends up accessing the string data directly rather than calling an indexer property.

Guffa
+12  A: 

System.String is not a .NET array of Char because this:

char[] testArray = "test".ToCharArray();

testArray[0] = 'T';

will compile, but this:

string testString = "test";

testString[0] = 'T';

will not. Char arrays are mutable, Strings are not. Also, string is Array returns false, while char[] is Array returns true.
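
If you do need a changed character, one common approach is to copy into a char[], modify the copy, and build a new string. The sketch below shows that, plus the `is Array` checks mentioned above. `MutationDemo` and `ReplaceAt` are hypothetical names.

```csharp
using System;

public static class MutationDemo
{
    // Returns a new string with one character replaced; the input string is untouched.
    public static string ReplaceAt(string source, int index, char value)
    {
        char[] buffer = source.ToCharArray(); // ToCharArray makes a copy
        buffer[index] = value;                // arrays are mutable
        return new string(buffer);
    }

    public static void Main()
    {
        string original = "test";
        string changed = ReplaceAt(original, 0, 'T');
        Console.WriteLine(original); // test
        Console.WriteLine(changed);  // Test

        // Going through object avoids the compiler warning for an always-false check.
        object asString = original;
        object asChars = original.ToCharArray();
        Console.WriteLine(asString is Array); // False
        Console.WriteLine(asChars is Array);  // True
    }
}
```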

FacticiusVir
+4  A: 

Strings in .NET are backed by the System.String class, which internally uses a bunch of unsafe methods to do pointer manipulation on the actual string data using standard C memory manipulation techniques.

The String class itself does not contain an array, but it does have an indexer property which allows you to treat the data as if it were an array.
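
To get a feel for "treating the data directly" without writing unsafe code, the sketch below swaps in ReadOnlySpan<char> for raw pointers. This is an assumption on my part, not something the String source does: it needs a runtime where `AsSpan()` is available (.NET Core 2.1 or later, or the System.Memory package). `SpanDemo` and `SumCodeUnits` are hypothetical names.

```csharp
using System;

public static class SpanDemo
{
    // Sums the UTF-16 code units by reading the string's characters in place.
    public static int SumCodeUnits(string s)
    {
        ReadOnlySpan<char> span = s.AsSpan(); // a read-only view over the string's characters, no copy
        int sum = 0;
        for (int i = 0; i < span.Length; i++)
        {
            sum += span[i];
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumCodeUnits("abcd")); // 97 + 98 + 99 + 100 = 394
    }
}
```

The span is read-only, so it gives array-like access to the characters while keeping the string immutable.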

Scott Dorman
A: 

Everyone has given half the answer, so here is both parts:

1) Strictly speaking, yes, a String in .NET is an array of characters. It is so both in its internal implementation and by the semantic definition of an array.

2) However String is, as others have pointed out, somewhat peculiar. It is not a System.Array as all other arrays are. So in the strict, .NET specific way, a String is not an Array.

Tergiver
+1  A: 

A string is simply not an array, in the sense that "Hello" is char[] evaluates to false.

tia
+1  A: 

To add a little to Scott Dorman's and Guffa's answers: if you use WinDbg and !DumpObject on the string 'abcd', you'll get something like this.

0:000> !do 01139b24
Name: System.String
MethodTable: 79330a00
EEClass: 790ed64c
Size: 26(0x1a) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: abcd
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79332c4c  4000096        4         System.Int32  1 instance        5 m_arrayLength
79332c4c  4000097        8         System.Int32  1 instance        4 m_stringLength
793316e0  4000098        c          System.Char  1 instance       61 m_firstChar
79330a00  4000099       10        System.String  0   shared   static Empty
    >> Domain:Value  00181b38:01131198 <<
79331630  400009a       14        System.Char[]  0   shared   static WhitespaceChars
    >> Domain:Value  00181b38:011318b8 <<

You'll notice it's only got three instance fields: m_arrayLength, m_stringLength and m_firstChar. It does not contain an instance System.Char[]. The other two fields are static and shared, so every System.String has the same Empty string and WhitespaceChars char array.

If you follow that with a DumpByte you'll see the string data (in this case abcd) in the heap, which starts at offset 0x0c (m_firstChar) and is 8 bytes wide (m_stringLength 4 x 2 bytes per UTF-16 character).

0:000> db 01139b24 L1A

01139b24  00 0a 33 79 05 00 00 00-04 00 00 00 61 00 62 00  ..3y........a.b.
01139b34  63 00 64 00 00 00 00 00-00 00                    c.d......
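
The 26 bytes above can be decoded by hand using the field offsets from the !do output (0x4, 0x8 and 0xc). The sketch below does exactly that, assuming a little-endian machine as in the 32-bit dump; `DumpDecoder` and its members are hypothetical names, and the byte values are copied from the dump.

```csharp
using System;
using System.Text;

public static class DumpDecoder
{
    // The 26 bytes shown by "db 01139b24 L1A".
    public static readonly byte[] Dump =
    {
        0x00, 0x0a, 0x33, 0x79, 0x05, 0x00, 0x00, 0x00,
        0x04, 0x00, 0x00, 0x00, 0x61, 0x00, 0x62, 0x00,
        0x63, 0x00, 0x64, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00
    };

    // Offsets taken from the Offset column of the !do output.
    public static int ArrayLength() { return BitConverter.ToInt32(Dump, 0x4); }
    public static int StringLength() { return BitConverter.ToInt32(Dump, 0x8); }

    // Character data starts at m_firstChar (0xc), two bytes per UTF-16 code unit.
    public static string Text()
    {
        return Encoding.Unicode.GetString(Dump, 0xc, StringLength() * 2);
    }

    public static void Main()
    {
        // BitConverter uses the machine's byte order; the dump is little-endian.
        if (!BitConverter.IsLittleEndian) throw new PlatformNotSupportedException();

        Console.WriteLine(BitConverter.ToUInt32(Dump, 0).ToString("x8")); // 79330a00 (MethodTable)
        Console.WriteLine(ArrayLength());  // 5
        Console.WriteLine(StringLength()); // 4
        Console.WriteLine(Text());         // abcd
    }
}
```

The character data sits inline in the object, right after the two length fields, with no separate char[] reference in between.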

If you were to look in the SSCLI you'd see that it, as Scott says, runs unsafe code and uses pointer techniques to read the data using m_firstChar and m_stringLength.

Conrad Frix