A String is a reference type even though it has most of the characteristics of a value type such as being immutable and having == overloaded to compare the text rather than making sure they reference the same object.

Why isn't string just a value type then?

+29  A: 

Strings aren't value types since they can be huge and need to be stored on the heap. Value types are (in all implementations of the CLR as of yet) stored on the stack. Stack-allocating strings would break all sorts of things: the stack is only 1MB, you'd have to box each string, incurring a copy penalty, you couldn't intern strings, memory usage would balloon, etc.

(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value semantics not inheriting from System.ValueType. Thanks Ben.)

codekaizen
I understand that, but why isn't it just a value type then?
Davy8
Clarified the question to reflect that.
Davy8
Edit the answer and make a new one and I'll +1
Davy8
And I subsequently updated my answer. ;)
codekaizen
Nice explanation.
Byron Whitlock
I meant edit "or" make a new one btw. Waiting longer before marking accepted
Davy8
I'm nitpicking here, but only because it gives me an opportunity to link to a blog post relevant to the question: value types are not necessarily stored on the stack. It's most often true in MS.NET, but not at all specified by the CLI specification. The main difference between value and reference types is that value types follow copy-by-value semantics. See http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx and http://blogs.msdn.com/ericlippert/archive/2009/05/04/the-stack-is-an-implementation-detail-part-two.aspx
Ben Schwehn
Not to mention, strings are variable-size, so they can't be value types (as value types are stored directly wherever you declare them). When you declare a string inside a class, how could the class hold the string directly, given that one can change the string to another string of different length at any time? No, there would have to be a REFERENCE to the string because it is variable-size.
Qwertie
@Qwertie: `String` is not variable size. When you add to it, you are actually creating another `String` object, allocating new memory for it.
codekaizen
That said, a string could, in theory, have been a value type (a struct), but the "value" would have been nothing more than a reference to the string. The .NET designers naturally decided to cut out the middleman (struct handling was inefficient in .NET 1.0, and it was natural to follow Java, in which strings were already defined as a reference, rather than primitive, type. Plus, if string were a value type then converting it to object would require it to be boxed, a needless inefficiency).
Qwertie
@codekaizen: String *variables* are mutable and therefore variable-size.
Qwertie
@Qwertie: A variable doesn't have a size (except if you are talking about the size of the reference, but even if you are, it is always the same). What actually takes up the memory is the object.
codekaizen
+4  A: 

It is not a value type because performance (space and time!) would be terrible if it were a value type and its value had to be copied every time it was passed to or returned from a method, etc.

It has value semantics to keep the world sane. Can you imagine how difficult it would be to code if

string s = "hello";
string t = "hello";
bool b = (s == t);

set b to be false? Imagine how difficult coding just about any application would be.
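
A minimal sketch of the difference between the overloaded == and a reference comparison. The class and method names are made up for illustration, and the second string is deliberately built at run time so that it is a separate object rather than the same literal:

```csharp
using System;

static class StringEqualityDemo
{
    // Hypothetical helper: builds "hello" at run time so the result is a
    // distinct object from the "hello" literal used in Main.
    public static string BuildHello() => new string(new[] { 'h', 'e', 'l', 'l', 'o' });

    static void Main()
    {
        string s = "hello";
        string t = BuildHello();

        // string overloads ==, so the characters are compared.
        Console.WriteLine(s == t);                        // True

        // The two variables still refer to two different objects.
        Console.WriteLine(object.ReferenceEquals(s, t));  // False
    }
}
```

So the value comparison is a convenience layered on top of an ordinary reference type; the reference identity is still there if you ask for it.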

Jason
That's how Java works, you have to use s.equals(t).
Matt Briggs
Java is not known for being pithy.
Jason
@Matt: exactly. When I switched over to C# this was kind of confusing, since I always used (and still do sometimes) .equals(..) for comparing strings while my teammates just used "==". I never understood why they didn't leave the "==" to compare the references, although if you think about it, 90% of the time you'll probably want to compare the content, not the references, for strings.
Juri
@Juri: Actually I think it's never desirable to check the references, since sometimes `new String("foo");` and another `new String("foo")` can evaluate to the same reference, which is not really what you would expect a `new` operator to do. (Or can you tell me a case where I would want to compare the references?)
Michael
+1  A: 

Also, consider the way strings are implemented (different for each platform) and what happens when you start stitching them together, for example with a StringBuilder. It allocates a buffer for you to copy into; once you reach the end, it allocates even more memory for you, in the hope that performance won't be hindered if you do a large concatenation.
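
A rough sketch of that growth behaviour, assuming a deliberately small starting capacity (the value 4, the class name and the Fill helper are all made up for this example). The exact capacities you see are an implementation detail, so no particular numbers are claimed here:

```csharp
using System;
using System.Text;

static class StringBuilderGrowthDemo
{
    // Hypothetical helper: appends `count` characters to a builder that
    // starts with a small, arbitrary capacity of 4.
    public static StringBuilder Fill(int count)
    {
        var sb = new StringBuilder(4);
        for (int i = 0; i < count; i++)
        {
            sb.Append('x');
            // Capacity grows as needed; how much it grows is up to the runtime.
            Console.WriteLine($"Length={sb.Length}, Capacity={sb.Capacity}");
        }
        return sb;
    }

    static void Main()
    {
        StringBuilder sb = Fill(10);

        // Only at the end is an (immutable) string produced from the buffer.
        string result = sb.ToString();
        Console.WriteLine(result);
    }
}
```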

Maybe Jon Skeet can help us out here?

Chris
+1  A: 

Actually, strings have very little resemblance to value types. For starters, value types are not immutable: you can change the value of an Int32 all you want and it would still be at the same address on the stack.

Strings are immutable for a very good reason. It has nothing to do with string being a reference type, but has a lot to do with memory management: it's just more efficient to create a new object when the string size changes than to shift things around on the managed heap. I think you're mixing together the concepts of value/reference types and immutable objects.
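
To illustrate the "new object" part, here is a small sketch (the class name is made up): appending to a string variable leaves the original object untouched and points the variable at a freshly created one.

```csharp
using System;

static class StringImmutabilityDemo
{
    static void Main()
    {
        string a = "abc";
        string b = a;   // b refers to the same object as a
        a += "def";     // creates a new string; the original "abc" is not modified

        Console.WriteLine(a);                             // abcdef
        Console.WriteLine(b);                             // abc
        Console.WriteLine(object.ReferenceEquals(a, b));  // False
    }
}
```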

As far as "==" goes: like you said, "==" is an operator overload, and again it was implemented for a very good reason, to make the framework more useful when working with strings.

WebMatrix
I realize that value types aren't by definition immutable, but most best practice seems to suggest that they should be when creating your own. I said characteristics, not properties of value types, which to me means that often value types exhibit these, but not necessarily by definition
Davy8
Good information, but I think a misinterpretation of the question
Davy8
@WebMatrix, @Davy8: The primitive types (int, double, bool, ...) are immutable.
Jason
@Jason, I thought the term immutable mostly applies to objects (reference types) that cannot change after initialization, like strings: when a string's value changes, a new string instance is created internally and the original object remains unchanged. How does this apply to value types?
WebMatrix
Somehow, in "int n = 4; n = 9;", it's not that your int variable is "immutable", in the sense of "constant"; it's that the value 4 is immutable, it doesn't change to 9. Your int variable "n" first has a value of 4 and then a different value, 9; but the values themselves are immutable. Frankly, to me this is very close to wtf.
Daniel Daranas
+1. I'm sick of hearing this "strings are like value types" when they quite simply aren't.
Jon Hanna
A: 

It is mainly a performance issue.

Having strings behave LIKE a value type helps when writing code, but having it BE a value type would cause a huge performance hit.

For an in-depth look, take a peek at a nice article on strings in the .NET Framework.

Denis Troller
A: 

How can you tell string is a reference type? I'm not sure that it matters how it is implemented. Strings in C# are immutable precisely so that you don't have to worry about this issue.

brone
It's a reference type (I believe) because it doesn't derive from System.ValueType. From the MSDN remarks on System.ValueType: Data types are separated into value types and reference types. Value types are either stack-allocated or allocated inline in a structure. Reference types are heap-allocated.
Davy8
Both reference and value types are derived from the ultimate base class Object. In cases where it is necessary for a value type to behave like an object, a wrapper that makes the value type look like a reference object is allocated on the heap, and the value type's value is copied into it.
Davy8
The wrapper is marked so the system knows that it contains a value type. This process is known as boxing, and the reverse process is known as unboxing. Boxing and unboxing allow any type to be treated as an object. (In hindsight, I probably should've just linked to the article.)
Davy8
+1  A: 

Strings are not the only immutable reference types. Multicast delegates are too. That is why it is safe to write

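protected void OnMyEventHandler()
{
     // Copy the delegate to a local first.
     // Assumes MyEventHandler is declared with the EventHandler delegate type.
     EventHandler handler = this.MyEventHandler;
     if (null != handler)
     {
        handler(this, new EventArgs());
     }
}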

I suppose that strings are immutable because this is the safest way to work with them and allocate memory. Why are they not value types? Previous authors are right about stack size etc. I would also add that making strings reference types saves on assembly size when you use the same constant string in the program. If you define

string s1 = "my string";
//some code here
string s2 = "my string";

Chances are that both instances of "my string" constant will be allocated in your assembly only once.
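
A small sketch to check this (the class name is made up). The first result reflects how current implementations behave rather than something I'd call a hard guarantee. The string built at run time is a separate object, and string.Intern maps equal values back to one shared instance:

```csharp
using System;

static class LiteralInterningDemo
{
    static void Main()
    {
        string s1 = "my string";
        //some code here
        string s2 = "my string";

        // Identical literals in the same module share one object
        // on current implementations.
        Console.WriteLine(object.ReferenceEquals(s1, s2));     // True

        // A string built at run time is a separate object with the same value.
        string built = new string("my string".ToCharArray());
        Console.WriteLine(s1 == built);                        // True
        Console.WriteLine(object.ReferenceEquals(s1, built));  // False

        // string.Intern returns the single pooled instance for a given value.
        Console.WriteLine(object.ReferenceEquals(
            string.Intern(s1), string.Intern(built)));         // True
    }
}
```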

If you would like to manage strings like a usual reference type, put the string inside a new StringBuilder(string s). Or use MemoryStreams.

If you are creating a library where you expect huge strings to be passed to your functions, define the parameter either as a StringBuilder or as a Stream.

Bogdan_Ch
There are plenty of examples of immutable reference-types. And re the string example, that is indeed pretty-much guaranteed under the current implementations - *technically* it is per *module* (not per-assembly) - but that is almost always the same thing...
Marc Gravell
Re the last point: StringBuilder doesn't help if you are trying to *pass* a large string (since it is actually implemented as a string anyway) - StringBuilder is useful for **manipulating** a string multiple times.
Marc Gravell
Did you mean delegate handler, not hadler? (Sorry to be picky, but it's very close to a (not common) surname I know....)
Pure.Krome