Why is it that they decided to make string immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?

A: 

It's largely for security reasons. It's much harder to secure a system if you can't trust that your strings are tamperproof.

jsight
That's a good point
chrissie1
+51  A: 

There are at least two reasons.

First - security http://www.javafaq.nu/java-article1060.html

The main reason String was made immutable is security. Consider this example: we have a file-open method with a login check. We pass a String to this method for the authentication that must happen before the call is passed to the OS. If String were mutable, it would be possible to modify its content after the authentication check but before the OS receives the request, and so to request any file. So if you have the right to open a text file in the user directory, but then manage to change the file name on the fly, you could request the "passwd" file or any other. That file could then be modified, making it possible to log in directly to the OS.

Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html

The JVM internally maintains a "String pool". To achieve memory efficiency, the JVM refers to String objects from the pool rather than creating new ones. Whenever you create a new string literal, the JVM checks whether it already exists in the pool. If it does, the JVM returns a reference to the same object; otherwise it creates a new object in the pool. Many references can point to the same String object, so if someone changed the value, it would affect all of those references. So Sun decided to make it immutable.
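The pooling behavior described above can be observed directly. The following is a minimal sketch, not taken from the quoted article; the class and method names are made up for illustration. It compares references with == on purpose, to show which strings are the same object.

```java
public class StringPoolDemo {
    // Two identical literals resolve to the same pooled object.
    static boolean literalsShareObject() {
        String a = "dog";
        String b = "dog";
        return a == b; // reference comparison, intentional here
    }

    // A string created at run time is a distinct object with equal contents.
    static boolean runtimeStringIsDistinct() {
        String a = "dog";
        String b = new String("dog");
        return a != b && a.equals(b);
    }

    // intern() returns the pooled instance that has the same contents.
    static boolean internReturnsPooled() {
        String a = "dog";
        String b = new String("dog").intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());
        System.out.println(runtimeStringIsDistinct());
        System.out.println(internReturnsPooled());
    }
}
```

Because every holder of the pooled "dog" sees the same object, sharing it is only safe as long as nobody can change its characters.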

smink
It's a good point about reuse, and especially true if you use String.intern(). It would have been possible to reuse without making all strings immutable, but life tends to get complicated at that point.
jsight
Neither one of those seems to be a terribly valid reason to me in this day and age.
Brian Knoblauch
I'm not moved by the security reason. The authentication process can copy the value of the string to a new internal object before the check.
Laplie
I'm not too convinced by the memory efficiency argument (i.e., when two or more String objects share the same data, and one is modified, then both get modified). CString objects in MFC get around that by using reference counting.
RobH
snemarch
I gotta say, the security argument doesn't convince me at all.
André Neves
The security argument doesn't make sense. The string is only immutable in that particular language (Java). If it "was possible somehow to modify" the string, it wouldn't be in Java because I consider Java to be very secure - it would be some sort of native exploit.
wj32
The memory efficiency argument doesn't work either. In a native language like C, string constants are simply pointers to data in the initialized data section - they are read-only/immutable anyway. "if someone changes the value" - again, strings from the pool are read-only anyway.
wj32
This is *NOT THE BEST ANSWER* for the original question posed, but instead the answer below by PRINCESS_FLUFF (which has also received more "useful" selections than this one) is better fitting. It's effectively an answer direct from Effective Java, so props to Josh Bloch. It is more of a platform agnostic answer, which better fits with the original question posed. It also just makes more sense.
James
+21  A: 

Thread safety and performance. If a string cannot be modified, it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See Wikipedia on immutability.

Matt Howells
+7  A: 

One factor is that, if strings were mutable, objects storing strings would have to be careful to store copies, lest their internal data change without notice. Given that strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).
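A small sketch of the problem described above, using StringBuilder as a stand-in for a hypothetical mutable string. The holder classes are invented for illustration and do not come from any library.

```java
public class DefensiveCopyDemo {
    // Hypothetical holder that stores a reference to a mutable StringBuilder
    // without copying it.
    static final class MutableNameHolder {
        private final StringBuilder name;
        MutableNameHolder(StringBuilder name) { this.name = name; }
        String name() { return name.toString(); }
    }

    // Hypothetical holder that stores an immutable String; no copy is needed.
    static final class ImmutableNameHolder {
        private final String name;
        ImmutableNameHolder(String name) { this.name = name; }
        String name() { return name; }
    }

    // The caller keeps the builder and changes it after handing it over.
    static String mutateAfterStoringBuilder() {
        StringBuilder sb = new StringBuilder("alice");
        MutableNameHolder holder = new MutableNameHolder(sb);
        sb.setLength(0);
        sb.append("mallory");
        return holder.name(); // the holder's data changed without notice
    }

    // With String, "changing" the caller's variable creates a new object.
    static String reassignAfterStoringString() {
        String s = "alice";
        ImmutableNameHolder holder = new ImmutableNameHolder(s);
        s = s.toUpperCase(); // new String; the holder still refers to "alice"
        return holder.name();
    }

    public static void main(String[] args) {
        System.out.println(mutateAfterStoringBuilder());   // mallory
        System.out.println(reassignAfterStoringString());  // alice
    }
}
```

The first holder would need a defensive copy in its constructor to be safe; the second gets the same guarantee for free.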

Evan DiBiase
+1  A: 

It's a trade off. Strings go into the string pool and when you create multiple identical strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.

The downside is that concatenations create a lot of extra strings that are only transitional and immediately become garbage, actually harming memory performance. You have StringBuffer and StringBuilder in Java (StringBuilder also exists in .NET) to conserve memory in these cases.
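As a rough illustration of that trade-off, here are two ways to join the same parts. The method names are hypothetical; both return the same text, but the first creates an intermediate String on each loop iteration while the second appends into one buffer.

```java
public class ConcatDemo {
    // Each += builds a brand-new String; the previous one becomes garbage.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer and creates the
    // final String only once, in toString().
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithConcat(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```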

LeedsSideStreets
Keep in mind that the "string pool" is not automatically used for _ALL_ strings unless you explicitly intern() them.
jsight
+80  A: 

According to Effective Java, chapter 4, page 73, 2nd edition:

"There are many good reasons for this: Immutable classes are easier to design, implement, and use than mutable classes. They are less prone to error and are more secure.

[...]

"Immutable objects are simple. An immutable object can be in exactly one state, the state in which it was created. If you make sure that all constructors establish class invariants, then it is guaranteed that these invariants will remain true for all time, with no effort on your part.

[...]

Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads accessing them concurrently. This is far and away the easiest approach to achieving thread safety. In fact, no thread can ever observe any effect of another thread on an immutable object. Therefore, immutable objects can be shared freely

[...]

Other small points from the same chapter:

Not only can you share immutable objects, but you can share their internals.

Immutable objects make great building blocks for other objects, whether mutable or immutable.

The only real disadvantage of immutable classes is that they require a separate object for each distinct value.
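To make the quoted points concrete, here is a minimal sketch of an immutable class in the style the book describes. The class ImmutablePoint is an invented example, not code from Effective Java: it is final, its fields are private and final, it has no setters, and "modification" returns a new object.

```java
public final class ImmutablePoint {
    private final int x;
    private final int y;

    public ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // Instead of changing this object, return a new one with the new value.
    public ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }

    public static void main(String[] args) {
        ImmutablePoint p = new ImmutablePoint(1, 2);
        ImmutablePoint q = p.translate(3, 4);
        System.out.println(p.x() + "," + p.y()); // 1,2 (p is unchanged)
        System.out.println(q.x() + "," + q.y()); // 4,6
    }
}
```

The translate method also shows the disadvantage mentioned last: each distinct value needs its own object.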

PRINCESS FLUFF
A: 

Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.

Tom Hawtin - tackline
+2  A: 

String is not a primitive type, yet you normally want to use it with value semantics, i.e., like a value.

A value is something you can trust won't change behind your back. If you write String str = someExpr(); you don't want it to change unless YOU do something with str.

String, as an Object, naturally has pointer semantics; to get value semantics as well, it needs to be immutable.

+1  A: 

The decision to make strings mutable in C++ causes a lot of problems; see this excellent article by Kevlin Henney about Mad COW Disease.

COW = Copy On Write.

Motti
+8  A: 

One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.

Unfortunately, most current programming languages default to mutability, but hopefully in the future the default will lean more toward immutability (see A Wish List for the Next Mainstream Programming Language).

Esko Luontola
A: 

Strings in Java are not truly immutable: you can change their values using reflection and/or class loading. You should not depend on that property for security. For examples, see: Magic Trick In Java

Lorcan
I believe that you will only be able to do such tricks if your code is running with full trust, therefore there is no security loss. You could as well use JNI to write directly on the memory location where the strings are stored.
Antoine Aubry
+7  A: 

You really have no business answering technical questions related to software development if you believe that immutability has anything to do with security.

I am reminded of my high school C++ teacher, who instructed us that private members were for the explicit purpose of securing 'secret' data like passwords.

Strings are simply read-only by the limitations of their implementation within the VM -- they aren't hackproof, encrypted, or actively monitored against changes for security purposes.

Quite simply, the immutability of string stems from the tradeoffs made for the sake of efficiency (string pooling) and thread safety.

+23  A: 

Actually, the reasons strings are immutable in Java don't have much to do with security. The two main reasons are the following:

Thread Safety:

Strings are an extremely widely used type of object, so they are more or less guaranteed to be used in multi-threaded environments. Strings are immutable to make sure that it is safe to share them among threads. Having immutable strings ensures that when passing a string from thread A to thread B, thread B cannot unexpectedly modify thread A's string.

Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.

Performance:

While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String object. If your program dynamically creates strings that are the same, they will be represented by different objects.

More importantly, immutability allows strings to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the first five characters of a String. In Java, you would call myString.substring(0,5). In this case, what the substring() method does is simply create a new String object that shares myString's underlying char[] but records that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:

 |               myString                  |
 v                                         v
"The quick brown fox jumps over the lazy dog"   <-- shared char[]
 ^   ^
 |   |  myString.substring(0,5)

This makes this kind of operation extremely cheap: it is O(1), since it depends neither on the length of the original string nor on the length of the substring being extracted. This behavior also has some memory benefits, since many strings can share their underlying char[]. (This describes older JDKs; since Java 7 update 6, substring() copies the characters into a new array instead of sharing.)

LordOfThePigs
LOL! What a username!
Andrei Rinea
A: 

Wow! I can't believe the misinformation here. Strings being immutable has nothing to do with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), there would certainly be plenty of other opportunities available for hacking.

It's quite a novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? Synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can remove the synchronized construct from the Java language.

The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a string literal. The drawback of this optimization is that runtime code that modifies a string literal introduces a problem, because it is modifying the instance for all other code that shares it. For example, it would not be good for a function somewhere in an application to change the string literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change string literals (i.e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing string literals into a special read-only memory segment that would cause a memory fault if a write attempt was made.

In Java this is known as interning. The Java compiler here is just following a standard memory optimization done by compilers for decades. And to address the same issue of these string literals being modified at runtime, Java simply makes the String class immutable (i.e., gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of string literals did not occur.

Jim Barton
I strongly disagree with the comment about immutability and threading; it seems to me you're not quite getting the point there. And if Josh Bloch, one of Java's implementers, says that was one of the design issues, how can that be misinformation?
javashlook
Synchronization is expensive. References to mutable objects need to be synchronized, not so for immutable. That's a reason to make all objects immutable unless they have to be mutable. Strings can be immutable, and therefore doing that makes them more efficient in multiple threads.
David Thornley
@Jim: Memory optimization is not 'THE' reason, it's 'A' reason. Thread-safety is also 'A' reason, because immutable objects are inherently thread-safe and require no expensive synchronization, as David mentioned. Thread safety is actually a side-effect of an object being immutable. You can think of synchronization as a way to make the object "temporarily" immutable (ReaderWriterLock will make it read-only, and a regular lock will make it inaccessible altogether, which of course makes it immutable as well).
Triynko
A: 

For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.

Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.

You should know why. Just think about it.

I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to use a single word, string, to describe a complex, contextually situated concept or class of object.

We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.

When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as a meaningful, atomic unit of coded data, like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.

Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:

string GetPersonalInfo( string username, string password )
{
    string stored_password = DBQuery.GetPasswordFor( username );
    if (password == stored_password)
    {
        // if 'username' were mutable, another thread could modify it here
        return DBQuery.GetPersonalInfoFor( username );
    }
    return null; // authentication failed; every code path must return a value
}

Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.
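The race described above can be simulated deterministically in Java. This is only a sketch under stated assumptions: StringBuilder stands in for a hypothetical mutable string, the two maps are invented stand-ins for the database calls, and the `interference` callback plays the role of another thread that runs between the check and the use.

```java
import java.util.Map;

public class CheckThenUseDemo {
    // Hypothetical in-memory stand-ins for the database lookups.
    static final Map<String, String> PASSWORDS = Map.of("alice", "pw-a", "bob", "pw-b");
    static final Map<String, String> INFO = Map.of("alice", "alice's data", "bob", "bob's data");

    // Mutable variant: the name can change between the check and the use.
    static String getInfoMutable(StringBuilder username, String password, Runnable interference) {
        if (!password.equals(PASSWORDS.get(username.toString()))) {
            return null; // authentication failed
        }
        interference.run(); // simulates another thread running here
        return INFO.get(username.toString());
    }

    // Immutable variant: the String checked is the String used.
    static String getInfoImmutable(String username, String password, Runnable interference) {
        if (!password.equals(PASSWORDS.get(username))) {
            return null; // authentication failed
        }
        interference.run();
        return INFO.get(username);
    }

    static String attackMutable() {
        StringBuilder name = new StringBuilder("alice");
        return getInfoMutable(name, "pw-a", () -> name.replace(0, name.length(), "bob"));
    }

    static String attackImmutable() {
        String[] name = {"alice"};
        // Reassigning the caller's variable cannot affect the callee's parameter.
        return getInfoImmutable(name[0], "pw-a", () -> name[0] = "bob");
    }

    public static void main(String[] args) {
        System.out.println(attackMutable());   // bob's data
        System.out.println(attackImmutable()); // alice's data
    }
}
```

With the mutable name, alice's password ends up unlocking bob's record; with String, the interference has no effect on the value already passed in.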

Triynko
In C#, a string is mutable by its pointer (use `unsafe`) or simply through reflection (you can get the underlying field easily). This makes the point on security void, as anybody that *intentionally* wants to change a string, can do so quite easily. However, it provides security to programmers: unless you do something special, the string is guaranteed immutable (but it's not threadsafe!).
Abel
Yes, you can change the bytes of any data object (string, int, etc.) through pointers. However, we're talking about why the string class is immutable in the sense that it has no public methods built into it for modifying its characters. I was saying that a string is a lot like a number, in that manipulating individual characters makes no more sense than manipulating individual bits of a number (when you treat a string as a whole token, not as a byte array, and a number as a numeric value, not as a bit field). We're talking at the conceptual object level, not at the sub-object level.
Triynko
And just to clarify, pointers in object-oriented code are inherently unsafe, exactly because they circumvent the public interfaces defined for a class. What I was saying, was that a function could be easily tricked if the public interface for a string allowed it to be modified by other threads. Of course, it can always be tricked by accessing data directly with pointers, but not as easily or unintentionally.
Triynko
@Triynko: 'pointers in object-oriented code are inherently unsafe' unless you call them *references*. References in Java are no different from pointers in C++ (only pointer arithmetic is disabled). Memory management, which can be managed or manual, is a separate concept. You could have reference semantics (pointers with no arithmetic) without having GC (the opposite would be harder, in the sense that the semantics of reachability would be harder to make clean, but not infeasible).
David Rodríguez - dribeas
The other thing is that if strings are *almost* immutable, but not quite so, (I don't know enough CLI here), that can be really bad for security reasons. In some older Java implementations you could do that, and I found a snippet of code that used that to *internalize* strings (try to locate other internal string that has the same value, share the pointer, remove the old memory block) and used the backdoor to rewrite the string contents forcing an incorrect behavior in a different class. (Consider rewriting "SELECT *" to "DELETE ")
David Rodríguez - dribeas
A: 

Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.

Andrei Rinea