Consider this code:

String first = "abc"; 
String second = new String ("abc");

When using the new keyword, Java will create the "abc" String again, right? Will this be stored on the regular heap or in the String pool? How many Strings will end up in the String pool?

+11  A: 

If you use the new keyword, a new String object will be created. Note that objects are always on the heap - the string pool is not a memory area separate from the heap.

The string pool is like a cache. If you do this:

String s = "abc";
String p = "abc";

then the Java compiler is smart enough to make just one String object, and s and p will both be referring to that same String object. If you do this:

String s = new String("abc");

then there will be one String object in the pool, the one that represents the literal "abc", and there will be a separate String object, not in the pool, that contains a copy of the content of the pooled object. Since String is immutable in Java, you're not gaining anything by doing this; calling new String("literal") never makes sense in Java and is unnecessarily inefficient.
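As a minimal sketch of the difference (the class and method names here are made up for illustration), comparing references with == shows which variables point to the same object:

```java
public class StringPoolDemo {
    // Two identical literals resolve to the same pooled String object.
    static boolean literalsShareObject() {
        String s = "abc";
        String p = "abc";
        return s == p;
    }

    // new String(...) always allocates a separate object that is not the pooled one.
    static boolean newStringSharesObject() {
        String s = "abc";
        String q = new String("abc");
        return s == q;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());            // true
        System.out.println(newStringSharesObject());          // false
        System.out.println("abc".equals(new String("abc"))); // true: same content
    }
}
```

The content is equal in every case; only the object identity differs, which is why strings should be compared with equals() rather than ==.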

Note that you can call intern() on a String object. This will put the String object in the pool if it is not already there, and return the reference to the pooled string. (If it was already in the pool, it just returns a reference to the object that was already there.) See the API documentation for that method for more info.
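A small sketch of what intern() returns, assuming the literal "abc" is already in the pool (the class and method names are hypothetical):

```java
public class InternDemo {
    // intern() on a non-pooled copy returns the pooled object for that content.
    static boolean internReturnsPooledObject() {
        String literal = "abc";               // pooled
        String copy = new String("abc");      // separate object, not pooled
        return copy.intern() == literal;
    }

    // Calling intern() does not change the object it was called on.
    static boolean copyStaysDistinct() {
        String literal = "abc";
        String copy = new String("abc");
        copy.intern();                        // result deliberately ignored
        return copy != literal;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledObject()); // true
        System.out.println(copyStaysDistinct());         // true
    }
}
```

Note that you have to use the return value of intern(); the original reference still points to the non-pooled object.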

See also String interning (Wikipedia).

Jesper
Presumably this is based on the Flyweight design pattern.
Wim Hollebrandse
@Wim: Yes, this is essentially the flyweight pattern.
Jesper
the intern() pool is always on heap; I suppose that the question is asking about the **class constants pool**
dfa
beware of running out of permgen space. Using your own set to intern strings avoids this.
bmargulies
you mean by the string pool, the constant pool, right?
HH
@bmargulies: Yes, it is a bad idea to just `intern()` strings without thinking; it will create a memory leak. @HH: The class constants pool is not the same thing as the string pool.
Jesper
@Jesper: I want something to be clear. `new String("literal")` doesn't make sense as you say, but I hope you are not suggesting that calling "new String(someString)" doesn't make sense and is unnecessarily inefficient. Imagine you have `String s = "someExtremelyExtremelyLongStringLiteral";`. Then you will want to use `String sub = new String(s.substring(0,1));` instead of `String sub = s.substring(0,1);` as the latter version can waste memory. The reason is that when the reference `s` goes away, there will still be a reference `sub` pointing to the huge underlying `char[]` so it can't be GC'ed
Tom
@Jesper: I told dmindreader to look at this answer I posted on SO a while ago, and I thought you might be interested in it since you talked about `intern()`. It points out that string interning is a fragile thing to rely on as programmer and really is an implementation detail of the JVM. http://stackoverflow.com/questions/1111296/when-s-is-false-but-equals-s-is-true/1111405#1111405
Tom
@Tom: I understand, what I meant was that calling new String() *with a string literal* is never necessary.
Jesper
+3  A: 

In bytecode, the first assignment is:

  Code:
   0:   ldc     #2; //String abc
   2:   astore_1

whereas the second is:

   3:   new     #3; //class java/lang/String
   6:   dup
   7:   ldc     #2; //String abc
   9:   invokespecial   #4; //Method java/lang/String."<init>":(Ljava/lang/String;)V

so the first is in the pool (at position #2), whereas the second will be stored on the heap.

EDIT

Since CONSTANT_String_info stores the index as a u2 (16 bits, unsigned), the pool can contain at most 2**16 - 1 = 65535 references. In case you care, the JVM has more limits like this one.

dfa
Looking at the bytecode is a good (and overlooked) way to find out what exactly a program is doing.
Jesper
How do you look at the bytecode? What did you use to access it?
omgzor
javap, a utility distributed with Sun's JDK: http://java.sun.com/j2se/1.5.0/docs/tooldocs/windows/javap.html
dfa
+1  A: 

Each time your code creates a string literal

for example: String str="Hello"; (string literal)

the JVM checks the string literal pool first. If the string already exists in the pool, a reference to the pooled instance is returned. If the string does not exist in the pool, a new String object is instantiated and placed in the pool. Java can make this optimization since strings are immutable and can be shared without fear of data corruption.

loudiyimo
A: 

The only time you should use new String(foo) is when you want to break ==, which is an odd case, or when foo is a substring of a much larger string that has a limited lifetime, such as

String mystring;
{
   String source = getSomeHeinouslyLargeString();
   mystring = new String(source.substring(1,3));
}
MeBigFatGuy