views:

1321

answers:

10

The Javadoc about String.intern() doesn't give much detail. (In a nutshell: It returns a canonical representation of the string, allowing interned strings to be compared using ==)

  • When would I use this function in favor of String.equals()?
  • Are there side effects not mentioned in the Javadoc, e.g. more or less optimization by the JIT compiler?
  • Are there further uses of String.intern()?
+13  A: 

When would I use this function in favor of String.equals()?

When you need speed, since you can compare interned strings by reference (== is faster than equals()).

Are there side effects not mentioned in the Javadoc?

The primary disadvantage is that you have to remember to make sure that you actually do intern() all of the strings that you're going to compare. It's easy to forget to intern() all strings and then you can get confusingly incorrect results. Also, for everyone's sake, please be sure to very clearly document that you're relying on the strings being internalized.
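To make that pitfall concrete, here is a minimal sketch. The class and method names are made up for illustration, and `new String(...)` merely stands in for strings that arrive from input and are therefore not interned automatically.

```java
public class InternComparison {
    /** Compares by reference; only meaningful when BOTH arguments were interned. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal content.
        String a = new String("token");
        String b = new String("token");

        System.out.println(a.equals(b));                            // true
        System.out.println(sameReference(a, b));                    // false: distinct objects
        System.out.println(sameReference(a.intern(), b.intern()));  // true: same pooled instance
        System.out.println(sameReference(a.intern(), b));           // false: b was never interned
    }
}
```

The last line is the failure mode described above: a single string that skipped intern() silently turns an equality check into a mismatch.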

The second disadvantage if you decide to intern strings is that the intern() method is relatively expensive. It has to manage the pool of unique strings, so it does a fair bit of work (even if the string has already been interned). So design your code carefully, for example by calling intern() on all appropriate strings as they are read in, so that you don't have to worry about it afterwards.

(from JGuru)

EDIT

As Michael Borgwardt said: Third disadvantage: interned strings can't be garbage collected, so it's a potential for a memory leak.

dfa
A third disadvantage: interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError with plenty of free heap space.
Michael Borgwardt
AFAIK newer VMs also garbage collect the PermGen space.
DR
Yes, but as long as those Strings are not yet eligible for GC, they're taking up scant PermGen space.
Michael Borgwardt
Hey, please explain that intern() has to calculate the hash code of the string the first time, so it has to iterate over the whole string, which is more expensive than a simple comparison (a comparison can stop before the whole string has been iterated). So **intern only makes sense for comparison if you compare the same strings many times**. Additionally, it doesn't seem to be good programming practice, because there are probably nicer ways to improve performance than degrading the code this way (for me, abandoning an abstract solution like *equals* is degrading).
helios
Intern is about memory management, not comparison speed. The difference between `if (s1.equals(s2))` and `if (i1 == i2)` is minimal unless you have a lot of long strings with the same leading characters. In most real-world uses (other than URLs) the strings will differ within the first few characters. And long if-else chains are a code smell anyway: use enums and functor maps.
kdgregory
+5  A: 

I am not aware of any advantages, and if there were any, one would think that equals() would itself use intern() internally (which it doesn't).

Busting intern() myths

objects
Despite you saying that you're not aware of any advantages, the link you posted identifies comparison via == as being 5x faster and thus important for text-centric, performance-sensitive code.
Brian Agnew
When you have lots of text-comparing to do you’ll eventually run out of PermGen space. When there is not so much text-comparing to do the speed difference doesn’t matter. Either way, just don’t intern() your strings. It’s not worth it.
Bombe
It also goes on to say that the overall relative gain is typically going to be small.
objects
+1 for the link
Totophil
I don't think that kind of logic is valid. Good link though!
DR
@DR: what logic? That's one big fallacy. @objects: sorry but your arguments fall short of reasons. There are *very* good reasons to use `intern`, and very good reasons that `equals` doesn't do so by default. The link you posted is complete bollocks. The last paragraph even admits that `intern` has a valid usage scenario: heavy text processing (e.g. a parser). Concluding that “[XYZ] is dangerous if you don't know what you are doing” is so banal that it physically hurts.
Konrad Rudolph
@Bombe: often in text processing you've got a fixed list of strings that you need to intern (e.g. a list of keywords) and there'll be no danger of running out of PermGen space.
Konrad Rudolph
+8  A: 

This has (almost) nothing to do with string comparison. String interning is intended for saving memory if you have many strings with the same content in your application. By using String.intern() the application will only have one instance in the long run, and a side effect is that you can perform fast reference equality comparison instead of ordinary string comparison (but this is usually not advisable because it is really easy to break by forgetting to intern even a single instance).
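A small sketch of the memory-sharing effect described here. The helper names are hypothetical, and `new String(...)` stands in for equal-content strings produced by parsing or I/O. It counts String objects by identity rather than by content.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternSharing {
    /** Counts distinct String objects by identity, not by content. */
    static int distinctInstances(List<String> strings) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    /** Builds n equal-content strings, optionally interning each one. */
    static List<String> build(int n, boolean intern) {
        List<String> result = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            String s = new String("same content"); // stands in for parsed input
            result.add(intern ? s.intern() : s);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(build(1000, false))); // 1000
        System.out.println(distinctInstances(build(1000, true)));  // 1
    }
}
```

Without interning, the list keeps 1000 separate objects alive; with interning, the 999 temporary copies become garbage and only the pooled instance remains referenced.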

Daniel Brückner
How do interned strings compare with Ruby's symbols?
dfa
+1  A: 

I would consider intern() and == comparison instead of equals() only when equals() comparison is a bottleneck across many comparisons of the same strings. It is highly unlikely to help with a small number of comparisons, because intern() is not free. After aggressively interning strings, you will find calls to intern() getting slower and slower.

Mikko Maunu
+2  A: 

A commonly overlooked disadvantage with string interning is that it adds the string object to a "static pool" of strings in non-heap memory (at least, it seems to do that in the Sun VM). Once in there, they don't get garbage collected.

If your application doesn't deal with an arbitrary number of strings (as it generally would during input data processing), then interning won't cause a problem. However, if you intern every string that comes through the door, then you'll bust your non-heap memory pool. In other words, you get a memory leak.

skaffman
This appears to be a myth: http://blog.kdgregory.com/2009/09/intern-isnt-forever-and-maybe-never-was.html
kdgregory
+2  A: 

When would I use this function in favor of String.equals()?

Given they do different things, probably never.

Interning strings for performance reasons so that you can compare them for reference equality is only going to be of benefit if you are holding references to the strings for a while - strings coming from user input or IO won't be interned.

That means in your application you receive input from an external source and process it into an object which has a semantic value - an identifier say - but that object has a type indistinguishable from the raw data, and has different rules as to how the programmer should use it.

It's almost always better to create a UserId type which is interned (it's easy to create a thread-safe generic interning mechanism) and acts like an open enum, than to overload the java.lang.String type with reference semantics if it happens to be a user ID.

That way you don't get confusion between whether or not a particular String has been interned, and you can encapsulate any additional behaviour you require in the open enum.
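One possible shape for such a type, as a rough sketch only: the `UserId` name comes from the answer, but the `of` factory, the `value` accessor and the ConcurrentHashMap-backed pool are assumptions made for illustration. It relies on `ConcurrentMap.computeIfAbsent`, so it assumes Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class UserId {
    // One canonical UserId per raw value; entries are never removed in this sketch.
    private static final ConcurrentMap<String, UserId> POOL =
            new ConcurrentHashMap<String, UserId>();

    private final String value;

    private UserId(String value) {
        this.value = value;
    }

    /** Returns the single UserId instance for the given raw value (thread-safe). */
    public static UserId of(String raw) {
        return POOL.computeIfAbsent(raw, UserId::new);
    }

    public String value() {
        return value;
    }

    @Override
    public String toString() {
        return "UserId(" + value + ")";
    }

    public static void main(String[] args) {
        UserId a = UserId.of(new String("alice"));
        UserId b = UserId.of(new String("alice"));
        System.out.println(a == b);                // true: same canonical instance
        System.out.println(a == UserId.of("bob")); // false: different user
    }
}
```

Because the constructor is private, every UserId in the program has gone through the pool, so == is safe by construction and the "did I forget to intern this one?" question cannot arise. The pool here holds its entries strongly, so it suits a bounded set of identifiers.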

Pete Kirkham
A: 

I would vote for it not being worth the maintenance hassle.

Most of the time, there will be no need, and no performance benefit, unless your code does a lot of work with substrings, in which case the String class will use the original string plus an offset to save memory. If your code uses substrings a lot, then I suspect that interning will just cause your memory requirements to explode.

wm_eddie
+3  A: 

Strings interned with String.intern() are definitely garbage collected in modern JVMs.
The following NEVER runs out of memory, because of GC activity:

java -cp . -Xmx128m UserOfIntern

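import java.util.Random;

public class UserOfIntern {
    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(random.nextLong());
        // Intern an endless stream of distinct strings. Each one becomes
        // unreachable on the next iteration, so its pool entry can be collected.
        while (true) {
            String s = String.valueOf(random.nextLong());
            s = s.intern();
        }
    }
}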

See more on the myth of non GCed String.intern() allocations here.

Gili Nachum
`OutOfMemoryException` - no, not the code above, in my *brain*: link to the javaturning article, which is pointing to this article, which is pointing to the javaturning article, which... :-)
Carlos Heuberger
A: 

The real reason to use intern is none of the above. You reach for it after you get an out-of-memory error. Many of the strings in a typical program are String.substring() results taken from some other big string (think of extracting a user name from a 100K XML file). In the Java implementation, the substring holds a reference to the original string plus the start and end offsets within that huge string. (The thought behind it is to reuse the same big string.)

After 1000 big files, from which you keep only 1000 short names, you will hold all 1000 files in memory! Solution: in this scenario, just use smallsubstring.intern().
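A minimal sketch of that suggestion, with made-up names and a padded string standing in for the large file. Note the assumption: the shared backing storage described above applies to older JVMs. As far as I know, String.substring() copies the characters from Java 7 update 6 onwards, in which case the intern() call no longer matters for this particular problem.

```java
public class SubstringIntern {
    /** Extracts a short name and returns the canonical (interned) instance of it. */
    static String extractName(String document, int start, int end) {
        String name = document.substring(start, end);
        // On JVMs where substring() shares storage with 'document',
        // keeping the interned string instead lets the big document be collected.
        return name.intern();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("<user>alice</user>");
        for (int i = 0; i < 100000; i++) {
            sb.append(' '); // padding to mimic a large input file
        }
        String document = sb.toString();

        String name = extractName(document, 6, 11);
        System.out.println(name); // alice
    }
}
```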

asaf
A: 

I am using intern to save memory. I hold a large amount of String data in memory, and moving to intern() saved a massive amount of memory. Unfortunately, although it uses a lot less memory, the memory it does use is in PermGen rather than on the heap, and it is difficult to explain to customers how to increase the allocation of this type of memory.

So is there an alternative to intern() for reducing memory consumption? (The == versus equals performance benefit is not an issue for me.)
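One commonly used alternative is a hand-rolled pool that lives on the ordinary heap. The sketch below is an assumption-laden illustration, not a drop-in replacement: `HeapStringPool` and `canonical` are hypothetical names, and it uses a WeakHashMap so that strings nobody references any more can still be collected.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class HeapStringPool {
    // Keys are held weakly; the value is a weak reference to the same string,
    // so the map itself never keeps a pooled string alive.
    private final Map<String, WeakReference<String>> pool =
            new WeakHashMap<String, WeakReference<String>>();

    /** Returns one shared instance for all strings with the same content. */
    public synchronized String canonical(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<String>(s));
        return s;
    }

    public static void main(String[] args) {
        HeapStringPool pool = new HeapStringPool();
        String a = pool.canonical(new String("customer-42"));
        String b = pool.canonical(new String("customer-42"));
        System.out.println(a == b); // true: both refer to the pooled instance
    }
}
```

The pooled strings are regular heap objects, so they count against -Xmx rather than PermGen. The single lock is the simplest thing that is correct; a heavily concurrent application might want something finer-grained.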