EDIT: In the answer below I've referred to the intern pool as being AppDomain-specific; I'm pretty sure that's what I've observed before, but the MSDN docs for String.Intern suggest that there's a single intern pool for the whole process, making this even more important.
Original answer
(I was going to add this as a comment, but I think it's an important enough point to need an extra answer...)
As others have explained, string interning occurs for all string literals, but not for "dynamically created" strings (e.g. those read from a database or file, or built using StringBuilder or String.Format).
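As a quick illustration of that distinction, here is a minimal sketch. The class and method names (InternDemo, BuildHello) are made up for this example, and it assumes the default compiler and runtime behaviour, where identical literals in one assembly share a single object:

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Hypothetical helper: builds "hello" at execution time,
    // so the result is a fresh string object rather than the interned literal.
    public static string BuildHello()
    {
        return new StringBuilder().Append("hel").Append("lo").ToString();
    }

    public static void Main()
    {
        string literal1 = "hello";
        string literal2 = "hello";
        string built = BuildHello();

        // Identical literals are interned, so both variables refer to one object.
        Console.WriteLine(object.ReferenceEquals(literal1, literal2)); // True
        // The dynamically built string has equal contents but is a separate object.
        Console.WriteLine(object.ReferenceEquals(literal1, built));    // False
        Console.WriteLine(literal1 == built);                          // True
    }
}
```

The last line is a reminder that == on strings compares contents; only the reference comparison reveals whether two variables share one object.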
However, I wouldn't suggest calling String.Intern to get round the latter point: it will populate the intern pool for the lifetime of your AppDomain. Instead, use a pool which is local to just your usage. Here's an example of such a pool:
    using System.Collections.Generic;

    public class StringPool
    {
        private readonly Dictionary<string, string> contents =
            new Dictionary<string, string>();

        // Returns the pooled instance equal to item, adding item if none exists yet.
        public string Add(string item)
        {
            string ret;
            if (!contents.TryGetValue(item, out ret))
            {
                contents[item] = item;
                ret = item;
            }
            return ret;
        }
    }
You'd then just use something like:
    string data = pool.Add(ReadItemFromDatabase());
(Note that the pool isn't thread-safe; normal usage wouldn't need it to be.)
This way you can throw away your pool as soon as you no longer need it, rather than having a potentially large number of strings in memory forever. You could also make it smarter, implementing an LRU cache or something if you really wanted to.
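For the "smarter" option, one possible shape is a bounded pool that evicts the least recently used string once it is full. This is only a sketch under assumptions: the name LruStringPool, the capacity parameter and the eviction policy are all invented for illustration, and it is no more thread-safe than the simple pool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded pool: keeps at most `capacity` strings and
// evicts the least recently used one when a new string needs room.
public class LruStringPool
{
    private readonly int capacity;
    // Maps each pooled string to its node in the recency list.
    private readonly Dictionary<string, LinkedListNode<string>> nodes =
        new Dictionary<string, LinkedListNode<string>>();
    // Most recently used at the front, least recently used at the back.
    private readonly LinkedList<string> order = new LinkedList<string>();

    public LruStringPool(int capacity)
    {
        if (capacity < 1)
        {
            throw new ArgumentOutOfRangeException("capacity");
        }
        this.capacity = capacity;
    }

    public string Add(string item)
    {
        LinkedListNode<string> node;
        if (nodes.TryGetValue(item, out node))
        {
            // Hit: mark as most recently used and return the pooled instance.
            order.Remove(node);
            order.AddFirst(node);
            return node.Value;
        }

        if (nodes.Count >= capacity)
        {
            // Full: drop the least recently used string.
            LinkedListNode<string> oldest = order.Last;
            order.RemoveLast();
            nodes.Remove(oldest.Value);
        }

        nodes[item] = order.AddFirst(item);
        return item;
    }
}
```

Eviction only removes the pool's own reference; any string still referenced elsewhere stays alive, so a later equal string may end up as a second copy. That is the trade-off for bounding the pool's size.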
EDIT: Just to clarify why this is better than using String.Intern... suppose you read a bunch of strings from a database or log file, process them, and then move on to another task. If you call String.Intern on those strings, they will never be garbage collected as long as your AppDomain is alive. If you load several different log files, you'll gradually accumulate strings in your intern pool until you either finish or run out of memory. Instead, I'm suggesting a pattern like this:
    void ProcessLogFile(string file)
    {
        StringPool pool = new StringPool();
        // Process the log file using strings in the pool
    } // The pool can now be garbage collected
Here you get the benefit of multiple strings in the same file only existing once in memory (or at least, only getting past gen0 once) but you don't pollute a "global" resource (the intern pool).
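On the thread-safety note earlier: if a pool ever did need to be shared between threads (say, several workers parsing parts of the same file), one option would be to back it with ConcurrentDictionary. This is a sketch, and the class name ConcurrentStringPool is made up for the example:

```csharp
using System.Collections.Concurrent;

// Hypothetical thread-safe variant of the pool.
public class ConcurrentStringPool
{
    private readonly ConcurrentDictionary<string, string> contents =
        new ConcurrentDictionary<string, string>();

    public string Add(string item)
    {
        // GetOrAdd returns the stored instance if an equal key already exists;
        // otherwise it stores item and returns it.
        return contents.GetOrAdd(item, item);
    }
}
```

It is used exactly like the simple pool, and can still be discarded as a whole once the work that needed it is finished.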