Does String.ToLower() return the same reference (i.e. without allocating any new memory) if all the characters are already lower-case?

Memory allocation is cheap, but running a quick check on zillions of short strings is even cheaper. Most of the time the input I'm working with is already lower-case, but I want to make it that way if it isn't.

I'm working with C# / .NET in particular, but my curiosity extends to other languages so feel free to answer for your favorite one!

NOTE: Strings are immutable, but that does not mean a function always has to return a new one; it only means that nothing can change their character content.

+4  A: 

String is immutable. String.ToLower() will always return a new instance, so every ToLower() call allocates another string.

cgreeno
However, due to string interning, it might not actually be a unique string instance if there is already a string that matches that one in memory, right?
Jamie Penney
It's possible that the algorithm could check to see if it was already lower case, but a quick look with Reflector makes me think you're right.
Larsenal
@Jamie - most functions don't check for an interned copy - only very specific ones (or if you call it yourself)
Marc Gravell
+7  A: 

I expect it always allocates a new string, yes. A quick test agrees (though a single test is not proof):

string a = "abc", b = a.ToLower();
bool areSame = ReferenceEquals(a, b); // false

In general, try to work with comparers that do what you want. For example, if you want a case-insensitive dictionary, use one:

var lookup = new Dictionary<string, int>(
    StringComparer.InvariantCultureIgnoreCase);

Likewise:

bool ciEqual = string.Equals("abc", "ABC",
    StringComparison.InvariantCultureIgnoreCase);
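
If the keys are ASCII-only and not culture-sensitive (an assumption about the data, not something the snippets above require), an ordinal comparer may be a closer fit than the invariant-culture one. A minimal sketch; `ComparerDemo`, `BuildLookup` and the sample keys are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // Hypothetical helper: builds a dictionary whose keys compare
    // case-insensitively, so no lower-cased copy of the key is needed.
    public static Dictionary<string, int> BuildLookup()
    {
        var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        lookup["Content-Type"] = 1;
        lookup["content-type"] = 2; // same key under this comparer: overwrites the value
        return lookup;
    }
}
```

The dictionary keeps whichever casing was inserted first as the stored key, so this only helps with lookups and comparisons, not with storing a normalized form.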
Marc Gravell
Thanks. I agree with your recommendation, but in this case I'm storing a normalized version of some always-ASCII data (not culture related) in addition to comparing it.
Neil C. Obremski
+1 - never knew you can create a dictionary like that. I will definitely use that one!
BFree
@Neil - if you need to store it, then yes: you'll need ToLower() - unless of course you do the normalization on the inbound char[] buffer (probably too much effort).
Marc Gravell
+1  A: 

Sun's Java implementation of String.toLowerCase() does not always allocate a new String. It checks whether all the chars are already lowercase and, if so, returns the original string.
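
The same check-first idea can be sketched in C# for the asker's case. This is only an illustration, not how any framework implements it: `AsciiLower.ToLowerIfNeeded` is a hypothetical helper, and it assumes the input is ASCII-only, as the asker describes in a comment.

```csharp
using System;

public static class AsciiLower
{
    // Hypothetical helper (not a framework API): returns the same instance
    // when the input contains no ASCII upper-case letters, otherwise a
    // lower-cased copy.
    public static string ToLowerIfNeeded(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c >= 'A' && c <= 'Z')
            {
                // Found an upper-case letter: pay for the conversion only now.
                return s.ToLowerInvariant();
            }
        }

        return s; // nothing to change: hand back the original reference
    }
}
```

With mostly lower-case input, the common path is a single scan and no new string.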

Peter Štibraný
+1  A: 

[edit]
Interning doesn't help -- see the comments to this answer.
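
For what it's worth, a small sketch of what explicit interning does and does not buy you. `InternDemo` is a made-up name; the only framework calls used are `string.Intern`, `ToLowerInvariant` and `ReferenceEquals`.

```csharp
using System;

public static class InternDemo
{
    // Returns true when explicitly interning the lower-cased result
    // yields the same reference as the interned "abc".
    public static bool InternedLowerIsSameReference()
    {
        string a = string.Intern("abc");

        // Build the upper-case input at run time so it is not a compiled literal.
        string upper = new string(new[] { 'A', 'B', 'C' });

        string lowered = upper.ToLowerInvariant();   // a separate temporary string
        string canonical = string.Intern(lowered);   // explicit lookup in the intern pool

        return ReferenceEquals(a, canonical);
    }
}
```

The explicit `string.Intern` call gets you back to a shared reference, but the temporary lower-cased string has already been allocated by then, so it does not avoid the allocation the question is about.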

Joel Coehoorn
Can you qualify that with an example? If I manually intern it still duplicates: string a = string.Intern("abc"); bool isSame = ReferenceEquals(a, a.ToLower());
Marc Gravell
Try calling that several times in a loop.
Joel Coehoorn
And? What am I meant to see?
Marc Gravell
I would expect it to return the interned string. If this doesn't happen I need to go back and update my understanding of interning :(
Joel Coehoorn
I can't get it to return an interned version, even though `a` itself is interned (and equal to the ToLower() result).
Marc Gravell
Bummer. I can't mess with it here at work, as .NET 2.0 doesn't have string interning, but I'll try to remember to play with this at home tonight. Maybe if I call string.Intern(a.ToLower()); ...
Joel Coehoorn
I'm thinking about deleting this, but I think the discussion is valuable enough to be worth leaving up, even if it is a bit embarrassing :o
Joel Coehoorn
The main intern consumer is with compiled literals; other than that, you have to call `IsInterned` yourself - and few methods bother (since it takes time to check). Useful for loading grids with alike cells, but you still get a big gen0 hit until the next GC (from the temp/throwaway versions).
Marc Gravell
I agree: valuable discussion - nothing to be embarrassed about at all.
Marc Gravell
Thanks guys, I found this discussion useful myself.
Neil C. Obremski