+10  Q: 

Upper vs Lower Case

When doing case-insensitive comparisons, is it more efficient to convert the string to upper case or lower case? Does it even matter?

It is suggested in this SO post that C# is more efficient with ToUpper because "Microsoft optimized it that way." But I've also read this argument that converting ToLower vs. ToUpper depends on what your strings contain more of, and that typically strings contain more lower case characters which makes ToLower more efficient.

In particular, I would like to know:

  • Is there a way to optimize ToUpper or ToLower such that one is faster than the other?
  • Is it faster to do a case-insensitive comparison between upper or lower case strings, and why?
  • Are there any programming environments (e.g. C, C#, Python, whatever) where one case is clearly better than the other, and why?
A: 

If you're dealing in pure ASCII, it doesn't matter. It's just an OR x,32 vs. an AND x,32. Unicode, I have no idea...

Brian Knoblauch
This is completely wrong - OR'ing with 32 only works for A-Z and characters 64-127; it screws up all other characters. AND'ing with 32 is even more wrong - the result will always be 0 (nul) or 32 (space).
Adam Rosenfield
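
To make the bit-twiddling in this exchange concrete, here is a minimal Python sketch (the helper names are made up for illustration, not from any library). In ASCII, 'A' (0x41) and 'a' (0x61) differ only in bit 0x20, so setting that bit lower-cases a letter and clearing it upper-cases one, but only after checking that the character really is a letter:

```python
CASE_BIT = 0x20  # the single bit that separates 'A' (0x41) from 'a' (0x61)


def ascii_lower(ch: str) -> str:
    """Lower-case one ASCII character; leave everything else untouched."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) | CASE_BIT)   # set the case bit
    return ch


def ascii_upper(ch: str) -> str:
    """Upper-case one ASCII character; leave everything else untouched."""
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~CASE_BIT)  # clear the case bit
    return ch


# Without the range check, OR corrupts non-letters: '@' (0x40) becomes '`' (0x60).
unguarded_or = chr(ord("@") | CASE_BIT)

# AND with 32 itself (instead of its complement) keeps only that one bit,
# so every result is either 0 or 32.
and_results = {ord(c) & CASE_BIT for c in "Hello, World!"}
```

Either direction costs one range check plus one bit operation per character, which is why neither is inherently faster for pure ASCII.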
+3  A: 

Based on strings tending to have more lowercase entries, ToLower should theoretically be faster (lots of compares, but few assignments).

In C, or when using individually-accessible elements of each string (such as C strings or the STL's string type in C++), it's actually a byte comparison - so comparing UPPER is no different from lower.

If you were sneaky and loaded your strings into long arrays instead, you'd get a very fast comparison on the whole string because it could compare 4 bytes at a time. However, the load time might make it not worthwhile.

Why do you need to know which is faster? Unless you're doing a metric buttload of comparisons, one running a couple cycles faster is irrelevant to the speed of overall execution, and sounds like premature optimization :)

warren
To answer the question why I need to know which is faster: I don't need to know, I merely want to know. :) It's simply a case of seeing somebody make a claim (such as "comparing upper case strings is faster!") and wanting to know whether it is really true and/or why they made that claim.
Parappa
that makes sense - I'm eternally curious on stuff like this, too :)
warren
+1  A: 

It really shouldn't ever matter. With ASCII characters, it definitely doesn't matter - it's just a few comparisons and a bit flip for either direction. Unicode might be a little more complicated, since there are some characters that change case in weird ways, but there really shouldn't be any difference unless your text is full of those special characters.

Adam Rosenfield
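
A few of those "weird" characters, sketched in Python. This assumes Python 3, whose str methods apply the Unicode default case mappings and are not locale-sensitive, so it shows the general problem rather than .NET's culture-specific behaviour. The function name equal_ignore_case is hypothetical:

```python
# German sharp s has no single-character upper-case form in the default
# mapping, so upper-casing changes the string's length.
sharp_s_upper = "ß".upper()        # "SS"

# Capital I with dot above lower-cases to 'i' plus a combining dot (2 code points).
dotted_i_lower = "İ".lower()       # "i\u0307"

# Greek has two lower-case sigmas that share one upper-case form, so
# comparing via upper() and via lower() can give different answers.
same_via_upper = "ς".upper() == "σ".upper()
same_via_lower = "ς".lower() == "σ".lower()


def equal_ignore_case(a: str, b: str) -> bool:
    """Compare using case folding, which is designed for caseless matching."""
    return a.casefold() == b.casefold()
```

For text like this, the choice between upper and lower is a correctness question before it is a speed question.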
+5  A: 

According to MSDN it is more efficient to pass in the strings and tell the comparison to ignore case:

String.Compare(strA, strB, StringComparison.OrdinalIgnoreCase) is equivalent to (but faster than) calling

String.Compare(ToUpperInvariant(strA), ToUpperInvariant(strB), StringComparison.Ordinal).

These comparisons are still very fast.

Of course, if you are comparing one string over and over again then this may not hold.

Rob Walker
+1  A: 

If you are doing string comparison in C# it is significantly faster to use .Equals() instead of converting both strings to upper or lower case. Another big plus for using .Equals() is that more memory isn't allocated for the 2 new upper/lower case strings.

Jon Tackabury
And as a bonus, if you pick the right options it will actually give you the correct results :)
Jon Skeet
+1  A: 

Microsoft has optimized ToUpperInvariant(), not ToUpper(). The difference is that invariant is more culture friendly. If you need to do case-insensitive comparisons on strings that may vary in culture, use Invariant, otherwise the performance of invariant conversion shouldn't matter.

I can't say whether ToUpper() or ToLower() is faster though. I've never tried it since I've never had a situation where performance mattered that much.

Dan Herbert
A: 

It depends. As stated above, for plain ASCII it's identical. In .NET, read about and use String.Compare; it's correct for the i18n stuff (languages, cultures and Unicode). If you know anything about the likelihood of the input, use the more common case.

Remember, if you are doing multiple string compares, length is an excellent first discriminator.

Sanjaya R
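
The length check can be sketched like this in Python. It assumes ASCII-only data held as bytes, since Unicode case mapping can change a string's length, which would make the shortcut invalid. The function name is made up for illustration:

```python
def ascii_equal_ignore_case(a: bytes, b: bytes) -> bool:
    """Case-insensitive equality for ASCII byte strings."""
    # Strings of different length can never match, so skip the
    # per-byte conversion entirely in that case.
    if len(a) != len(b):
        return False
    # bytes.lower() only touches the ASCII letters A-Z.
    return a.lower() == b.lower()
```

When most candidate pairs differ in length, most comparisons never reach the case conversion at all.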
+25  A: 

Converting to either upper case or lower case in order to do case-insensitive comparisons is incorrect due to "interesting" features of some cultures, particularly Turkey. Instead, use a StringComparer with the appropriate options.

MSDN has some great guidelines on string handling. You might also want to check that your code passes the Turkey test.

Jon Skeet
Yes, StringComparer is great, but the question wasn't answered... In situations where you can't use StringComparer, such as a switch statement against a string, should I ToUpper or ToLower in the switch?
joshperry
Use a StringComparer and "if"/"else" instead of using either ToUpper or ToLower.
Jon Skeet
A: 

lower case is faster because they are smaller, duh!!

Mark Lubin
A: 

Doing it right, there should be a small, insignificant speed advantage if you convert to lower case, but this is, as many have hinted, culture dependent and is not inherent in the function but in the strings you convert (lots of lower case letters means few assignments to memory). Converting to upper case is faster if you have a string with lots of upper case letters.

Clearer
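
The "few assignments" argument can be illustrated with a rough Python sketch. It assumes ASCII data in a mutable bytearray and an implementation that writes only when a byte actually changes; the function names are hypothetical, and real library routines may not work this way:

```python
def lower_in_place(buf: bytearray) -> int:
    """Lower-case ASCII letters in place; return how many bytes were written."""
    writes = 0
    for i, byte in enumerate(buf):
        if 0x41 <= byte <= 0x5A:      # 'A'..'Z'
            buf[i] = byte | 0x20
            writes += 1
    return writes


def upper_in_place(buf: bytearray) -> int:
    """Upper-case ASCII letters in place; return how many bytes were written."""
    writes = 0
    for i, byte in enumerate(buf):
        if 0x61 <= byte <= 0x7A:      # 'a'..'z'
            buf[i] = byte & ~0x20
            writes += 1
    return writes
```

For b"Hello World", lowering writes 2 bytes and uppering writes 8, while both loops do the same number of comparisons. The difference comes from the data, not from the direction of the conversion.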