Hi, my first question here :-)
Did my best reading the rules and searching if the question was already asked before.

The following code

 String[] strings = {"cAsE", "\u00df"};
 for (String str : strings) {
  System.out.println(str.equalsIgnoreCase(str.toLowerCase()));
  System.out.println(str.equalsIgnoreCase(str.toUpperCase()));
 }

outputs true three times (cAsE = case; cAsE = CASE; ß = ß) but also false once (ß != SS). I tried using toLowerCase(Locale), but it didn't help.

Is this a known issue?

+3  A: 

Until recently, Unicode didn't define an uppercase version of s-sharp. I'm not sure whether the latest Java 7 version already includes this new character and whether it handles it correctly. I suggest giving it a try.

The reason why str.toLowerCase() doesn't return the same as str.toUpperCase().toLowerCase() is that Java replaces ß with SS but there is no way to go back, so SS becomes ss and the compare fails.

So if you need to level the case, you must use str.toLowerCase(). If not, then simply calling equalsIgnoreCase() without any upper/lower conversion should work, too.
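As a minimal sketch of the round trip described above (the class name and the helper equalsLeveled are made up for illustration, and Locale.ROOT is an assumption to keep the result independent of the machine's default locale):

```java
import java.util.Locale;

final class SharpSCaseDemo {
    // Hypothetical helper: "level" both strings to lower case, then compare.
    static boolean equalsLeveled(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String sharpS = "\u00df";                        // ß
        String upper = sharpS.toUpperCase(Locale.ROOT);  // expands to two characters

        System.out.println(upper);                            // SS
        System.out.println(upper.toLowerCase(Locale.ROOT));   // ss, not ß
        System.out.println(sharpS.equalsIgnoreCase(upper));   // false: lengths differ
        System.out.println(sharpS.equalsIgnoreCase(sharpS));  // true
        System.out.println(equalsLeveled("cAsE", "CASE"));    // true
    }
}
```

Note that leveling with toLowerCase() still does not make "ß" equal to "SS"; it only avoids the one-way ß to SS expansion.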

Aaron Digulla
Even if Java 7 supports the new Unicode character, "ß".toUpperCase() must still return "SS", since the upper-case "ß" is only of typographical interest and not really used in the wild: http://en.wikipedia.org/wiki/Capital_ß
Joachim Sauer
In my case I'm trying to match some users' strings with predefined ones (maybe I should have mentioned it in the original question...). So the code I gave here as an example is just a test I performed to understand why my original code didn't work as expected. Obviously the equalsIgnoreCase method exists to save us from changing the case of either string. Anyway, the concept of "leveling" is what makes this my accepted answer :-)
targumon
A: 

Hm. I don't know anything about the German language, but I'm not sure how I feel about Unicode characters being treated as equivalent to some Roman-letter expansion. Should you be able to do the following?

myDictionary.put("glasses", new Bifocals());
myDictionary.get("glaßes");

If you have your druthers, myDictionary.get("glaßes") should return the Bifocals from before. Is that legit?

John Feminella
"ß" and "ss" are not equivalent. "ss" is sometimes used to write "ß" when that letter is not available. Since there is no upper-case "ß" (ok, there is one, but it's mostly a typographical curiosity and not a letter that's used in reality) it will always be written as "SS" in ALL CAPS. The opposite is not true: "SS".toLowerCase() is definitely "ss".
Joachim Sauer
Ah, gotcha. Thanks for the clarification, Joachim.
John Feminella
A: 

Aaron Digulla has it. Also, it isn't meaningful to transform the string in the absence of locale data. In English, the upper case of i is I, but in Turkish it is İ. Neither String.equalsIgnoreCase nor String.compareToIgnoreCase takes locale data into account.
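A small sketch of the locale dependence, assuming a Java 7+ runtime (for Locale.forLanguageTag); the class name and the helper upperIn are hypothetical:

```java
import java.util.Locale;

final class TurkishCaseDemo {
    // Hypothetical helper: upper-case a string under an explicit locale.
    static String upperIn(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        System.out.println(upperIn("i", Locale.ENGLISH)); // I
        System.out.println(upperIn("i", turkish));        // İ (U+0130, dotted capital I)
        System.out.println("I".toLowerCase(turkish));     // ı (U+0131, dotless small i)
    }
}
```

The same input string yields different results purely because of the locale argument, which is why the no-argument toUpperCase() and toLowerCase() can behave differently from machine to machine.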

(As an aside, you might want to look into normalization, or you'll end up wondering why "é".equals("é") can return false. Reason: one is a single precomposed character and the other is a combining sequence.)
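To illustrate the normalization aside, here is a sketch using java.text.Normalizer; the class name and the helper equalsNormalized are hypothetical, and NFC is just one reasonable choice of normalization form:

```java
import java.text.Normalizer;

final class NormalizationDemo {
    // Hypothetical helper: compare two strings after normalizing both to NFC.
    static boolean equalsNormalized(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";  // é as a single code point
        String combining = "e\u0301";   // e followed by a combining acute accent

        System.out.println(precomposed.equals(combining));            // false
        System.out.println(equalsNormalized(precomposed, combining)); // true
    }
}
```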

McDowell
targumon
+1  A: 

"Unicode didn't define an uppercase version of s-sharp": this is the exact point. In the German language, a sharp s (ß) can never be a capital or the initial letter of any word. Therefore it's just nonsense to argue about a capital ß...

Gnark