ansaurus

Question

Java's equalsIgnoreCase fails with ß ("Sharp S" used in German alphabet)

Answer 1

+3 A:

Until recently, Unicode didn't define an uppercase version of s-sharp. I'm not sure whether the latest Java 7 version already includes this new character and whether it handles it correctly. I suggest giving it a try.

The reason why str.toLowerCase() doesn't return the same as str.toUpperCase().toLowerCase() is that Java replaces ß with SS but there is no way to go back, so SS becomes ss and the compare fails.
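A minimal sketch of that one-way conversion, for illustration only. The class name SharpSCaseDemo and the helpers upper and lower are hypothetical, and Locale.ROOT is an assumption made here so the result does not depend on the machine's default locale:

```java
import java.util.Locale;

public class SharpSCaseDemo {
    // "\u00df" is the sharp s; the escape keeps the example independent of file encoding.
    static final String STREET = "stra\u00dfe"; // German word for "street"

    // Hypothetical helpers that fix the locale (an assumption, not part of the original code).
    static String upper(String s) { return s.toUpperCase(Locale.ROOT); }
    static String lower(String s) { return s.toLowerCase(Locale.ROOT); }

    public static void main(String[] args) {
        // Upper-casing expands the sharp s to two letters.
        System.out.println(upper(STREET));                // STRASSE
        // Lower-casing cannot restore the sharp s, so the round trip yields "strasse".
        System.out.println(lower(upper(STREET)));         // strasse
        // The directly lower-cased string still contains the sharp s, so the compare fails.
        System.out.println(lower(STREET).equals(lower(upper(STREET)))); // false
        // equalsIgnoreCase compares char by char, and the lengths already differ (6 vs 7).
        System.out.println(STREET.equalsIgnoreCase("STRASSE"));         // false
    }
}
```

If both strings are pushed through the same upper-then-lower round trip, they end up as "strasse" and compare equal, at the cost of no longer distinguishing ß from ss.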

So if you need to level the case, you must use str.toLowerCase(). If not, then simply calling equalsIgnoreCase() without any upper/lower conversion should work, too.

Aaron Digulla 2009-08-26 11:25:58

Even if Java 7 supports the new Unicode character, "ß".toUpperCase() must still return "SS", since the upper-case "ß" is only of typographical interest and not really used in the wild: http://en.wikipedia.org/wiki/Capital_ß

Joachim Sauer 2009-08-26 11:39:01

In my case I'm trying to match some users' strings with predefined ones (maybe I should have mentioned it in the original question...). So the code I gave here as an example is just a test I performed to understand why my original code didn't work as expected. Obviously the equalsIgnoreCase method exists to save us from changing the case of either string. Anyway, the concept of "leveling" is what makes this my accepted answer :-)

targumon 2009-08-26 12:38:36

Answer 2

A:

Hm. I don't know anything about the German language, but I'm not sure how I feel about Unicode characters being treated as equivalent to some Roman-letter expansion. Should you be able to do the following?

myDictionary.put("glasses", new Bifocals());
myDictionary.get("glaßes");

If you have your druthers, myDictionary.get("glaßes") should return the Bifocals from before. Is that legit?

John Feminella 2009-08-26 11:26:18

"ß" and "ss" are not equivalent. "ss" is sometimes used to write "ß" when that letter is not available. Since there is no upper-case "ß" (ok, there is one, but it's mostly a typographical curiosity and not a letter that's used in reality), it will always be written as "SS" in ALL CAPS. The opposite is not true: "SS".toLowerCase() is definitely "ss".

Joachim Sauer 2009-08-26 11:37:38

Ah, gotcha. Thanks for the clarification, Joachim.

John Feminella 2009-08-26 11:44:11

Answer 3

A:

Aaron Digulla has it. Also, it isn't meaningful to transform the string in the absence of locale data. In English, the upper case of i is I, but in Turkish it is İ. String.compareToIgnoreCase does not take locale data into account.
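A small sketch of the locale dependence, assuming Java 7 or later for Locale.forLanguageTag. The class name TurkishCaseDemo and the helpers upperIn and lowerIn are hypothetical names used only for illustration:

```java
import java.util.Locale;

public class TurkishCaseDemo {
    // Hypothetical helper: upper-case a string under the rules of the given language tag.
    static String upperIn(String s, String languageTag) {
        return s.toUpperCase(Locale.forLanguageTag(languageTag));
    }

    // Hypothetical helper: lower-case a string under the rules of the given language tag.
    static String lowerIn(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // English rules: i becomes the plain capital I.
        System.out.println(upperIn("i", "en").equals("I"));      // true
        // Turkish rules: i becomes the dotted capital I (U+0130).
        System.out.println(upperIn("i", "tr").equals("\u0130")); // true
        // Turkish rules: capital I becomes the dotless small i (U+0131).
        System.out.println(lowerIn("I", "tr").equals("\u0131")); // true
    }
}
```

The same input therefore maps to different characters depending on the locale passed in, which is why a locale-free comparison cannot follow Turkish rules.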

(As an aside, you might want to look into normalization, or you'll end up wondering why "é".equals("é") can return false. Reason: one is a combining sequence.)
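To make the aside concrete, here is a sketch using java.text.Normalizer. The class name NormalizationDemo and the helper equalsNormalized are hypothetical, and choosing NFC over the other normalization forms is an assumption made for this example:

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // One code point: LATIN SMALL LETTER E WITH ACUTE.
    static final String PRECOMPOSED = "\u00e9";
    // Two code points: plain e followed by COMBINING ACUTE ACCENT.
    static final String COMBINING = "e\u0301";

    // Hypothetical helper: compare two strings after normalizing both to NFC.
    static boolean equalsNormalized(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        // Both render as the same glyph, but the char sequences differ.
        System.out.println(PRECOMPOSED.equals(COMBINING));           // false
        // After normalization the combining sequence is composed into one code point.
        System.out.println(equalsNormalized(PRECOMPOSED, COMBINING)); // true
    }
}
```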

McDowell 2009-08-26 11:42:08

targumon 2009-08-26 12:22:52

Answer 4

+1 A:

"Unicode didn't define an uppercase version of s-sharp": this is exactly the point. In the German language, a sharp s (ß) can never be a capital or the initial letter of a word. Therefore it's just nonsense to argue about a capital ß...

Gnark 2009-08-26 12:24:52
