I always assumed that .NET compares strings lexicographically, according to the current culture. But something strange happens when one of the strings ends with '-':

"+".CompareTo("-")
Returns: 1

"+1".CompareTo("-1")
Returns: -1

I get this in all cultures I tried, including the invariant one. Can anyone explain what is going on, and how I can get a consistent character-by-character ordering for the current locale?

+8  A: 

Try changing this to

string.Compare("+", "-", StringComparison.Ordinal); // == -2
string.Compare("+1", "-1", StringComparison.Ordinal); // == -2
Anton Gogolev
Thank you Anton, but Ordinal means the good old ASCII sort, with all uppercase going before lowercase: string.Compare("a", "Z", StringComparison.Ordinal) returns 7. In my locale, the case-sensitive ordering is something like 'a'<'A'<'ä'<'Ä'<...<'z'<'Z', but I see no way to access it directly.
J-mster
jmster: Why not write your own sort? Are you just concerned about numbers?
Noon Silk
I have written my own method, but at the bottom I need some way to compare the characters according to the local rules. And I do not want to code the locales by hand; it should be possible to get them from Windows.
J-mster
+6  A: 

There isn't necessarily a consistent character-by-character ordering for any particular locale.

From the MSDN documentation:

For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.

The only way to ensure consistent character-by-character ordering is by using an ordinal comparison, as demonstrated in Anton's answer.
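To see the difference side by side, here is a minimal sketch that reports only the sign of each comparison. The class and method names (`CompareDemo`, `OrdinalSign`, `CultureSign`, `Demo`) are made up for illustration. The ordinal sign is fixed by the code points (`+` is 43, `-` is 45), but the culture-sensitive sign depends on the runtime and its collation data, so no output is claimed for it here.

```csharp
using System;
using System.Globalization;

public static class CompareDemo
{
    // Sign of an ordinal (code unit by code unit) comparison: -1, 0 or 1.
    public static int OrdinalSign(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));

    // Sign of a linguistic comparison under the given culture.
    public static int CultureSign(string a, string b, CultureInfo culture) =>
        Math.Sign(string.Compare(a, b, culture, CompareOptions.None));

    // Prints both signs for the two pairs from the question.
    public static void Demo()
    {
        string[][] pairs =
        {
            new[] { "+", "-" },
            new[] { "+1", "-1" },
        };

        foreach (string[] p in pairs)
        {
            Console.WriteLine(
                "\"{0}\" vs \"{1}\": ordinal {2}, invariant culture {3}",
                p[0], p[1],
                OrdinalSign(p[0], p[1]),
                CultureSign(p[0], p[1], CultureInfo.InvariantCulture));
        }
    }
}
```

Call `CompareDemo.Demo()` from your own `Main`. The ordinal column should be -1 for both pairs; the culture column is whatever your platform's collation rules decide.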

LukeH
This explains the matter. The only thing I wonder is why on earth they need to introduce such magic treatment for all-purpose characters like ASCII minus?
J-mster
@jmster: My guess is that the `+` and `-` characters don't really mean anything in isolation, so in that situation an ordinal comparison takes place where `+` (code 43) evaluates to "less than" `-` (code 45). However, when the `+` and `-` characters prefix a number then they take on a meaning, and in that situation a "semantic" and/or numeric comparison takes place where `+1` is greater than `-1`.
LukeH
@Luke: Actually it's quite the opposite: .NET uses ASCII ordering to conclude that "+1" is less than "-1", but in the final position minus magically turns into a hyphen, so that "-" is sorted somewhere between "\a" (bell) and " " (blank space).
J-mster
@jmster: Yes, you're right. My guess was wrong. I think the excerpt from the documentation still applies, but I've no idea what particular logic applies to the different comparisons in this case.
LukeH
+3  A: 
        string.Compare("+", "-");
        string.Compare("+", "-", StringComparison.CurrentCulture);
        string.Compare("+", "-", StringComparison.InvariantCulture);
        string.Compare("+", "-", StringComparison.InvariantCultureIgnoreCase);

        // All Pass

The two values are treated as equal because linguistic casing is being taken into consideration.

FIX:

Replace the invariant comparison with an ordinal comparison. This means the decisions are based on simple byte comparisons and ignore casing or equivalence tables that are parameterized by culture.

reference : Use ordinal StringComparison

string.Compare("+", "-", StringComparison.Ordinal); // fail

Asad Butt
A: 

Use CompareOrdinal, e.g.:

String.CompareOrdinal("+1","-1");
-2
String.CompareOrdinal("+","-");
-2
nos
+2  A: 

You'll probably want to use the true minus sign, Unicode codepoint \u2212. The minus sign you use in programming (\u002d) is a "hyphen-minus"; its collation order is context sensitive because it is also frequently used as a hyphen. There's more than you'll want to know about the many different kinds of dashes in this article.
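If you need to keep the hyphen-minus, one thing worth trying is `CompareOptions.StringSort`. The documentation describes it as a sort in which the hyphen and the apostrophe are treated like other nonalphanumeric symbols, instead of getting the special "word sort" handling. The sketch below is only that: the class and method names are made up for illustration, and the actual ordering still depends on the runtime and its collation data (newer .NET versions that use ICU may behave differently from the Windows NLS rules discussed here), so no results are claimed.

```csharp
using System;
using System.Globalization;

public static class HyphenSortDemo
{
    // Default linguistic comparison ("word sort"): the hyphen-minus
    // may get special, context-dependent treatment.
    public static int WordSortSign(string a, string b, CultureInfo culture) =>
        Math.Sign(culture.CompareInfo.Compare(a, b, CompareOptions.None));

    // "String sort": hyphen and apostrophe are handled like any other
    // nonalphanumeric symbol, while the rest stays culture-aware.
    public static int StringSortSign(string a, string b, CultureInfo culture) =>
        Math.Sign(culture.CompareInfo.Compare(a, b, CompareOptions.StringSort));

    // Prints both signs for the two pairs from the question.
    public static void Demo()
    {
        CultureInfo culture = CultureInfo.CurrentCulture;
        foreach (string[] p in new[] { new[] { "+", "-" }, new[] { "+1", "-1" } })
        {
            Console.WriteLine(
                "\"{0}\" vs \"{1}\": word sort {2}, string sort {3}",
                p[0], p[1],
                WordSortSign(p[0], p[1], culture),
                StringSortSign(p[0], p[1], culture));
        }
    }
}
```

Call `HyphenSortDemo.Demo()` and check whether the two pairs now agree on your machine. Letters are still compared by the culture's rules, which is what the asker wanted to keep.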

Hans Passant
Your observation is absolutely correct, the Unicode people seem to know better how each character has to be used. It's the users' fault that they don't remember Unicode numbers and use the characters they see on the keyboard.
J-mster
Typography and programming languages are still a world apart. Maybe that will change some day, but it will take a while. There are plenty of Alt+keypad shortcuts available to enter these kinds of Unicode codepoints, ironically not for the minus sign.
Hans Passant