I've got an application that's using string.Compare(string, string) to sort some values. What I can't figure out is why "1022" compares as less than "10-23", and "10-23" compares as less than "1024".

Is there something specific about the "-" character that causes this result? Will that overload of string.Compare give the same result under different culture settings for the same kind of data (numbers with dashes)?

+3  A: 

Well, ignoring the dashes is fairly innocent. If you want them to be taken into account, use the overload that takes a StringComparison, for example StringComparison.Ordinal.
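A minimal sketch of the ordinal approach, assuming .NET 5 or later (it uses top-level statements); the variable names are illustrative. With StringComparison.Ordinal the strings are compared by UTF-16 code unit, so the hyphen (U+002D, value 45) is no longer ignored and sorts before every digit ('0' is 48). The result also no longer depends on the current culture.

```csharp
using System;
using System.Collections.Generic;

// Values from the question, in no particular order.
var values = new List<string> { "1024", "10-23", "1022" };

// Ordinal: compares UTF-16 code units one by one, with no culture rules.
// At index 2, '2' (50) is greater than '-' (45), so "1022" sorts after "10-23".
int byOrdinal = string.Compare("1022", "10-23", StringComparison.Ordinal);
Console.WriteLine($"Ordinal Compare(\"1022\", \"10-23\") sign: {Math.Sign(byOrdinal)}");

// Sorting the whole list the same way: pass the ordinal comparer.
values.Sort(StringComparer.Ordinal);
Console.WriteLine(string.Join(", ", values)); // 10-23, 1022, 1024
```

StringComparer.Ordinal is the matching IComparer<string>, so the same idea applies to Array.Sort or LINQ's OrderBy(keySelector, comparer).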

According to the docs for string.Compare, it uses word sort rules, which are described as follows:

Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list.

At least this case is transitive. I logged a bug on Microsoft Connect about something very similar involving dashes, where A < B, B < C and C < A. That matters, since a non-transitive comparison essentially breaks the rules of sorting. It was closed as "will not fix". Here it is:

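using System;

// Each line prints the result of a culture-sensitive CompareTo.
string s1 = "-0.67:-0.33:0.33";
string s2 = "0.67:-0.33:0.33";
string s3 = "-0.67:0.33:-0.33";
Console.WriteLine(s1.CompareTo(s2)); // s1 vs s2
Console.WriteLine(s2.CompareTo(s3)); // s2 vs s3
Console.WriteLine(s1.CompareTo(s3)); // s1 vs s3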

(returns 1,1,-1 on my machine)
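Whether a comparison forms a cycle on a given triple can be checked directly. HasCycle below is a hypothetical helper, not a framework API, and it only detects the strict cycle described above (each string sorting after the next, and the last after the first), not every possible transitivity failure. This sketch assumes .NET 5 or later for top-level statements; the culture-sensitive line will vary with runtime and culture, so only the ordinal result is predictable.

```csharp
using System;

// Hypothetical helper (not a framework API): true when cmp orders a, b, c
// in a strict cycle, e.g. a > b, b > c and c > a. A transitive comparison
// can never produce this.
static bool HasCycle(string a, string b, string c, Comparison<string> cmp)
{
    int ab = Math.Sign(cmp(a, b));
    int bc = Math.Sign(cmp(b, c));
    int ca = Math.Sign(cmp(c, a));
    return ab != 0 && ab == bc && bc == ca;
}

// The strings from the bug report.
string s1 = "-0.67:-0.33:0.33";
string s2 = "0.67:-0.33:0.33";
string s3 = "-0.67:0.33:-0.33";

// Culture-sensitive, the same rules string.CompareTo uses.
Comparison<string> culture = (x, y) => string.Compare(x, y, StringComparison.CurrentCulture);
Console.WriteLine($"Culture-sensitive cycle: {HasCycle(s1, s2, s3, culture)}");

// Ordinal compares code units, so it is always transitive.
Comparison<string> ordinal = (x, y) => string.CompareOrdinal(x, y);
bool ordinalCycle = HasCycle(s1, s2, s3, ordinal);
Console.WriteLine($"Ordinal cycle: {ordinalCycle}");
```

With the results reported above (1, 1, -1), s1 > s2, s2 > s3 and s3 > s1, so the culture-sensitive line would print True on that machine.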

Marc Gravell
+5  A: 

From the documentation of string.Compare(String, String):

The comparison is performed using word sort rules.

and further:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.
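The three modes can be tried side by side through CompareInfo.Compare, using the current culture because that is what string.Compare(string, string) uses. This is a sketch assuming .NET 5 or later (top-level statements). The word sort and string sort rows depend on the current culture and on which globalization data the runtime loads (newer .NET versions may use ICU rather than the Windows NLS behaviour these docs describe), so they may not reproduce the question's ordering everywhere. Only the ordinal row is fixed: 1 then -1, because '-' (U+002D) sorts before every digit.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// The current culture's rules are what string.Compare(string, string) uses.
CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;

// Word sort is the default (CompareOptions.None); the other two modes are
// selected with CompareOptions.StringSort and CompareOptions.Ordinal.
var modes = new (string Name, CompareOptions Options)[]
{
    ("word sort", CompareOptions.None),
    ("string sort", CompareOptions.StringSort),
    ("ordinal", CompareOptions.Ordinal),
};

// Sign of each comparison: -1 means the left-hand string sorts first.
var results = new Dictionary<string, (int, int)>();
foreach (var (name, options) in modes)
{
    int first = Math.Sign(ci.Compare("1022", "10-23", options));
    int second = Math.Sign(ci.Compare("10-23", "1024", options));
    results[name] = (first, second);
    Console.WriteLine($"{name,-12} 1022 vs 10-23: {first,2}   10-23 vs 1024: {second,2}");
}
```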

Michael Kaplan has more details in "A&P of Sort Keys, part 9 (aka Not always transitive, but punctual and punctuating)".

Rasmus Faber