I'm trying to handle the following character: ⨝ (http://www.fileformat.info/info/unicode/char/2a1d/index.htm)

If you check whether an empty string starts with this character, it always returns true, which makes no sense! Why is that?

// Visual Studio 2008 hides lines that contain this char literally (a bug in Visual Studio?!?), so I wrote its Unicode code point instead.
char specialChar = (char)10781;
string specialString = specialChar.ToString();

// prints 1
Console.WriteLine(specialString.Length);

// prints 10781
Console.WriteLine((int)specialChar);

// prints false
Console.WriteLine(string.Empty.StartsWith("A"));

// both print true. WTF?!?
Console.WriteLine(string.Empty.StartsWith(specialString));
Console.WriteLine(string.Empty.StartsWith(((char)10781).ToString()));
+3  A: 

Nice Unicode glitch ;-p

I'm not sure why it does this, but amusingly:

Console.WriteLine(string.Empty.StartsWith(specialString)); // true
Console.WriteLine(string.Empty.Contains(specialString)); // false
Console.WriteLine("abc".StartsWith(specialString)); // true
Console.WriteLine("abc".Contains(specialString)); // false

I'm guessing this is treated a bit like the non-joining character that Jon mentioned at DevDays: some string functions see it, and some don't. And when a function doesn't see it, the check becomes "does (some string) start with the empty string?", which is always true.
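
You can see that "invisibility" directly by comparing the character against the empty string both ways. A minimal sketch, assuming the default collation tables give this code point no weight (needs using System.Globalization):

CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
string special = ((char)10781).ToString();

// prints 0 - culturally the character has no weight, so it is "equal" to ""
Console.WriteLine(ci.Compare(special, string.Empty));

// prints a non-zero value - ordinal comparison sees the raw code unit
Console.WriteLine(ci.Compare(special, string.Empty, CompareOptions.Ordinal));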

Marc Gravell
+1 from me. I hadn't seen Jon's talk.
RichardOD
+7  A: 

You can fix this by using an ordinal StringComparison:

From the MSDN docs:

When you specify either StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, the string comparison will be non-linguistic. That is, the features that are specific to the natural language are ignored when making comparison decisions. This means the decisions are based on simple byte comparisons and ignore casing or equivalence tables that are parameterized by culture. As a result, by explicitly setting the parameter to either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable.

    char specialChar = (char)10781;
    string specialString = Convert.ToString(specialChar);

    // prints 1
    Console.WriteLine(specialString.Length);

    // prints 10781
    Console.WriteLine((int)specialChar);

    // prints false
    Console.WriteLine(string.Empty.StartsWith("A"));

    // prints false
    Console.WriteLine(string.Empty.StartsWith(specialString, StringComparison.Ordinal));
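
The same ordinal overloads exist on the other comparison methods (IndexOf, EndsWith, and so on), so they can be fixed the same way. A small sketch of what I'd expect, given the culture-sensitive defaults:

    // culture-sensitive defaults treat the character as invisible
    Console.WriteLine("abc".IndexOf(specialString));   // prints 0
    Console.WriteLine("abc".EndsWith(specialString));  // prints true

    // ordinal overloads see the raw code units
    Console.WriteLine("abc".IndexOf(specialString, StringComparison.Ordinal));  // prints -1
    Console.WriteLine("abc".EndsWith(specialString, StringComparison.Ordinal)); // prints false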
RichardOD
Culture-sensitive-comparison-by-default seems like a big disastrous violation of the principle of least surprise. Is there any rule of thumb to determine which methods require a StringComparison to get ‘normal’ ordinal behaviour and which don't?
bobince
@bobince- have you seen this question- http://stackoverflow.com/questions/72696/which-is-generally-best-to-use-stringcomparison-ordinalignorecase-or-stringcom
RichardOD
+2  A: 

The underlying reason for this is that the default string comparison is locale-aware, meaning it uses tables of locale data for comparisons (including equality).

Many (if not most) Unicode characters have no weight assigned for many locales, and so are effectively invisible to the comparison (or match anything, or nothing).

See the entries on character weights on Michael Kaplan's blog, "Sorting It All Out". The series contains a lot of background information (the APIs discussed are native, but, as I understand it, the mechanisms in .NET are the same).

Quick version: this is a complex area, and getting expected (natural-language) comparisons right is hard; that tends to lead to odd behaviour with code points for glyphs outside your language.
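
If you want to see the missing weight for yourself, the sort key (which is what the comparison is ultimately built on) is one way. A minimal sketch, assuming the default collation data (needs using System.Globalization):

CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
byte[] emptyKey = ci.GetSortKey(string.Empty).KeyData;
byte[] specialKey = ci.GetSortKey(((char)10781).ToString()).KeyData;

// both print the same value: the character contributes no weight,
// so its sort key comes out identical to the empty string's
Console.WriteLine(emptyKey.Length);
Console.WriteLine(specialKey.Length);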

Richard