I'm wondering what the correct way is to compare two characters, ignoring case, in a way that will work for all cultures. Also, is Comparer<char>.Default the best way to compare two characters without ignoring case? Does this work for surrogate pairs?

EDIT: Added sample IComparer<char> implementation

If this helps anyone, this is what I've decided to use:

public class CaseInsensitiveCharComparer : IComparer<char> {
    private readonly System.Globalization.CultureInfo ci;
    public CaseInsensitiveCharComparer(System.Globalization.CultureInfo ci) {
        this.ci = ci;
    }
    public CaseInsensitiveCharComparer()
        : this(System.Globalization.CultureInfo.CurrentCulture) { }
    public int Compare(char x, char y) {
        return Char.ToUpper(x, ci) - Char.ToUpper(y, ci);
    }
}

// Prints 3
Console.WriteLine("This is a test".CountChars('t', new CaseInsensitiveCharComparer()));
+2  A: 

string.Compare("string a","STRING A",true)

It will work for every string.

Sergio
Hi Sergio, I'm after a way to compare two char instances, not string instances. I'm looking for a Comparer<char> implementation that ignores case.
Brett Ryan
This works great in English speaking countries. However, nobody in eastern Europe will ever use an application you write.
Jon Grant
@Jon Grant: I use this in my country (Portugal). Portuguese is a Latin-based language that has lots of "weird" characters like ã é à ç, and it works perfectly for me.
Sergio
+12  A: 

It depends on what you mean by "work for all cultures". Would you want "i" and "I" to be equal even in Turkey?

You could use:

bool equal = char.ToUpperInvariant(x) == char.ToUpperInvariant(y);

... but I'm not sure whether that "works" according to all cultures by your understanding of "works".
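To make the Turkish case concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes the runtime has full culture data for "tr-TR" (not globalization-invariant mode). Under Turkish casing rules, 'i' upper-cases to the dotted capital 'İ' (U+0130) rather than to 'I', so I would expect the first line to print True and the second False, but check it in your own environment.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // Culture-independent check: upper-case both chars with invariant rules.
    public static bool EqualsUpperInvariant(char x, char y)
    {
        return char.ToUpperInvariant(x) == char.ToUpperInvariant(y);
    }

    // Culture-sensitive check: upper-case both chars with the given culture's rules.
    public static bool EqualsUpper(char x, char y, CultureInfo culture)
    {
        return char.ToUpper(x, culture) == char.ToUpper(y, culture);
    }

    public static void Main()
    {
        // Assumption: culture data for Turkish is available on this machine.
        var turkish = new CultureInfo("tr-TR");

        Console.WriteLine(EqualsUpperInvariant('i', 'I'));
        Console.WriteLine(EqualsUpper('i', 'I', turkish));
    }
}
```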

Of course you could convert both characters to strings and then perform whatever comparison you want on the strings. Somewhat less efficient, but it does give you all the range of comparisons available in the framework:

bool equal = x.ToString().Equals(y.ToString(), 
                                 StringComparison.InvariantCultureIgnoreCase);
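If you want to plug the string-based comparison into LINQ, one option is to wrap a StringComparer in an IEqualityComparer<char>. This is only a sketch: IgnoreCaseCharEqualityComparer is a hypothetical name, not a framework type, and it pays the same per-call string allocation cost as the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the framework.
public sealed class IgnoreCaseCharEqualityComparer : IEqualityComparer<char>
{
    private readonly StringComparer comparer;

    public IgnoreCaseCharEqualityComparer(StringComparer comparer)
    {
        this.comparer = comparer ?? throw new ArgumentNullException(nameof(comparer));
    }

    // Delegate to the string comparer so the caller picks the culture rules.
    public bool Equals(char x, char y)
    {
        return comparer.Equals(x.ToString(), y.ToString());
    }

    // Use the same comparer for hashing so equal chars get equal hash codes.
    public int GetHashCode(char c)
    {
        return comparer.GetHashCode(c.ToString());
    }
}

public static class ContainsDemo
{
    public static void Main()
    {
        var ignoreCase = new IgnoreCaseCharEqualityComparer(
            StringComparer.InvariantCultureIgnoreCase);

        // Enumerable.Contains is called explicitly to avoid any overload ambiguity.
        Console.WriteLine(Enumerable.Contains("This is a test", 'T', ignoreCase)); // True
        Console.WriteLine("This is a test".Count(c => ignoreCase.Equals(c, 't'))); // 3
    }
}
```

Passing StringComparer.CurrentCultureIgnoreCase or StringComparer.OrdinalIgnoreCase instead changes the rules without touching the wrapper.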
Jon Skeet
That's the way I had thought of doing it in both of your examples, but I thought the framework might provide a better way that I didn't know about. I was thinking in the context of the LINQ extension method String.Contains(char, IEqualityComparer<char>).
Brett Ryan
There's no framework method for this: string comparison is actually implemented using native methods, not by dropping down to a Comparer<char> implementation.
Julian Birch
+2  A: 

As I understand it, there isn't really a way that will "work for all cultures". Either you want to compare characters for some kind of internal, non-displayed-to-the-user reason (in which case you should use the InvariantCulture), or you want to use the CurrentCulture of the user. Obviously, using the user's current culture will mean that you will get different results in different locales, but they will be consistent with what your users in those locales will expect.

Without knowing more about WHY you are comparing two characters, I can't really advise you on which one you should be using.

Jon Grant
Thanks Jon. It's a general question; I'm not well versed in Unicode and thought I'd pose the question here. Consider the String.Contains(char, IEqualityComparer<char>) extension method that LINQ provides: what would be the correct way to implement a case-insensitive comparer for it?
Brett Ryan
Again, it would really depend on what the data was and why you were comparing it. If you just wanted to sort things into some consistent order, for example, using any of the various Invariant comparisons would be fine. If you're responding to user input, you probably want to use the culture of that user to give them the results they would expect. I'm not sure there is really a "one size fits all" answer.
Jon Grant
Do you think my Comparer implementation provided as an answer would be a correct approach?
Brett Ryan
A: 

You could try:

class Test {
    static int Compare(char t, char p) {
        return string.Compare(t.ToString(), p.ToString(), StringComparison.CurrentCultureIgnoreCase);
    }
}

I doubt this is the "optimal" way to do it, though, and I'm not sure of all the cases you need to be checking...

Hawker
A: 

What I was thinking would be available within the runtime is something like the following:

public class CaseInsensitiveCharComparer : IComparer<char> {
    private readonly System.Globalization.CultureInfo ci;
    public CaseInsensitiveCharComparer(System.Globalization.CultureInfo ci) {
        this.ci = ci;
    }
    public CaseInsensitiveCharComparer()
        : this(System.Globalization.CultureInfo.CurrentCulture) { }
    public int Compare(char x, char y) {
        return Char.ToUpper(x, ci) - Char.ToUpper(y, ci);
    }
}

// Prints 3
Console.WriteLine("This is a test".CountChars('t', new CaseInsensitiveCharComparer()));
Brett Ryan
It's dangerous to assume that the char comparison by subtraction will continue to be correct in future CLR versions, so I would use `return Char.ToUpper(x, ci).CompareTo(Char.ToUpper(y, ci));` instead.
Matt Howells
A: 

I would recommend comparing uppercase and, if they don't match, then comparing lowercase, just in case the locale's uppercasing and lowercasing logic behave slightly differently.
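A rough sketch of that idea, with a made-up helper name. One character where Unicode's simple case mappings are asymmetric is the Kelvin sign (U+212A): it lower-cases to 'k', but 'k' upper-cases to the ordinary 'K'. Whether a given runtime and culture actually report that mapping depends on its casing data, so treat it as an assumption to verify.

```csharp
using System;
using System.Globalization;

public static class CaseFoldingSketch
{
    // Hypothetical helper: true if the chars match after upper-casing or,
    // failing that, after lower-casing, using the given culture's rules.
    public static bool EqualsIgnoreCase(char x, char y, CultureInfo culture)
    {
        return char.ToUpper(x, culture) == char.ToUpper(y, culture)
            || char.ToLower(x, culture) == char.ToLower(y, culture);
    }

    public static void Main()
    {
        var invariant = CultureInfo.InvariantCulture;

        Console.WriteLine(EqualsIgnoreCase('t', 'T', invariant)); // True
        Console.WriteLine(EqualsIgnoreCase('t', 'x', invariant)); // False
    }
}
```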

Loadmaster