Given the following simple example:

    List<string> list = new List<string>() { "One", "Two", "Three", "three", "Four", "Five" };

    CaseInsensitiveComparer ignoreCaseComparer = new CaseInsensitiveComparer();

    var distinctList = list.Distinct(ignoreCaseComparer as IEqualityComparer<string>).ToList();

It appears the CaseInsensitiveComparer is not actually being used to do a case-insensitive comparison.

In other words, distinctList contains the same number of items as list. Instead, I would expect "Three" and "three", for example, to be considered equal.

Am I missing something or is this an issue with the Distinct operator?

+1  A: 

I notice in "LINQ Pocket Reference" (O'Reilly) it says:

1.12.4. Distinct: "Distinct returns the input sequence stripped of duplicates. Only the default equality comparer can be used for equality comparison."

I'm wondering why then Distinct() provides an overload accepting an IEqualityComparer?

Ash
To let you implement an equality comparer?
yapiskan
Right, but then that wouldn't be the default equality comparer (i.e. the one used by the parameterless overload), and according to the above statement it can't be used.
Ash
You would have to be more specific. Perhaps they are saying that in LINQ you can't supply one, but when you use the extension methods (which LINQ also uses), you can.
Andrew Backer
Indeed, context is everything; if this line relates to the VB keyword, then it may be correct. If it means LINQ-to-Objects *generally* the book is wrong. And if it means LINQ-to-anything-else, then all bets are off anyway.
Marc Gravell
+21  A: 

StringComparer does what you need:

List<string> list = new List<string>() {
    "One", "Two", "Three", "three", "Four", "Five" };

var distinctList = list.Distinct(
    StringComparer.CurrentCultureIgnoreCase).ToList();

(or invariant / ordinal / etc depending on the data you are comparing)
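As a minimal sketch of why the code in the question behaves as it does: CaseInsensitiveComparer implements only the non-generic IComparer, so the `as IEqualityComparer<string>` cast yields null, and Distinct falls back to the default comparer when passed null. The `Program` class and the choice of `OrdinalIgnoreCase` are just for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var list = new List<string> { "One", "Two", "Three", "three", "Four", "Five" };

        // CaseInsensitiveComparer does not implement IEqualityComparer<string>,
        // so the 'as' cast evaluates to null rather than failing.
        var cast = new CaseInsensitiveComparer() as IEqualityComparer<string>;
        Console.WriteLine(cast == null);                 // True

        // A null comparer makes Distinct use the default (case-sensitive) comparer.
        Console.WriteLine(list.Distinct(cast).Count());  // 6

        // StringComparer implements IEqualityComparer<string>, so it works directly.
        Console.WriteLine(list.Distinct(StringComparer.OrdinalIgnoreCase).Count()); // 5
    }
}
```

Distinct keeps the first occurrence of each group, so "Three" survives and "three" is dropped.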

Marc Gravell
That's great, thanks.
Ash
damn, just found this on Google searching for case invariant distinct linq, awesome
Shawn Simon
+2  A: 

[See Marc Gravell's answer if you want the most concise approach]

After some investigation and good feedback from Bradley Grainger, I've implemented the following IEqualityComparer. It supports a case-insensitive Distinct() statement (just pass an instance of this to the Distinct operator):

class IgnoreCaseComparer : IEqualityComparer<string>
{
    public CaseInsensitiveComparer myComparer;

    public IgnoreCaseComparer()
    {
        myComparer = CaseInsensitiveComparer.DefaultInvariant;
    }

    public IgnoreCaseComparer(CultureInfo myCulture)
    {
        myComparer = new CaseInsensitiveComparer(myCulture);
    }

    #region IEqualityComparer<string> Members

    public bool Equals(string x, string y)
    {
        if (myComparer.Compare(x, y) == 0)
        {
            return true;
        }
        else
        {
            return false;
        }
    }

    public int GetHashCode(string obj)
    {
        return obj.ToLower().GetHashCode();
    }

    #endregion
}
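Note that in the class above, Equals follows the culture of myComparer while GetHashCode lowercases with the current culture, and an IEqualityComparer needs equal strings to produce equal hash codes. A possible sketch that avoids the mismatch, assuming the built-in StringComparer.Create is acceptable, is below; InvariantCulture is an arbitrary choice here, and `Program` is just an illustrative wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Program
{
    static void Main()
    {
        var list = new List<string> { "One", "Two", "Three", "three", "Four", "Five" };

        // StringComparer.Create(culture, ignoreCase) returns a comparer whose
        // Equals and GetHashCode both follow the same culture and case rule.
        // InvariantCulture is an assumption for this sketch; pass another
        // CultureInfo if the data calls for it.
        IEqualityComparer<string> comparer =
            StringComparer.Create(CultureInfo.InvariantCulture, true);

        foreach (var s in list.Distinct(comparer))
        {
            Console.WriteLine(s);
        }
    }
}
```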
Ash
You simply don't need this. See my reply.
Marc Gravell
Yes, your reply arrived just as I was clicking "Post Your Answer".
Ash
They were certainly within 20 seconds of each other, I recall. Still, implementing something like IEqualityComparer<T> is a useful exercise, if only for understanding how it works...
Marc Gravell
Thanks again, I'll let this answer live then, unless anyone strongly objects.
Ash
This sample fails when initialized for the tr-TR culture if the current culture is en-US, because GetHashCode will report different values for I (U+0049) and ı (U+0131), whereas Equals will consider them equal.
Bradley Grainger
+1  A: 

Here is a far simpler version.

List<string> list = new List<string>() { "One", "Two", "Three", "three", "Four", "Five" };

var z = (from x in list select new { item = x.ToLower()}).Distinct();

z.Dump();