If my collection is ordered by date, will Distinct() take the first object in a run of adjacent duplicates, or is that not guaranteed? I am using an IEqualityComparer that does not consider the date field, but I want to be sure the object with the latest date is always the one taken.

+2  A: 

It is not defined which it takes.

In practice it will probably take the first if you are using LINQ to Objects, but you shouldn't rely on it.

If you access a database it depends on the database version, query plan, etc. Then you really shouldn't be relying on it always returning the first.

If you want this guarantee you could use DistinctBy from morelinq and ask Jon Skeet to guarantee the order for you.

Mark Byers
Only LINQ-to-Objects supports IEqualityComparers.
SLaks
@SLaks: Missed that bit. I'm half tempted to remove that part from my answer, but maybe I'll leave it in, as I believe it offers an explanation as to why Distinct doesn't offer a guarantee?
Mark Byers
+5  A: 

You should use GroupBy:

from s in whatever
group s by new { s.Field1, s.Field2 } into g
select g.OrderByDescending(o => o.Date).First()

EDIT: You can also use your IEqualityComparer with GroupBy:

whatever.GroupBy(
    s => s,      //Key
    (key, g) => g.OrderByDescending(o => o.Date).First(),  //Result (receives the key and its group)
    new MyComparer()
);
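For anyone who wants to try the comparer overload end to end, here is a minimal self-contained sketch. The Item type, its Name and Date members, and NameOnlyComparer are hypothetical stand-ins for the asker's own class and comparer, and Name is assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; Name stands in for whatever the real comparer looks at.
public sealed class Item
{
    public Item(string name, DateTime date) { Name = name; Date = date; }
    public string Name { get; }
    public DateTime Date { get; }
}

// Hypothetical comparer that ignores Date, as described in the question.
public sealed class NameOnlyComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Name == y.Name);

    // Assumes Name is never null.
    public int GetHashCode(Item obj) => obj.Name.GetHashCode();
}

public static class LatestPerGroup
{
    // Groups items the comparer treats as equal, then keeps the newest of each group.
    public static List<Item> Select(IEnumerable<Item> items) =>
        items.GroupBy(
                 s => s,                                               // key: the item itself
                 (key, g) => g.OrderByDescending(o => o.Date).First(), // newest in the group
                 new NameOnlyComparer())
             .ToList();
}
```

Unlike Distinct, this states outright which element of each group is kept, so nothing depends on the order of the input.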
SLaks
Instead of IEqualityComparer?
zsharp
Yes.
SLaks
+1  A: 

In the comments of this answer, Marc Gravell and I discuss the Enumerable.Distinct method. The verdict is that order is preserved, but the documentation does not guarantee that this will always work.

David B
+1  A: 

Enumerable.Distinct doesn't define which value is returned - but I can't see how it would be sensible to return anything other than the first one. Likewise although the order is undefined, it's sensible to return the items in the order in which they appear in the original sequence.

I don't normally like relying on unspecified behaviour, but I think it's extremely unlikely that this will change. It's the natural behaviour from keeping a set of what you've already returned, and yielding a result as soon as you see a new item.
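To make the "set of what you've already returned" idea concrete, here is a rough sketch of such an iterator. It is not the framework's actual source, and the names DistinctSketch and FirstSeenDistinct are made up for illustration; it only shows why the first occurrence wins and why source order is kept when Distinct is written this way.

```csharp
using System.Collections.Generic;

public static class DistinctSketch
{
    // Hypothetical illustration only: yields each item the first time it is seen.
    public static IEnumerable<T> FirstSeenDistinct<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        var seen = new HashSet<T>(comparer);
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false if an equal item is already in the set,
            // so later duplicates are skipped and the first occurrence is the one yielded.
            if (seen.Add(item))
                yield return item;
        }
    }
}
```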

If you want to rely on this unspecified behaviour, you should order the items by date (descending) before using Distinct. Alternatively, you could use a grouping and then order each group appropriately.
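A sketch of the order-then-Distinct approach, with the caveat repeated: it only works because of the unspecified first-wins behaviour. Entry, its Key and Date members, and KeyOnlyComparer are hypothetical names, and Key is assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; Key is what the comparer looks at, Date is ignored by it.
public sealed class Entry
{
    public Entry(string key, DateTime date) { Key = key; Date = date; }
    public string Key { get; }
    public DateTime Date { get; }
}

// Hypothetical comparer that does not consider Date.
public sealed class KeyOnlyComparer : IEqualityComparer<Entry>
{
    public bool Equals(Entry x, Entry y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Key == y.Key);

    // Assumes Key is never null.
    public int GetHashCode(Entry obj) => obj.Key.GetHashCode();
}

public static class LatestFirstDistinct
{
    // Newest entries come first, so a first-wins Distinct keeps the newest per key.
    // This relies on behaviour the documentation does not promise.
    public static List<Entry> Run(IEnumerable<Entry> entries) =>
        entries.OrderByDescending(e => e.Date)
               .Distinct(new KeyOnlyComparer())
               .ToList();
}
```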

Jon Skeet
Aside from relying on unspecified behavior, such code would also be *misleading*. If there is a difference between object instances such that one should be preferred over another, one should probably not use `Distinct`. To me, when you indicate that you want a distinct subset, you are implying that identity semantics are irrelevant and only equality semantics matter.
LBushkin
@LBushkin: Using the version that doesn't specify an equality comparer, I agree. But if you're specifying the equality comparer, you are saying "I want elements that are distinct in this particular characteristic" - and two elements can easily be equal in that way, but distinct in others. In general I think GroupBy is a neater way of expressing this, but it could be less efficient.
Jon Skeet