Use group by; the performance of these methods is reasonably good. The only concern is the memory overhead, which can become significant if you are working with large data sets.
var duplicates = from g in (from x in data group x by x)
                 where g.Count() > 1
                 select g.Key;
Or, if you prefer extension methods:
var duplicates = data.GroupBy(x => x)
                     .Where(g => g.Count() > 1)
                     .Select(g => g.Key);
Groups where Count() == 1 are your distinct items, and groups where Count() > 1 contain one or more duplicates.
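For example, with a small sample list (the data and variable names here are purely illustrative):

// Requires using System.Linq; and System.Collections.Generic;
var data = new List<int> { 1, 2, 2, 3, 3, 3, 4 };

var duplicates = data.GroupBy(x => x)
                     .Where(g => g.Count() > 1)
                     .Select(g => g.Key);   // 2, 3

var distinct = data.GroupBy(x => x)
                   .Where(g => g.Count() == 1)
                   .Select(g => g.Key);     // 1, 4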
Since LINQ uses deferred execution, the grouping would otherwise be recomputed every time you enumerate a query over it; if you don't want to re-evaluate the computation, materialize the grouping first:
var g = (from x in data group x by x).ToList(); // grouping result
// duplicates
var duplicates = from x in g
                 where x.Count() > 1
                 select x.Key;
// distinct
var distinct = from x in g
               where x.Count() == 1
               select x.Key;
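The same thing in extension-method form, as a sketch (the variable names are mine):

// Materialize the grouping once so it is not re-evaluated by each query.
var groups = data.GroupBy(x => x).ToList();

var duplicates = groups.Where(g => g.Count() > 1).Select(g => g.Key);
var distinct   = groups.Where(g => g.Count() == 1).Select(g => g.Key);

Note that duplicates and distinct are themselves still lazy queries, but they only iterate the already-materialized list of groups.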
When the grouping is created, a set of sets is built. Assuming a set with O(1) insertion, the running time of the group-by approach is O(n). The cost incurred per operation is somewhat high, but it should amount to near-linear performance.
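To see where the near-linear behaviour comes from, here is a rough hand-rolled equivalent using HashSet, which makes the O(1) insertions explicit. This is only an illustrative sketch of the set-based idea, not what GroupBy actually does internally:

// Requires using System.Collections.Generic; and System.Linq; (for Except).
// One pass over the data; each HashSet operation is O(1) on average,
// so the scan is O(n) overall, at the cost of keeping the sets in memory.
var seen = new HashSet<int>();
var dupes = new HashSet<int>();
foreach (var x in data)
{
    if (!seen.Add(x))      // Add returns false when x is already present
        dupes.Add(x);
}
var unique = seen.Except(dupes);   // items that appear exactly once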