ansaurus

Question

Answer 1

+1 A:

Sort the data, iterate through it and remember the last item. When the current item is the same as the last, its a duplicate. This can be easily implemented either iteratively or using a lambda expression in O(n*log(n)) time.

inflagranti 2010-06-29 07:29:05

This is what I am trying to do, but learning how to write lambdas and it's not obvious to how to do it.

2010-06-29 07:41:30

Answer 2

+10 A:

When runtime is no problem, you could use

var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();

Good old O(n^n) =)

Edit: Now for a better solution. =) If you define a new extension method like

static class Extensions
{        

    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        HashSet<T> hash = new HashSet<T>();
        foreach (T item in input)
        {
            if (!hash.Contains(item))
            {
                hash.Add(item);
            }
            else
            {
                yield return item;
            }
        }
    }
}

you can use

var duplicates = data.Duplicates().Distinct().ToArray();

Jens 2010-06-29 07:30:31

Good solution with the hash set. I was thinking along those lines, but clearly haven't woken up yet...

Noldorin 2010-06-29 07:37:51

Good solution with the hash set indeed! I knew I could do it that way, but didn't know it was possible to extend the language like that!

2010-06-29 07:43:35

Answer 3

+5 A:

Use the group by stuff, the performance of these methods are reasonably good. Only concern is big memory overhead if you are working with large data sets.

from g in (from x in data group x by x)
where g.Count() > 1 
select g.Key;

--OR if you prefer extension methods

data.GroupBy(x => x)
    .Where(x => x.Count() > 1)
    .Select(x => x.Key)

Where Count() == 1 that's your distinct items and where Count() > 1 that's one or more duplicate items.

Since LINQ is kind of lazy, if you don't want to reevaluate your computation you can do this:

var g = (from x in data group x by x).ToList(); // grouping result
// duplicates
from x in g
where x.Count() > 1 
select x.Key;
// distinct
from x in g
where x.Count() == 1 
select x.Key;

When creating the grouping a set of sets will be created. Assuming that it's a set with O(1) insertion the running time of the group by approach is O(n). The incurred cost for each operation is somewhat high, but it should equate to near linear performance.

John Leidegren 2010-06-29 07:40:39

I'd vote you up but require 15 rep in order to do that(!) Good example :)

2010-06-29 07:44:59

You can change your prefered answer if you want to.

John Leidegren 2010-06-29 07:46:57

ansaurus

tags:

views:

answers:

Lambda expression to find difference

related questions