tags:

views:

29024

answers:

11

Anyone have a quick method for de-duplicating a generic List in C#?

+12  A: 

Sort it, then check two and two next to each others, as the duplicates will clump together.

Something like this:

list.Sort();
Int32 index = 0;
while (index < list.Count - 1)
{
    if (list[index] == list[index + 1])
        list.RemoveAt(index);
    else
        index++;
}
Lasse V. Karlsen
If I am not mistaken, most of the approaches mentioned above are just abstractions of this very routines, right? I would have taken your approach here, Lasse, because its how I mentally picture moving through data. But, now I am interested in performance differences between some of the suggestions.
Ian Patrick Hughes
Implement them and time them, only way to be sure. Even Big-O notation won't help you with actual performance metrics, only a growth effect relationship.
Lasse V. Karlsen
+31  A: 

Perhaps you should consider using a HashSet?

Jason Baker
+2  A: 

In Java (I assume C# is more or less identical):

list = new ArrayList<T>(new HashSet<T>(list))

If you really wanted to mutate the original list:

List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);

To preserve order, simply replace HashSet with LinkedHashSet.

Tom Hawtin - tackline
+2  A: 

If you don't care about the order you can just shove the items into a HashSet, if you do want to maintain the order you can do something like this:

var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
    if (hs.Add(t))
        unique.Add(t);

Or the Linq way:

var hs = new HashSet<T>();
list.All( x =>  hs.Add(x) );

Edit: The HashSet method is O(N) time and O(N) space while sorting and then making unique (as suggested by @lassevk and others) is O(N*lgN) time and O(1) space so it's not so clear to me (as it was at first glance) that the sorting way is inferior (my apologies for the temporary down vote...)

Motti
+16  A: 

How about:-

var noDupes = list.Distinct().ToList();

In .net 3.5?

kronoz
+55  A: 

If you're using .Net 3+, you can use Linq.

List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();
Factor Mystic
That code will fail as .Distinct() returns an IEnumerable<T>. You have to add .ToList() to it.
kronoz
Why do you init withDupes only to overwrite the empty List a line later?
Motti
because I'm dumb :)
Factor Mystic
+8  A: 

As kronoz said in .Net 3.5 you can use Distinct().

In .Net 2 you could mimic it:

public IEnumerable<T> DedupCollection<T> ( IEnumerable<T> input ) {
    HashSet<T> passedValues = new HashSet<T>();

    //relatively simple dupe check alg used as example
    foreach( T item in input)
        if( passedValues.Contains(item) )
            continue;
        else {
            passedValues.Add(item)
            yield return item;
        }
}

This could be used to dedupe any collection and will return the values in the original order.

It's normally much quicker to filter a collection (as both Distinct() and this sample does) than it would be to remove items from it.

Keith
The problem with this approach though is that it's O(N^2)-ish, as opposed to a hashset. But at least it's evident what it is doing.
DrJokepu
@DrJokepu - actually I didn't realise that the `HashSet` constructor deduped, which makes it better for most circumstances. However, this would preserve the sort order, which a `HashSet` doesn't.
Keith
+5  A: 

Simply initialize a HashSet with a List of the same type:

var noDupes = new HashSet<T>(withDupes);
Even Mien
A: 

An extension method might be a decent way to go... something like this:

public static List<T> Deduplicate<T>(this List<T> listToDeduplicate) { return listToDeduplicate.Distinct().ToList(); }

And then call like this, for example:

List myFilteredList = unfilteredList.Deduplicate();

Geoff Taylor
A: 

I tried this approach and I noticed that it works only with anonymous methods.

var res = from p in list let sub = p.Word.IsMatch(search) where sub.Count > 0 select new MyMatch() { Word= p.Word, Phrase = p.Phrase };

res = res.Distinct().ToList();

Using this syntax, duplicates are not removed. They are removed only if I do this, using anonymous methods.

var res = from p in list let sub = p.Word.IsMatch(search) where sub.Count > 0 select new { Word= p.Word, Phrase = p.Phrase };

res = res.Distinct().ToList();

Any idea of what I'm doing wrong? Thanks. Andrea

Andrea Nagar
A: 

thanks boys it helped me........:)

san
This is not a forum and this is not an answer. So please delete it.
Oliver