ansaurus

Question

Answer 1

+12 A:

Sort it, then compare each element with its neighbour, since the duplicates will clump together.

Something like this:

list.Sort();
Int32 index = 0;
while (index < list.Count - 1)
{
    if (list[index] == list[index + 1])
        list.RemoveAt(index);
    else
        index++;
}
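
The `==` comparison above compiles for lists of built-in types such as `int`, but not for an unconstrained generic `T`, and every `RemoveAt` call shifts the rest of the list. Here is a minimal sketch of the same sort-then-compare idea that sidesteps both. The class and method names are hypothetical and not part of the original answer.

```csharp
using System.Collections.Generic;

public static class SortedDedup
{
    // Hypothetical helper: sorts the list in place, then compacts the unique
    // values to the front in a single pass and trims the leftover tail.
    public static void RemoveDuplicates<T>(List<T> list)
    {
        if (list.Count < 2)
            return;

        list.Sort(); // uses Comparer<T>.Default, so T must be comparable
        EqualityComparer<T> eq = EqualityComparer<T>.Default;

        int write = 1; // next slot for a value that has not been kept yet
        for (int read = 1; read < list.Count; read++)
        {
            // Keep the element only if it differs from the last kept one
            if (!eq.Equals(list[read], list[write - 1]))
                list[write++] = list[read];
        }

        // Drop everything after the last kept element
        list.RemoveRange(write, list.Count - write);
    }
}
```

Calling `SortedDedup.RemoveDuplicates(list)` leaves the list sorted, so this only fits when the original order does not matter.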

Lasse V. Karlsen 2008-09-06 19:20:36

If I am not mistaken, most of the approaches mentioned above are just abstractions of this very routine, right? I would have taken your approach here, Lasse, because it's how I mentally picture moving through data. But now I am interested in the performance differences between some of the suggestions.

Ian Patrick Hughes 2009-08-11 20:52:47

Implement them and time them; that's the only way to be sure. Even Big-O notation won't give you actual performance figures, only how the cost grows with the size of the input.

Lasse V. Karlsen 2009-08-12 07:03:30
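
A rough sketch of what "implement them and time them" could look like, using `System.Diagnostics.Stopwatch`. The `DedupTiming` class, its method names, and the fixed random seed are all assumptions made for illustration; a serious benchmark would also need warm-up runs and repeated measurements.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class DedupTiming
{
    // Hypothetical helper: runs the action once and returns elapsed milliseconds.
    public static long TimeMs(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Builds `count` values drawn from [0, distinctValues) with a fixed seed,
    // so every candidate approach is timed against the same input.
    public static List<int> MakeData(int count, int distinctValues)
    {
        Random rng = new Random(12345);
        List<int> data = new List<int>(count);
        for (int i = 0; i < count; i++)
            data.Add(rng.Next(distinctValues));
        return data;
    }
}
```

For example, `DedupTiming.TimeMs(() => { var set = new HashSet<int>(data); })` times the HashSet approach; each other suggestion can be wrapped the same way, working on its own copy of `data`.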

Answer 2

+31 A:

Perhaps you should consider using a HashSet?

Jason Baker 2008-09-06 19:21:55

Answer 3

+2 A:

In Java (I assume C# is more or less identical):

list = new ArrayList<T>(new HashSet<T>(list));

If you really wanted to mutate the original list:

List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);

To preserve order, simply replace HashSet with LinkedHashSet.
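
A rough C# counterpart of the Java snippets above, as a sketch only. C# has no built-in `LinkedHashSet`, so the order-preserving variant filters through `HashSet<T>.Add`, which returns `false` for values it already contains. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class ListDedup
{
    // Replaces the contents of the original list with its distinct values.
    // The resulting order is whatever the set enumerates, so do not rely on it.
    public static void DedupInPlace<T>(List<T> list)
    {
        HashSet<T> noDupes = new HashSet<T>(list);
        list.Clear();
        list.AddRange(noDupes);
    }

    // Same idea, but keeps the first occurrence of each value in its original position order.
    public static void DedupInPlaceKeepOrder<T>(List<T> list)
    {
        HashSet<T> seen = new HashSet<T>();
        int write = 0; // next slot for a value seen for the first time
        for (int read = 0; read < list.Count; read++)
        {
            if (seen.Add(list[read]))
                list[write++] = list[read];
        }
        list.RemoveRange(write, list.Count - write);
    }
}
```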

Tom Hawtin - tackline 2008-09-06 19:29:41

Answer 4

+2 A:

If you don't care about the order, you can just shove the items into a HashSet. If you do want to maintain the order, you can do something like this:

var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
    if (hs.Add(t))
        unique.Add(t);

Or the LINQ way:

var hs = new HashSet<T>();
// Where is needed here rather than All: All stops at the first duplicate
var unique = list.Where(x => hs.Add(x)).ToList();

Edit: The HashSet method is O(N) time and O(N) space, while sorting and then removing adjacent duplicates (as suggested by @lassevk and others) is O(N*lgN) time and O(1) extra space. So it's not as clear to me as it was at first glance that the sorting way is inferior (my apologies for the temporary down vote...).

Motti 2008-09-06 19:32:48

Answer 5

+16 A:

How about:-

var noDupes = list.Distinct().ToList();

In .net 3.5?

kronoz 2008-09-06 19:56:06

Answer 6

+55 A:

If you're using .NET 3.5 or later, you can use LINQ.

List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();

Factor Mystic 2008-09-06 19:56:56

That code will fail as .Distinct() returns an IEnumerable<T>. You have to add .ToList() to it.

kronoz 2008-09-06 20:21:55

Why do you init withDupes only to overwrite the empty List a line later?

Motti 2008-09-07 20:21:22

because I'm dumb :)

Factor Mystic 2008-09-11 19:47:28

Answer 7

+8 A:

As kronoz said in .Net 3.5 you can use Distinct().

In .Net 2 you could mimic it:

public IEnumerable<T> DedupCollection<T> ( IEnumerable<T> input ) {
    // Note: HashSet<T> was added in .NET 3.5; on .NET 2.0 a
    // Dictionary<T, bool> could stand in for it.
    HashSet<T> passedValues = new HashSet<T>();

    //relatively simple dupe check alg used as example
    foreach( T item in input) {
        // Add returns false when the item has already been seen
        if( passedValues.Add(item) )
            yield return item;
    }
}

This could be used to dedupe any collection and will return the values in the original order.

It's normally much quicker to filter a collection (as both Distinct() and this sample do) than it would be to remove items from it.

Keith 2008-09-07 09:44:26

The problem with this approach though is that it's O(N^2)-ish, as opposed to a hashset. But at least it's evident what it is doing.

DrJokepu 2009-01-29 18:25:06

@DrJokepu - actually I didn't realise that the `HashSet` constructor deduped, which makes it better for most circumstances. However, this would preserve the sort order, which a `HashSet` doesn't.

Keith 2010-08-24 14:59:34

Answer 8

+5 A:

Simply initialize a HashSet with a List of the same type:

var noDupes = new HashSet<T>(withDupes);

Even Mien 2009-11-24 20:05:03

Answer 9

A:

An extension method might be a decent way to go... something like this:

public static List<T> Deduplicate<T>(this List<T> listToDeduplicate)
{
    return listToDeduplicate.Distinct().ToList();
}

And then call like this, for example:

var myFilteredList = unfilteredList.Deduplicate();

Geoff Taylor 2010-04-03 13:05:02

Answer 10

A:

I tried this approach and I noticed that it works only with anonymous types.

var res = from p in list let sub = p.Word.IsMatch(search) where sub.Count > 0 select new MyMatch() { Word= p.Word, Phrase = p.Phrase };

res = res.Distinct().ToList();

Using this syntax, duplicates are not removed. They are removed only if I do this, using an anonymous type.

var res = from p in list let sub = p.Word.IsMatch(search) where sub.Count > 0 select new { Word= p.Word, Phrase = p.Phrase };

res = res.Distinct().ToList();

Any idea of what I'm doing wrong? Thanks. Andrea
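
One likely explanation, offered as a sketch: `Distinct()` compares elements with the default equality comparer. Anonymous types get compiler-generated `Equals` and `GetHashCode` based on their property values, whereas a class like `MyMatch` falls back to reference equality unless it overrides them. The version of `MyMatch` below is hypothetical; the original class is not shown, and the `string` properties are an assumption.

```csharp
using System;

// Hypothetical stand-in for the MyMatch class from the question.
public class MyMatch : IEquatable<MyMatch>
{
    public string Word { get; set; }
    public string Phrase { get; set; }

    // Two matches are equal when both properties are equal.
    public bool Equals(MyMatch other)
    {
        return other != null && Word == other.Word && Phrase == other.Phrase;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as MyMatch);
    }

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode()
    {
        int wordHash = Word == null ? 0 : Word.GetHashCode();
        int phraseHash = Phrase == null ? 0 : Phrase.GetHashCode();
        unchecked { return (wordHash * 397) ^ phraseHash; }
    }
}
```

With overrides like these in place, `Distinct()` treats two `MyMatch` instances with the same `Word` and `Phrase` as duplicates. When the class cannot be changed, passing a custom `IEqualityComparer<MyMatch>` to `Distinct` is an alternative.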

Andrea Nagar 2010-06-09 15:00:38

Answer 11

A:

thanks boys it helped me........:)

san 2010-08-24 04:48:27

This is not a forum and this is not an answer. So please delete it.

Oliver 2010-08-24 06:43:18

Remove duplicates from a List<T> in C#
