Anyone have a quick method for de-duplicating a generic List in C#?
Sort it, then check two and two next to each others, as the duplicates will clump together.
Something like this:
list.Sort();
Int32 index = 0;
while (index < list.Count - 1)
{
if (list[index] == list[index + 1])
list.RemoveAt(index);
else
index++;
}
In Java (I assume C# is more or less identical):
list = new ArrayList<T>(new HashSet<T>(list))
If you really wanted to mutate the original list:
List<T> noDupes = new ArrayList<T>(new HashSet<T>(list));
list.clear();
list.addAll(noDupes);
To preserve order, simply replace HashSet with LinkedHashSet.
If you don't care about the order you can just shove the items into a HashSet
, if you do want to maintain the order you can do something like this:
var unique = new List<T>();
var hs = new HashSet<T>();
foreach (T t in list)
if (hs.Add(t))
unique.Add(t);
Or the Linq way:
var hs = new HashSet<T>();
list.All( x => hs.Add(x) );
Edit: The HashSet
method is O(N)
time and O(N)
space while sorting and then making unique (as suggested by @lassevk and others) is O(N*lgN)
time and O(1)
space so it's not so clear to me (as it was at first glance) that the sorting way is inferior (my apologies for the temporary down vote...)
If you're using .Net 3+, you can use Linq.
List<T> withDupes = LoadSomeData();
List<T> noDupes = withDupes.Distinct().ToList();
As kronoz said in .Net 3.5 you can use Distinct()
.
In .Net 2 you could mimic it:
public IEnumerable<T> DedupCollection<T> ( IEnumerable<T> input ) {
HashSet<T> passedValues = new HashSet<T>();
//relatively simple dupe check alg used as example
foreach( T item in input)
if( passedValues.Contains(item) )
continue;
else {
passedValues.Add(item)
yield return item;
}
}
This could be used to dedupe any collection and will return the values in the original order.
It's normally much quicker to filter a collection (as both Distinct()
and this sample does) than it would be to remove items from it.
Simply initialize a HashSet with a List of the same type:
var noDupes = new HashSet<T>(withDupes);
An extension method might be a decent way to go... something like this:
public static List<T> Deduplicate<T>(this List<T> listToDeduplicate) { return listToDeduplicate.Distinct().ToList(); }
And then call like this, for example:
List myFilteredList = unfilteredList.Deduplicate();
I tried this approach and I noticed that it works only with anonymous methods.
var res = from p in list let sub = p.Word.IsMatch(search) where sub.Count > 0 select new MyMatch() { Word= p.Word, Phrase = p.Phrase };
res = res.Distinct().ToList();
Using this syntax, duplicates are not removed. They are removed only if I do this, using anonymous methods.
var res = from p in list let sub = p.Word.IsMatch(search) where sub.Count > 0 select new { Word= p.Word, Phrase = p.Phrase };
res = res.Distinct().ToList();
Any idea of what I'm doing wrong? Thanks. Andrea