I've just started learning LINQ and lambda expressions, and they seem to be a good fit for finding duplicates in a complex object collection, but I'm getting a little confused and hope someone can help put me back on the path to happy coding.

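My object is structured like list.list.uniqueCustomerIdentifier, i.e. an outer list whose items each hold an inner list of objects that carry a uniqueCustomerIdentifier.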

I need to ensure there are no duplicate uniqueCustomerIdentifier values within the entire complex object. If there are duplicates, I need to identify which ones are duplicated and return a list of them.

+2  A: 

There is a LINQ operator, Distinct(), that lets you filter down to a distinct set of records if you only want the ids. If your class overrides Equals (and GetHashCode), or you have an IEqualityComparer for it, you can call the Distinct extension method directly to return the unique results from the list. As an added bonus, you can also use the Union and Intersect methods to merge or filter between two lists.
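A minimal sketch of the comparer-based Distinct call, for illustration only. The Customer class, its Id property, and the CustomerIdComparer and DistinctSketch names are assumptions made up for this example and do not come from the question. Note that Distinct drops the repeats rather than reporting them, so on its own it does not tell you which ids were duplicated.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration; not taken from the question.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Treats two customers as equal when their Id values match.
public class CustomerIdComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer a, Customer b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.Id == b.Id;
    }

    // Must agree with Equals: equal customers produce equal hash codes.
    public int GetHashCode(Customer c) => c.Id.GetHashCode();
}

public static class DistinctSketch
{
    // Returns one customer per Id; repeated Ids are dropped.
    public static List<Customer> UniqueById(IEnumerable<Customer> customers) =>
        customers.Distinct(new CustomerIdComparer()).ToList();
}
```

Calling DistinctSketch.UniqueById on customers with ids 1, 2, 1 would give back two customers, one per id.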

Another option would be to group by the id and then select the first element of each group, which leaves one item per id.

var results = from item in list
              group item by item.id into g
              select g.First();
smaclell
A: 

If you want to flatten the two-level list hierarchy, use the SelectMany method, which flattens an IEnumerable<IEnumerable<T>> into an IEnumerable<T>.
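A short sketch of that flattening step, under assumed names: the question does not show its classes, so Customer, CustomerGroup, and FlattenSketch are hypothetical, and InnerList is borrowed from another answer in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; the question does not name its classes.
public class Customer
{
    public string uniqueCustomerIdentifier { get; set; } = "";
}

public class CustomerGroup
{
    public List<Customer> InnerList { get; set; } = new List<Customer>();
}

public static class FlattenSketch
{
    // Turns the outer list of inner lists into a single sequence of customers.
    public static IEnumerable<Customer> Flatten(IEnumerable<CustomerGroup> outer) =>
        outer.SelectMany(item => item.InnerList);
}
```

Once the hierarchy is flat, any ordinary LINQ operator (GroupBy, Distinct, Where) can work on the customers as if they had been in one list all along.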

Omer van Kloeten
+5  A: 
  • Unpack the hierarchy into one flat sequence
  • Project each element to its uniqueCustomerIdentifier property
  • Group these IDs
  • Keep only the groups that have more than one element
  • Project each remaining group to its key (back to the ID)
  • Enumerate the query and store the result in a list


var result = 
  myList
    .SelectMany(x => x.InnerList)
    .Select(y => y.uniqueCustomerIdentifier)
    .GroupBy(id => id)
    .Where(g => g.Skip(1).Any())
    .Select(g => g.Key)
    .ToList();
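For anyone who wants to try the query, here is a self-contained sketch that wraps it in a method. The class names Customer, CustomerGroup, and DuplicateFinder are assumptions invented for this example, and the identifier is assumed to be a string; only InnerList and uniqueCustomerIdentifier come from the query above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; the question does not name its classes.
public class Customer
{
    public string uniqueCustomerIdentifier { get; set; } = "";
}

public class CustomerGroup
{
    public List<Customer> InnerList { get; set; } = new List<Customer>();
}

public static class DuplicateFinder
{
    // Returns each identifier that occurs more than once anywhere in the hierarchy.
    public static List<string> FindDuplicateIds(IEnumerable<CustomerGroup> myList) =>
        myList
            .SelectMany(x => x.InnerList)             // flatten outer and inner lists
            .Select(y => y.uniqueCustomerIdentifier)  // keep only the identifier
            .GroupBy(id => id)                        // one group per distinct identifier
            .Where(g => g.Skip(1).Any())              // keep groups with at least two elements
            .Select(g => g.Key)                       // back to the identifier itself
            .ToList();
}
```

With inner lists holding A, B and B, C, A, the method would report A and B. The test g.Skip(1).Any() is true exactly when a group has a second element; g.Count() > 1 expresses the same condition.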
David B
You could skip .Select(y => y.uniqueCustomerIdentifier) and use .GroupBy(y => y.uniqueCustomerIdentifier) instead.
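A sketch of that variant, with one possible extension: because the grouping is done on the objects rather than on bare identifiers, each group still holds the full records, so the duplicated items can be returned alongside their id. The Customer, CustomerGroup, and GroupByKeySketch names, the Name property, and the string identifier type are assumptions for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration; the question does not name its classes.
public class Customer
{
    public string uniqueCustomerIdentifier { get; set; } = "";
    public string Name { get; set; } = "";
}

public class CustomerGroup
{
    public List<Customer> InnerList { get; set; } = new List<Customer>();
}

public static class GroupByKeySketch
{
    // Maps each duplicated identifier to every customer that shares it.
    public static Dictionary<string, List<Customer>> DuplicatesById(
        IEnumerable<CustomerGroup> myList) =>
        myList
            .SelectMany(x => x.InnerList)
            .GroupBy(y => y.uniqueCustomerIdentifier) // group the objects by key directly
            .Where(g => g.Skip(1).Any())              // keep groups with at least two elements
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

If only the identifiers are needed, the Keys of the returned dictionary give the same set of ids as the original query.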
Lucas