Using C# 3 and .NET Framework 3.5, I have a Person class:

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

and I've got a List of them:

List<Person> persons = GetPersons();

How can I find all the Person objects in persons whose SSN is not unique in the list, remove them from persons, and ideally add them to another list, "List<Person> dupes"?

The original list might look something like this:

persons = new List<Person>();
persons.Add(new Person { Id = 1, 
                         FirstName = "Chris", 
                         LastName="Columbus", 
                         SSN=111223333 }); // Is a dupe
persons.Add(new Person { Id = 1, 
                         FirstName = "E.E.", 
                         LastName="Cummings", 
                         SSN=987654321 });
persons.Add(new Person { Id = 1, 
                         FirstName = "John", 
                         LastName="Steinbeck", 
                         SSN=111223333 }); // Is a dupe
persons.Add(new Person { Id = 1, 
                         FirstName = "Yogi", 
                         LastName="Berra", 
                         SSN=123456789 });

And the end result would have Cummings and Berra in the original persons list and would have Columbus and Steinbeck in a list called dupes.

Many thanks!

A: 

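Well, if Person implements IComparable<Person> like so:

public int CompareTo(Person other)
{
    // Two persons compare as equal when their SSNs match
    return this.SSN.CompareTo(other.SSN);
}

then a pairwise comparison like the following will find the duplicates:

for (int i = 0; i < persons.Count; i++)
{
    for (int j = i + 1; j < persons.Count; j++)
    {
        // CompareTo returns 0 when the SSNs match; == would only compare references
        if (persons[i].CompareTo(persons[j]) == 0)
        {
            // persons[i] and persons[j] are duplicates
        }
    }
}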
SnOrfus
+6  A: 

This gets you the duplicated SSNs:

var duplicatedSSN =
    from p in persons
    group p by p.SSN into g
    where g.Count() > 1
    select g.Key;

The duplicated list would be like:

List<Person> duplicated = persons.FindAll(p => duplicatedSSN.Contains(p.SSN));

And then just iterate over the duplicates and remove them.

duplicated.ForEach(dup => persons.Remove(dup));
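
Note that duplicatedSSN is a deferred query, so each Contains call inside FindAll re-runs the grouping. A minimal sketch of one way around that, assuming the Person class from the question: copy the duplicated SSNs into a HashSet<int> first, then split the list with FindAll and RemoveAll. The DuplicateSplitter class and ExtractDuplicates method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class DuplicateSplitter
{
    // Hypothetical helper: removes every Person whose SSN occurs more than
    // once from 'persons' and returns the removed items in their original order.
    public static List<Person> ExtractDuplicates(List<Person> persons)
    {
        // Run the grouping query once and keep the duplicated SSNs in a set.
        var duplicatedSsns = new HashSet<int>(
            persons.GroupBy(p => p.SSN)
                   .Where(g => g.Count() > 1)
                   .Select(g => g.Key));

        // Copy the duplicates out first, then drop them from the source list.
        List<Person> dupes = persons.FindAll(p => duplicatedSsns.Contains(p.SSN));
        persons.RemoveAll(p => duplicatedSsns.Contains(p.SSN));
        return dupes;
    }
}
```

With the sample data from the question, ExtractDuplicates(persons) should return Columbus and Steinbeck and leave Cummings and Berra in persons.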
gcores
Your solution was close. The line `duplicated = persons.FindAll(duplicatedSSN.Contains(p => p.SSN);` did not work. See my answer to see what I corrected to get to the answer.
Chris Conway
A: 

Traverse the list and keep a table of SSN/count pairs (a Dictionary in the code below). Then enumerate the table and remove the persons whose SSN has a count > 1.

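// SSN is an int on Person, so the key type is int
Dictionary<int, int> ssnTable = new Dictionary<int, int>();

foreach (Person person in persons)
{
    int count;
    // TryGetValue leaves count at 0 when the SSN has not been seen yet,
    // which avoids using exceptions for control flow
    ssnTable.TryGetValue(person.SSN, out count);
    ssnTable[person.SSN] = count + 1;
}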

// traverse ssnTable here and remove items where value of entry (item count) > 1
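
The final traversal left as a comment above could look something like the sketch below. It assumes SSN is an int, as in the question, and walks the list backwards so that RemoveAt does not shift items that have not been visited yet. The SsnCounter class and SplitBySsnCount method are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class SsnCounter
{
    // Hypothetical helper: counts SSNs, then moves every Person whose SSN
    // count is greater than 1 out of 'persons' and into the returned list.
    public static List<Person> SplitBySsnCount(List<Person> persons)
    {
        var ssnTable = new Dictionary<int, int>();
        foreach (Person person in persons)
        {
            int count;
            ssnTable.TryGetValue(person.SSN, out count); // count stays 0 for a new SSN
            ssnTable[person.SSN] = count + 1;
        }

        var dupes = new List<Person>();
        // Iterate from the end so removals do not disturb earlier indexes.
        for (int i = persons.Count - 1; i >= 0; i--)
        {
            if (ssnTable[persons[i].SSN] > 1)
            {
                dupes.Add(persons[i]);
                persons.RemoveAt(i);
            }
        }

        dupes.Reverse(); // restore the original relative order
        return dupes;
    }
}
```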
mjmarsh
A: 
List<Person> actualPersons = persons.Distinct().ToList();
List<Person> duplicatePersons = persons.Except(actualPersons).ToList();
Graeme Bradbury
This did not work: without a custom comparer, Distinct does not compare on SSN, so it finds no duplicates here. I just want to compare SSN and look for dupes on that one field.
Chris Conway
A: 

Thanks to gcores for getting me started down a correct path. Here's what I ended up doing:

var duplicatedSSN =
    from p in persons
    group p by p.SSN into g
    where g.Count() > 1
    select g.Key;

var duplicates = new List<Person>();

foreach (var dupeSSN in duplicatedSSN)
{
    foreach (var person in persons.FindAll(p => p.SSN == dupeSSN))
        duplicates.Add(person);
}

duplicates.ForEach(dup => persons.Remove(dup));
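
If mutating persons in place is not a requirement, the same split can be sketched as a single grouping pass that builds two new lists. This is only an illustration under that assumption. SsnPartition and Split are hypothetical names, and the out parameters are used because C# 3 has no tuples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int SSN { get; set; }
}

public static class SsnPartition
{
    // Hypothetical helper: splits 'source' into persons with a unique SSN
    // and persons whose SSN is shared with at least one other entry.
    public static void Split(IEnumerable<Person> source,
                             out List<Person> uniques,
                             out List<Person> dupes)
    {
        // Group once and cache the groups so the source is enumerated a single time.
        var groups = source.GroupBy(p => p.SSN).ToList();

        uniques = groups.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
        dupes = groups.Where(g => g.Count() > 1).SelectMany(g => g).ToList();
    }
}
```

A caller would declare List<Person> uniques, dupes; call SsnPartition.Split(persons, out uniques, out dupes); and then assign persons = uniques. Note that dupes comes back clustered by SSN rather than in strict original order.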
Chris Conway
Sorry, the line was wrong. It should have said duplicated = persons.FindAll(p => duplicatedSSN.Contains(p.SSN));. I've edited the answer.
gcores