Given a collection of records like this:

string ID1;
string ID2;
string Data1;
string Data2;
// ...
string DataN;

Initially Data1..N are null and can mostly be ignored for this question. ID1 and ID2 each uniquely identify the record. All records will have an ID2; some will also have an ID1. Given an ID2, there is a (time-consuming) method to get its corresponding ID1. Given an ID1, there is a (time-consuming) method to get Data1..N for the record. Our ultimate goal is to fill in Data1..N for all records as quickly as possible.

Our immediate goal is to (as quickly as possible) eliminate all duplicates in the list, keeping the one with more information.

For example, if Rec1 = {ID1="ABC", ID2="XYZ"} and Rec2 = {ID1=null, ID2="XYZ"}, then these are duplicates, but we must specifically remove Rec2 and keep Rec1.

That last requirement rules out the standard ways of removing duplicates (e.g. HashSet), since they treat the two sides of a "duplicate" pair as interchangeable.

+4  A: 

How about you split your original list into three: ones with all data, ones with ID1, and ones with just ID2.

Then do:

var unique = allData.Concat(id1Data.Except(allData))
                    .Concat(id2Data.Except(id1Data).Except(allData));

having defined equality just on the basis of ID2.

I suspect there are more efficient ways of expressing that, but the fundamental idea is sound as far as I can tell. Splitting the initial list into three is simply a matter of using GroupBy (and then calling ToList on each group to avoid repeated queries).
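For concreteness, a minimal sketch of that split (Record and a HasData flag are assumptions standing in for the question's type; ToLookup is used here rather than GroupBy purely because absent groups come back empty):

// Hypothetical grouping key: 0 = all data, 1 = has ID1, 2 = ID2 only.
var lookup = records.ToLookup(r => r.HasData ? 0 : (r.ID1 != null ? 1 : 2));
var allData = lookup[0].ToList();
var id1Data = lookup[1].ToList();
var id2Data = lookup[2].ToList();

Note that Except in the query above also needs the ID2-based equality, either from Equals/GetHashCode on the type or from the overload taking an IEqualityComparer<Record>.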

EDIT: Potentially nicer idea: split the data up as before, then do:

var result = new HashSet<...>(allData);
result.UnionWith(id1Data);
result.UnionWith(id2Data);

I believe that UnionWith keeps the existing elements rather than overwriting them with new but equal ones. On the other hand, that's not explicitly specified. It would be nice for it to be well-defined...

(Again, either make your type implement equality based on ID2, or create the hash set using an equality comparer which does so.)
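For reference, a minimal sketch of such a comparer and its use with the sets above (the name Id2Comparer is illustrative, and Record again stands in for the question's type):

// Equality purely by ID2, as described above.
class Id2Comparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y) { return x.ID2 == y.ID2; }
    public int GetHashCode(Record obj) { return obj.ID2.GetHashCode(); }
}

var result = new HashSet<Record>(allData, new Id2Comparer());
result.UnionWith(id1Data); // if existing elements win, fuller records are kept
result.UnionWith(id2Data);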

Jon Skeet
A: 

I had a similar issue a couple of months ago.

Try something like this...

public static List<T> RemoveDuplicateSections<T>(List<T> sections) where T : INamedObject
{
    // Tracks Names already seen; the dictionary value is unused.
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<T> finalList = new List<T>();
    foreach (T currValue in sections)
    {
        // Keep only the first record seen for each Name.
        if (!uniqueStore.ContainsKey(currValue.Name))
        {
            uniqueStore.Add(currValue.Name, 0);
            finalList.Add(currValue);
        }
    }
    return finalList;
}
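Note that this keeps the first occurrence of each Name, so for the question's requirement the list would need to be ordered with ID1-bearing records ahead of their ID1-less duplicates. INamedObject isn't shown in the answer; something like the following is assumed, with Name mapping to the question's ID2:

public interface INamedObject { string Name { get; } }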
xxmrlnxx
A: 

This may smell quite a bit, but I think LINQ's Distinct will still work for you if you ensure that the two compared objects come out the same. The following comparer would do this:

private class Comp : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        var equalityOfB = x.ID2 == y.ID2;
        if (x.ID1 == y.ID1 && equalityOfB)
            return true;
        // Side effect: if one side lacks ID1, copy it across so the
        // record that survives Distinct ends up with the fuller data.
        if (x.ID1 == null && equalityOfB)
        {
            x.ID1 = y.ID1;
            return true;
        }
        if (y.ID1 == null && equalityOfB)
        {
            y.ID1 = x.ID1;
            return true;
        }
        return false;
    }

    public int GetHashCode(Item obj)
    {
        return obj.ID2.GetHashCode();
    }
}

Then you could use it on a list like this:

var l = new[] { 
  new Item { ID1 = "a", ID2 = "b" }, 
  new Item { ID1 = null, ID2 = "b" } };
var l2 = l.Distinct(new Comp()).ToArray();
flq
A: 
records.GroupBy(r => r, new RecordByIDsEqualityComparer())
       .Select(g => g.OrderByDescending(r => r, new RecordByFullnessComparer()).First())

Or, if you want to merge the records, use Aggregate instead of OrderByDescending/First.
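The two comparers aren't shown; here is a minimal sketch of what they might look like (the bodies are assumptions about the intent: equality by ID2, and "fullness" meaning ID1 is present):

class RecordByIDsEqualityComparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y) { return x.ID2 == y.ID2; }
    public int GetHashCode(Record obj) { return obj.ID2.GetHashCode(); }
}

class RecordByFullnessComparer : IComparer<Record>
{
    // A record with ID1 present compares greater, so
    // OrderByDescending(...).First() picks the fuller record.
    public int Compare(Record x, Record y)
    {
        return (x.ID1 != null ? 1 : 0).CompareTo(y.ID1 != null ? 1 : 0);
    }
}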

Andrey Shchekin
A: 

You haven't mentioned whether you're using C# 2.0 or 3.0...

I'll give you the C# 2.0 version, and you can shorten it with LINQ if you like. LINQ would look much nicer, since you could replace all the "delegate"s with lambdas (a sketch follows the code).

(I did this to help me learn too - nice data problem)

If your collection is a generic List, for example:

using System;
using System.Collections.Generic;

class Program {

    public class Record {
        public string ID1;
        public string ID2;
        public string Data1;
        public string Data2;
        public string DataN;
        public override string ToString() {
            return string.Format("ID1 = {0}, ID2 = {1}", ID1, ID2);
        }
    }

    static void Main(string[] args) {

        // Test data (collection/object initializers are C# 3.0 syntax;
        // on 2.0, call Add for each record)
        List<Record> collection = new List<Record>() { 
                new Record() { ID1 = null, ID2 = "x.2" },
                new Record() { ID1 = "x.1", ID2 = "x.2" },
                new Record() { ID1 = "z.1", ID2 = "z.2" },
                new Record() { ID1 = null, ID2 = "s.2" },
                new Record() { ID1 = null, ID2 = "a.2" },
                new Record() { ID1 = "w.1", ID2 = "w.2" },
                new Record() { ID1 = null, ID2 = "w.2" }
        };

        // List of records with no ID1 (low priority data)
        List<Record> emptyID1 = collection.FindAll(delegate(Record r) {
            return r.ID1 == null;
        });

        // Temporarily remove all low-priority data
        collection.RemoveAll(delegate(Record r) {
            return r.ID1 == null;
        });

        // Check whether we lost any ID2 records when
        // removing low priority data and put ID2's back
        emptyID1.ForEach(delegate(Record emptyIdRecord) {
            if (collection.Find(delegate(Record r) {
                return r.ID2 == emptyIdRecord.ID2;
            }) == null) {
                collection.Add(emptyIdRecord);
            }
        });

        // Display results
        collection.ForEach(delegate(Record r) {
            Console.WriteLine(r.ToString());
        });

        Console.ReadLine();
    }
}
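For comparison, a C# 3.0 sketch of the same filtering with lambdas (intended to match the delegate version's behaviour):

// Pull out the ID1-less records, then add one back only if no
// surviving record shares its ID2.
var emptyID1 = collection.FindAll(r => r.ID1 == null);
collection.RemoveAll(r => r.ID1 == null);
emptyID1.ForEach(e => {
    if (collection.Find(r => r.ID2 == e.ID2) == null)
        collection.Add(e);
});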
Neil Fenwick