views:

142

answers:

2

I am trying to get the subset of items in dataA that are in dataB, and have different values of property c. The properties a and b can be used as an index, so I have tried to filter out only the useful pairs then check to see if they have a different c value.

This is the linq expression I came up with, and it does work, but It seems like there has to be a better/faster way of finding this subset.

var itemsInBoth = from item in dataA
                  from item2 in dataB
                  where item.a == item2.a && item.b == item2.b
                      select new
                      {
                          first= item,
                          second = item2
                      };
var haveDifferentC = from item in itemsInBoth 
                     where item.first.c != item.second.c
                     select item.first;
+2  A: 

Faster? What you have there is O(n^2). Each item in the first list will fully iterate the items in the second list. You need to remove the redundant iteration in that join. One way to do that is to use another structure to do O(1) lookups for matchs.

Here's some untested (unchecked) code:

var dictionaryA = dataA
  .GroupBy(item => new {a = item.a, b = item.b})
  .ToDictionary(g => g.Key, g => g.ToList());

var dictionaryB = dataB
  .GroupBy(item => new {a = item.a, b = item.b})
  .ToDictionary(g => g.Key, g => g.ToList());

var results = dictionaryA
  .Where(g1 => dictionaryB.ContainsKey(g1.Key))
  .Select(g1 => new {g1 = g1, g2 = dictionaryB[g1.Key]})
  .SelectMany(pair =>
    pair.g1.SelectMany(item1 =>
      pair.g2
      .Where(item2 => item2.c != item1.c)
      .Select(item2 => new {item1, item2})
    )
  );

Here's a simplified version if a,b pairs are unique in each list.

var dictionaryA = dataA
  .ToDictionary(item => new {a = item.a, b = item.b}, item => item);

var dictionaryB = dataB
  .ToDictionary(item => new {a = item.a, b = item.b}, item => item);

var results = dictionaryA
  .Where(e1 => dictionaryB.ContainsKey(e1.Key))
  .Select(e1 => new {i1 = e1.Value, i2 = dictionaryB[e1.Key]})
  .Where(pair => pair.i1.c != pair.i2.c);
David B
+3  A: 

Based on the answer provided by David B, I eventually settled on a slightly modified version of his method. Although the differences are minor, I thought I would share this, primarily to show a version that for those (like me) that prefer the expressive syntax.

Also, instead of grouping, I decided to use an anonymous key/value pair to simplify the structure.

var dictA = (from item in dataA
             select new
             {
                 key = CreateIndexValue(item.a, item.b),
                 value = item
             }).ToDictionary(kv => kv.key, kv => kv.value);
var dictB = (from item in dataB
             select new
             {
                 key = CreateIndexValue(item.a, item.b),
                 value = item
             }).ToDictionary(kv => kv.key, kv => kv.value);
var filesInBoth = from item in dictA
                  where dictB.ContainsKey(item.Key)
                  select new
                  {
                      itemA = dictA[item.Key],
                      itemB = dictB[item.Key]
                  };
var differentSize = from item in filesInBoth
                    where item.itemA.c!= item.itemB.c
                    select item.itemA;
chills42
Holy Moses! :) +1 for effort
leppie
yeah... little ugly, but it's a heck of a lot faster...
chills42
Gratz, glad it all worked out. I see that a,b pairs within each list are unique? I could have written a simpler answer if I'd known that.
David B