views:

503

answers:

5

If I have 2 query sources how do I find ones that are in one that are not in the other?

example of join to find items in both:

var results = from item1 in qs1.Items
   join item2 in qs2 on item1.field1 equals item2.field2
   select item1;

So what would the linq code be to return the items in qs1 that are not in qs2?

+4  A: 

From Marco Russo

NorthwindDataContext dc = new NorthwindDataContext();
dc.Log = Console.Out;
var query =
    from c in dc.Customers
    where !(from o in dc.Orders
            select o.CustomerID)
           .Contains(c.CustomerID)
    select c;
foreach (var c in query) Console.WriteLine( c );
Bramha Ghosh
Would this have O(N^2) complexity because for each customer it is checking against all orders? Is Darren Kopp's suggestion of Except better?
Graphain
Yeah when tested it is.
Graphain
+4  A: 

use the Except extension method.

var items1 = new List<string> { "Apple","Orange","Banana" };
var items2 = new List<string> { "Grapes","Apple","Kiwi" };

var excluded = items1.Except(items2);
Darren Kopp
This is the better method (see my answer below).
Graphain
A: 

Here's a more simple version of the same thing, you don't need to nest the query:

List<string> items1 = new List<string>();
items1.Add("cake");
items1.Add("cookie");
items1.Add("pizza");

List<string> items2 = new List<string>();
items2.Add("pasta");
items2.Add("pizza");

var results = from item in items1
       where items2.Contains(item)
       select item;

foreach (var item in results)
    Console.WriteLine(item); //Prints 'pizza'
Nidonocu
+1  A: 

Another totally different way of looking at it would be to pass a lambda expression (condition for populating the second collection) as a predicate to the first collection.

I know this is not the exact answer to the question. I think other users already gave the correct answer.

Gulzar
+3  A: 

Darren Kopp's answer:

var excluded = items1.Except(items2);

is the best solution from a performance perspective.

(NB: This true for at least regular LINQ, perhaps LINQ to SQL changes things as per Marco Russo's blog post. However, I'd imagine that in the "worst case" Darren Kopp's method will return at least the speed of Russo's method even in a LINQ to SQL environment).

As a quick example try this in LINQPad:

void Main()
{
   Random rand = new Random();
   int n = 100000;
   var randomSeq = Enumerable.Repeat(0, n).Select(i => rand.Next());
   var randomFilter = Enumerable.Repeat(0, n).Select(i => rand.Next());

   /* Method 1: Bramha Ghosh's/Marco Russo's method */
   (from el1 in randomSeq where !(from el2 in randomFilter select el2).Contains(el1) select el1).Dump("Result");

   /* Method 2: Darren Kopp's method */
   randomSeq.Except(randomFilter).Dump("Result");
}

Try commenting one of the two methods out at a time and try out the performance for different values of n.

My experience (on my Core 2 Duo Laptop) seems to suggest:

n = 100. Method 1 takes about 0.05 seconds, Method 2 takes about 0.05 seconds
n = 1,000. Method 1 takes about 0.6 seconds, Method 2 takes about 0.4 seconds
n = 10,000. Method 1 takes about 2.5 seconds, Method 2 takes about 0.425 seconds
n = 100,000. Method 1 takes about 20 seconds, Method 2 takes about 0.45 seconds
n = 1,000,000. Method 1 takes about 3 minutes 25 seconds, Method 2 takes about 1.3 seconds

Method 2 (Darren Kopp's answer) is clearly faster.

The speed decrease for Method 2 for larger n is most likely due to the creation of the random data (feel free to put in a DateTime diff to confirm this) whereas Method 1 clearly has algorithmic complexity issues (and just by looking you can see it is at least O(N^2) as for each number in the first collection it is comparing against the entire second collection).

Conclusion: Use Darren Kopp's answer of LINQ's 'Except' method

Graphain
nice write up. i also like that Enumerable.Repeat() method. haven't seen that one until today but that is a neat little method.
Darren Kopp
changed accepted answer due to the greatness of this answer. It just needs to include the answer instead of just referencing it. Please edit this to include this at least (with proper credit given to Darren of course): var excluded = items1.Except(items2);
Carlton Jenke