ansaurus

Question

linq - how do you do a query for items in one query source that are not in another one?

Answer 1

+4 A:

NorthwindDataContext dc = new NorthwindDataContext();
dc.Log = Console.Out;
var query =
    from c in dc.Customers
    where !(from o in dc.Orders
            select o.CustomerID)
           .Contains(c.CustomerID)
    select c;
foreach (var c in query) Console.WriteLine( c );

Bramha Ghosh 2008-09-08 21:21:00

Would this have O(N^2) complexity because for each customer it is checking against all orders? Is Darren Kopp's suggestion of Except better?

Graphain 2009-05-05 04:52:50

Yeah when tested it is.

Graphain 2009-05-05 05:06:37

Answer 2

+4 A:

use the Except extension method.

var items1 = new List<string> { "Apple","Orange","Banana" };
var items2 = new List<string> { "Grapes","Apple","Kiwi" };

var excluded = items1.Except(items2);

Darren Kopp 2008-09-08 21:23:03

This is the better method (see my answer below).

Graphain 2009-05-05 05:26:52

Answer 3

A:

Here's a more simple version of the same thing, you don't need to nest the query:

List<string> items1 = new List<string>();
items1.Add("cake");
items1.Add("cookie");
items1.Add("pizza");

List<string> items2 = new List<string>();
items2.Add("pasta");
items2.Add("pizza");

var results = from item in items1
       where items2.Contains(item)
       select item;

foreach (var item in results)
    Console.WriteLine(item); //Prints 'pizza'

Nidonocu 2008-09-08 21:25:21

Answer 4

+1 A:

Another totally different way of looking at it would be to pass a lambda expression (condition for populating the second collection) as a predicate to the first collection.

I know this is not the exact answer to the question. I think other users already gave the correct answer.

Gulzar 2008-09-08 21:34:20

Answer 5

+3 A:

Darren Kopp's answer:

var excluded = items1.Except(items2);

is the best solution from a performance perspective.

(NB: This true for at least regular LINQ, perhaps LINQ to SQL changes things as per Marco Russo's blog post. However, I'd imagine that in the "worst case" Darren Kopp's method will return at least the speed of Russo's method even in a LINQ to SQL environment).

As a quick example try this in LINQPad:

void Main()
{
   Random rand = new Random();
   int n = 100000;
   var randomSeq = Enumerable.Repeat(0, n).Select(i => rand.Next());
   var randomFilter = Enumerable.Repeat(0, n).Select(i => rand.Next());

   /* Method 1: Bramha Ghosh's/Marco Russo's method */
   (from el1 in randomSeq where !(from el2 in randomFilter select el2).Contains(el1) select el1).Dump("Result");

   /* Method 2: Darren Kopp's method */
   randomSeq.Except(randomFilter).Dump("Result");
}

Try commenting one of the two methods out at a time and try out the performance for different values of n.

My experience (on my Core 2 Duo Laptop) seems to suggest:

n = 100. Method 1 takes about 0.05 seconds, Method 2 takes about 0.05 seconds
n = 1,000. Method 1 takes about 0.6 seconds, Method 2 takes about 0.4 seconds
n = 10,000. Method 1 takes about 2.5 seconds, Method 2 takes about 0.425 seconds
n = 100,000. Method 1 takes about 20 seconds, Method 2 takes about 0.45 seconds
n = 1,000,000. Method 1 takes about 3 minutes 25 seconds, Method 2 takes about 1.3 seconds

Method 2 (Darren Kopp's answer) is clearly faster.

The speed decrease for Method 2 for larger n is most likely due to the creation of the random data (feel free to put in a DateTime diff to confirm this) whereas Method 1 clearly has algorithmic complexity issues (and just by looking you can see it is at least O(N^2) as for each number in the first collection it is comparing against the entire second collection).

Conclusion: Use Darren Kopp's answer of LINQ's 'Except' method

Graphain 2009-05-05 05:25:46

nice write up. i also like that Enumerable.Repeat() method. haven't seen that one until today but that is a neat little method.

Darren Kopp 2009-05-05 15:01:49

changed accepted answer due to the greatness of this answer. It just needs to include the answer instead of just referencing it. Please edit this to include this at least (with proper credit given to Darren of course): var excluded = items1.Except(items2);

Carlton Jenke 2009-05-06 13:29:15

ansaurus

tags:

views:

answers:

linq - how do you do a query for items in one query source that are not in another one?

related questions