views:

148

answers:

6

I have the following linq query:

var allnews = from a in db.News
                              where !(from c in db.NewsViews
                                      where c.UserGuid == thisUser.UserGuid
                                      select c.NewsGuid).Contains(a.NewsGuid)
                              orderby a.Date descending
                              select a;

I'm wondering what is the best way to optimize it? Or would the query profiler do that for me?

Edit: The idea is to get all the news items that the user has not seen yet. So once a user has seen an item, I stored that item in NewsViews. And the news themselves are in News.

+2  A: 

instead of using contains, you could add to the where-statement of your inner query:

... and c.newsguid == a.newsguid

and .Any() of your inner query

var allnews = from a in db.News
                  where !(from c in db.NewsViews
                          where c.UserGuid == thisUser.UserGuid
                            and c.NewsGuid == a.NewsGuid).Any()
                   orderby a.Date descending
                   select a;
Joachim VR
Uhh, .Any() doesn't do what you think...
Eamon Nerbonne
Yes it does, it returns whether there are any items in a collection/query. Including the equality condition in the subquery has the exact same result, only with a smaller set in the subquery, and without the use of the 'in' statement.
Joachim VR
Right, so: `new[]{false,false}.Any()` is true since the collection is not empty - the fact that none of the boolean values holds is irrelevant.
Eamon Nerbonne
Actually, it does, so my point stands.
Joachim VR
I'm not sure exactly which point you're referring to, but to be clear, your current formulation is *not* equivalent to the OP's - it will return no news items at all for users having viewed any news item.
Eamon Nerbonne
I'm sorry, I see your point. My bad, select should have been and, like I said in the explanation. Just a copy error.
Joachim VR
Eamon Nerbonne
Right, c# requires a select statement, as opposed to vb.net. Been a while.
Joachim VR
+4  A: 

The sub-query seems not to use a, so

      //untested
      var allnews = from a in db.News
                    let excluders = from c in db.NewsViews
                                    where c.UserGuid == thisUser.UserGuid
                                    select c.NewsGuid   
                          where !excluders.Contains(a.NewsGuid)
                          orderby a.Date descending
                          select a;

But be advised, you are now doing SQL optimization through LINQ (btw, is this L2S or EF ?).
And normal SQL optimization is difficult enough. You'll have to measure and analyze with realistic data. It is quite possible that @Joachim's approach with multi inner-join subqueries is better.

Henk Holterman
+1 for noting that it's very tricky to estimate how this will perform - test it!
Eamon Nerbonne
By the way, putting the sub-query into a let-clause doesn't change the semantics of the original query, so it'd be surprising if this changes the performance for the better - though you never know...
Eamon Nerbonne
@Earnon: I was assuming some caching but you're right, that would not be obvious with IEnumerable. But this is a IQueryable, I'm counting on the SQL server.
Henk Holterman
Yeah, it's non-obvious what the SQL is that LINQ would produce, nor what internal execution plan SQL would produce - testing is really the best way to tell.
Eamon Nerbonne
+1  A: 

I'm presuming that the goal is to retrieve the NewsViews in descending date order:

db.News.OrderByDescending(a => a.Date).NewsViews;

This, of course, assumes that you have already set up an association in your model between the News and NewsViews entities. By setting up the association ahead of time, the subquery becomes unnecessary.

UPDATE:

I've been using LINQ-to-SQL for about 18 months, and I've been using the same construct as the one you illustrated for my NOT IN queries. As I stated earlier, you may get a little performance bump if you set up your association in the model ahead of time and use indexes in the database itself, but from a LINQ point of view, I believe you're as optimized as you're going to get without resorting to an unnecessarily cryptic query statement.

Neil T.
A: 

Maybe this is my lack to linq knowledge but perhaps a left join where a column in NewsViews is null? That seems like it would be better than making a subquery and comparing the two.

Jonathan Romanowski
+1  A: 

Here's an alternative formulation:

from newsitem in db.News
join viewing in (
       from viewing in db.NewsViews
       where viewing.UserGuid == thisUser.UserGuid
       select viewing
) on newsitem.NewsGuid equals viewing.NewsGuid into usersviewings
where !usersviewings.Any()
orderby newsitem.Date descending
select newsitem;

But as to whether this is any faster - well that's anyone's guess; try it. Fundamentally, you're doing a left join with the left part is filtered and must not return any results - that doesn't index well, AFAIK. The execution engine will need to scan all rows in the news set, and if you're backed by SQL, then table-scans are not your friend. Having said that, unless you actually expect this to be a huge table, it may not matter much, particularly if you only report the top N hits...

Eamon Nerbonne
A: 

The best optimization move you could make here, would be to allow for a navigation from NewsViews to News... Since one doesn't exist, I had to get a little bit hacky with the optimization.

db.News.Join(db.News.Select(n => n.NewsGuid)
    .Except(db.NewsViews
        .Where(c => c.UserGuid == thisUser.UserGuid)
        .Select(c => c.NewsGuid)
    ), n1 => n1.NewsGuid, n2 => n2, (n1, n2) => new { n1 = n1, n2 = n2 })
    .Select(anon => anon.n1);

An Except is going to produce the best performing SQL when you are attempting to do a query where a list does not contain another list. Since no navigation from NewsView to News though, we have to cheat with an Inner Join to return News.

Another way that this could be done would be my pal the GroupJoin.

db.News
    .GroupJoin(db.NewsViews, n => n.NewsGuid, nv => nv.NewsGuid, (n, nv) => new { News = n, NewsViewList = nv })
    .Where(anon => anon.NewsViewList != null) // I don't remember the best test here, either it's not null, or the count > 0 :-)
    .OrderByDescending(anon => anon.News.Date)
    .Select(anon => anon.News);

This is how I would have done it at least.

jwendl