views:

192

answers:

4

Lets say we need to select two sets from a table: "Things"

var GradeA = db.Things.Where(t=>  condition1);
var GradeB = db.Things.Where(t=> !condition1 && condition2);
var DesiredList = GradeA.union(GradeB);

alternatively, we need to write a single statement to avoid union cost:

var DesiredList = db.Things.Where(t=> condtion1 || (!condition1 && condition2));

the problem is query optimizer seems to trim the expression to condition2 only.

How to preserve the priority between condition1 and condition2

a real life example workaround is :

/// <summary>
/// Gets only first BookTag for each tag word, chooses the one of this user (if exists).
/// </summary>
/// <param name="book"></param>
/// <returns></returns>
public static IQueryable<BookTag> UniqueByTags(this IQueryable<BookTag> bookTags, User user)
{
    return bookTags.GroupBy(BT => BT.TagId)
        .Select(g => g.Any(bt => bt.UserId == user.Id) ? 
            new BookTag() { User = user, Tag = g.First().Tag, Book = bookTags.First().Book } : 
            new BookTag() {User = g.First().User, Tag = g.First().Tag, Book = bookTags.First().Book}
            );
}

Edit:

Example is getting Auto-Complete list:

  • input: str
  • output: things that start with str and things that contain str (no duplicates)


Another Example: Selecting ThingTags which have 3 properties:

  • ThingID
  • UserID
  • TagID

we want to select only one ThingTag for each TagID under condition that we select the one with UserID equals parameter if exists, otherwise select first ThingTag for that TagID.

Still with me ? hope so :)


+2  A: 

Is there any reason for not writing this:

var DesiredList = db.Things.Where(t=> condition1 || condition2);

That's logically the same set of elements, after all. As it's a simpler expression, it's more likely that the query generator will get it right. Having said that, I'm surprised it's getting it wrong. Do you have a complete example you could provide?

Jon Skeet
@Jon - you might want to double check my thinking, but it looks like he's talking about doing an XOR rather than a simple OR. He doesn't want results when BOTH conditions are true.
Justin Niessner
@Justin - That may be the intent, but based on the question, if condition 1 and condition 2 are true, the first part of the OR evaluates true and gets an overall true, so it's logically equivalent.
Nick Craver
@Justin: Nick's logic was exactly what I had in mind. Note that due to shortcircuiting, the *C#* code would only evaluate `condition2` if `condition1` was false anyway. Quite what that means for the generated SQL, I'm not sure.
Jon Skeet
RBarryYoung
Good point about the three-valued logic. Hopefully Ahmed will post what happened when he tried this idea at some point...
Jon Skeet
Downvoters: care to give reasons?
Jon Skeet
A: 

It looks to me like you want to do an XOR (Exclusive or) rather than a regular OR between your two conditions (in other words, you want the items that meet the requirements of only one OR the other...not both).

I'm not positive about LINQ to SQL, but I know LINQ to Objects supports the XOR...so you can give it a shot. Here's the syntax:

var DesiredList = db.Things.Where(t => condition1 ^ condition2);
Justin Niessner
this would select a Thing given condition1 true while condition2 false, I want to disregard condition2 when condition1 is satisfied
Ahmed Khalaf
A: 

To take the example literally, you could perform the combination in the question by building an Expression on the fly:

    static IQueryable<T> WhereXElseY<T>(
        this IQueryable<T> query,
        Expression<Func<T, bool>> predicateA,
        Expression<Func<T, bool>> predicateB)
    {
        var condA = predicateA.Body;
        var param = predicateA.Parameters[0];

        var body = Expression.OrElse(
            condA,
            Expression.AndAlso(
                Expression.Not(condA),
                Expression.Invoke(
                    predicateB, param)));
        var lambda = Expression.Lambda<Func<T, bool>>(
            body, param);
        return query.Where(lambda);
    }

However, while this may work with LINQ-to-SQL, it won't work with EF since EF sadly hates Expression.Invoke. But as Jon notes; if you are sending this down to a database backend, the priority is irrelevant and you might as well go with the logically equivalent condition1 || condition2. You can combine expressions something like:

    static IQueryable<T> WhereAny<T>(
        this IQueryable<T> query,
        params Expression<Func<T, bool>>[] predicates)
    {
        if (predicates == null) throw new ArgumentNullException("predicates");
        if (predicates.Length == 0) return query.Where(x => false);
        if (predicates.Length == 1) return query.Where(predicates[0]);

        var param = predicates[0].Parameters[0];
        var body = predicates[0].Body;
        for (int i = 1; i < predicates.Length; i++)
        {
            body = Expression.OrElse(
                body, Expression.Invoke(predicates[i], param));
        }
        var lambda = Expression.Lambda<Func<T, bool>>(body, param);
        return query.Where(lambda);
    }

If I've missed the point, please clarify...

Marc Gravell
Thanks Marc ! but I'd like to ask, does it make any difference if I build the expression in code or as lambda ? I think query optimizer will work on it and we get the same results ?
Ahmed Khalaf
In general (and in theory) the result would be the same; this answer relates to a given scenario of usage - but if you *can* write it all as a simple one-liner, then do that.
Marc Gravell
this means we wont get a different result, hence no progress :(
Ahmed Khalaf
It shouldn't do; care to elaborate?
Marc Gravell
Sure, My mind was to find a solution
Ahmed Khalaf
A: 

First: I would really re-check your original code for the conditions, while its possible there is a bug in the query optimizer its more likely there was a bug on the expression used and it really didn't represent the below:

var DesiredList = db.Things.Where(t=> condition1 || (!condition1 && condition2));

the problem is query optimizer seems to trim the expression to condition2 only.

That should really give you the ones that match condition1 regardless of condition2 ... and ones that don't match condition1 and match condition2. Instead condition2 alone is not equivalent because that leaves out records that only match condition1.

JS version of just (condition1 || condition2) is equivalent to the quoted expression above, as when u are matching condition1 you are already matching both condition2 and !condition2 so you are already including condition2 for both condition1 and !condition1 cases. If this doesn't match what you intended with the queries, is then clearer that's not an issue with the optimizer but with the original expressions.

You would only need the full expressions if you were joining 2 results with a Concat instead of Union, as that would mean you would end up with results matching in both expressions ... and then you would have repeated results. But in contrast, the Where is evaluated per row, so you don't have those concerns in there.


Second: From the code sample, I think what you are facing is less direct of what you are going after in your question. You mentioned you are getting the first tag, but what you are really doing can be seen in this re-written version:

public static IQueryable<BookTag> UniqueByTags(this IQueryable<BookTag> bookTags, User user)
{
    return bookTags.GroupBy(BT => BT.TagId)
        .Select(g => new BookTag() { 
             User = g.Any(bt => bt.UserId == user.Id) ? user : g.First().User,
             Tag = g.First().Tag, Book = bookTags.First().Book 
        });
}

What's mentioned in the comment seems more like:

public static IQueryable<BookTag> UniqueByTags(this IQueryable<BookTag> bookTags, User user)
{
    return bookTags.GroupBy(BT => BT.TagId)
        .Select(g => g.Any(bt => bt.UserId == user.Id) ? 
            g.First(bt=>bt.UserId == user.Id) : 
            g.First()
        );
}
eglasius
Thanks Freddy ! For the second part of your answer, I don't really think these re-writes are a good idea, because I want ONLY ONE query to be executed but you use First(bt=>bt.UserId == user.Id) again after you already executed the ANY()... which is not the same performance as the originally intended query
Ahmed Khalaf