views:

616

answers:

3

Hi,

I just had the weirdest debug experience in a very long time. It's a bit embarassing to admit, but it lead me to be believe that my Linq query produces MORE results when adding an additional Where clause.

I know it's not possible, so I've refactored my offending function plus the unit test belonging to it into this:

[Test]
public void LoadUserBySearchString()
{
    //Setup
    var AllUsers = new List<User>
                       {
                           new User
                               {
                                   FirstName = "Luke",
                                   LastName = "Skywalker",
                                   Email = "[email protected]"
                               },
                           new User
                               {
                                   FirstName = "Leia",
                                   LastName = "Skywalker",
                                   Email = "[email protected]"
                               }
                       };


    //Execution
    List<User> SearchResults = LoadUserBySearchString("princess", AllUsers.AsQueryable());
    List<User> SearchResults2 = LoadUserBySearchString("princess Skywalker", AllUsers.AsQueryable());

    //Assertion
    Assert.AreEqual(1, SearchResults.Count); //test passed!
    Assert.AreEqual(1, SearchResults2.Count); //test failed! got 2 instead of 1 User???
}


//search CustID, fname, lname, email for substring(s)
public List<User> LoadUserBySearchString(string SearchString, IQueryable<User> AllUsers)
{
    IQueryable<User> Result = AllUsers;
    //split into substrings and apply each substring as additional search criterium
    foreach (string SubString in Regex.Split(SearchString, " "))
    {            
        int SubStringAsInteger = -1;
        if (SubString.IsInteger())
        {
            SubStringAsInteger = Convert.ToInt32(SubString);
        }

        if (SubString != null && SubString.Length > 0)
        {
            Result = Result.Where(c => (c.FirstName.Contains(SubString)
                                        || c.LastName.Contains(SubString)
                                        || c.Email.Contains(SubString)
                                        || (c.ID == SubStringAsInteger)
                                       ));
        }
    }
    return Result.ToList();
}

I have debugged the LoadUserBySearchString function and asserted that the second call to the function actually produces a linq query with two where clauses instead of one. So it seems that the additional where clause is increasing the amount of results.

What's even more weird, the LoadUserBySearchString function works great when I test it by hand (with real users from the database). It only shows this weird behavior when running the unit test.

I guess I just need some sleep (or even an extended vacation). If anyone could please help me shed some light on this, I could go stop questioning my sanity and go back to work.

Thanks,

Adrian

Edit (to clarify on several responses I go so far): I know it looks like it is the or clause, but unfortuantely it is not that simple. LoadUserBySearchString splits the search string into several strings and attaches a Where clause for each of them. "Skywalker" matches both luke and Leia, but "princess" only matches Leia.

This is the Linq query for the search string "princess":

+    Result {System.Collections.Generic.List`1[TestProject.Models.User].Where(c => (((c.FirstName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString) || c.LastName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || c.Email.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || (c.ID = value(TestProject.Controllers.SearchController+<>c__DisplayClass3).SubStringAsInteger)))} System.Linq.IQueryable<TestProject.Models.User> {System.Linq.EnumerableQuery<TestProject.Models.User>}

And this is the Linq clause for the search string "princess Skywalker"

+    Result {System.Collections.Generic.List`1[TestProject.Models.User].Where(c => (((c.FirstName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString) || c.LastName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || c.Email.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || (c.ID = value(TestProject.Controllers.SearchController+<>c__DisplayClass3).SubStringAsInteger))).Where(c => (((c.FirstName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString) || c.LastName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || c.Email.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || (c.ID = value(TestProject.Controllers.SearchController+<>c__DisplayClass3).SubStringAsInteger)))} System.Linq.IQueryable<TestProject.Models.User> {System.Linq.EnumerableQuery<TestProject.Models.User>}

Same as above, just with one additional where clause.

A: 

It's the "or" || in your where.

KP
Unfortuantely it is not. LoadUserBySearchString splits the search string into several strings and attaches a Where clause for each of them. "Skywalker" matches both luke and Leia, but "princess" only matches Leia.
Adrian Grigore
Trying to be too quick. I see now the || is not the problem.
KP
+1  A: 

Your algorithm amounts to "select records which match any of the words in the search string".

This is because of deferred execution. The query is not actually performed until you call the .ToList(). If you move the .ToList() inside the loop, you'll get the behaviour you want.

Winston Smith
You are wrong, see lassevk's answer.
Samuel
Care to explain which part of my answer is wrong? I am correct on both points - it is caused by deferred execution, and doing a .ToList() inside the loop each time would give the correct answer.
Winston Smith
Your very first statement is wrong. Read his code and you will see why. And your suggestion only works if he calls ToList().AsQueryable() inside each loop since he needs IQueryable<User>. And what happens if the IQueryable is slow? You've now made his query a couple powers slower.
Samuel
While "select records which match the last word in the search string" would be more accurate, Joe was the first one to see that this is a problem with deferred execution. So, thanks! :-)
Adrian Grigore
+5  A: 

This is a nice little gotcha.

What is happening is that, because of anonymous methods, and deferred execution, you're actually not filtering on "princess". Instead, you're building a filter that will filter on the contents of the subString variable.

But, you then change this variable, and build another filter, which again uses the same variable.

Basically, this is what you will execute, in short form:

Where(...contains(SubString)).Where(...contains(SubString))

so, you're actually only filtering on the last word, which exists in both, simply because by the time these filters are actually applied, there is only one SubString value left, the last one.

If you change the code so that you capture the SubString variables inside the scope of the loop, it'll work:

if (SubString != null && SubString.Length > 0)
{
    String captured = SubString;
    Int32 capturedId = SubStringAsInteger;
    Result = Result.Where(c => (c.FirstName.Contains(captured)
                                || c.LastName.Contains(captured)
                                || c.Email.Contains(captured)
                                || (c.ID == capturedId)
                               ));
}
Lasse V. Karlsen
Thanks so much! You made my day :-)
Adrian Grigore
+1, using a local variable usually helps solve closure problems. Further reading about the use of closures in LINQ: http://diditwith.net/2007/09/25/LINQClosuresMayBeHazardousToYourHealth.aspx
Lucas