I have a LINQ query that searches for multiple keywords across multiple columns. The user can enter several keywords, and each keyword is matched against every property of my Media entity. Here is a simplified example:

var result = repository.GetAll<Media>().Where(x =>
    x.Title.Contains("Apples") || x.Description.Contains("Apples") || x.Tags.Contains("Apples") ||
    x.Title.Contains("Oranges") || x.Description.Contains("Oranges") || x.Tags.Contains("Oranges") ||
    x.Title.Contains("Pears") || x.Description.Contains("Pears") || x.Tags.Contains("Pears")
);

In other words, I want to search for the keywords Apples, Oranges, and Pears on the columns Title, Description, and Tags.

The generated SQL looks like this:

SELECT *
FROM Media this_
WHERE  ((((((((
       this_.Title like '%Apples%'
    or this_.Description like '%Apples%')
    or this_.Tags like '%Apples%')

    or this_.Title like '%Oranges%')
    or this_.Description like '%Oranges%')
    or this_.Tags like '%Oranges%')

    or this_.Title like '%Pears%')
    or this_.Description like '%Pears%')
    or this_.Tags like '%Pears%')

Is this the optimal SQL for this case? If not, how do I rewrite the LINQ query so that it generates optimal SQL? I'm using SQLite for testing and SQL Server for the actual deployment.
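For reference, the hand-written predicate could also be generated from a keyword list instead of being typed out per keyword. The following is only a minimal sketch: the MediaSearch class and the BuildPredicate helper are hypothetical names, the Media class is a stand-in for my real entity, and I am assuming the LINQ provider translates an expression tree built this way the same way it translates the hand-written lambda. It builds the same OR chain, so it would not change the shape of the SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Stand-in for the Media entity from the question (assumption: three string columns).
public class Media
{
    public string Title { get; set; } = "";
    public string Description { get; set; } = "";
    public string Tags { get; set; } = "";
}

// Hypothetical helper, not part of any library.
public static class MediaSearch
{
    // Builds x => x.P1.Contains(k1) || x.P2.Contains(k1) || ... for every
    // keyword/property pair, i.e. the same OR chain as the hand-written query.
    public static Expression<Func<Media, bool>> BuildPredicate(
        IEnumerable<string> keywords, params string[] propertyNames)
    {
        ParameterExpression x = Expression.Parameter(typeof(Media), "x");
        var contains = typeof(string).GetMethod("Contains", new[] { typeof(string) })
            ?? throw new InvalidOperationException("string.Contains(string) not found");

        // One x.Property.Contains("keyword") call per keyword/property pair.
        List<Expression> calls = new List<Expression>();
        foreach (string keyword in keywords)
        {
            foreach (string name in propertyNames)
            {
                calls.Add(Expression.Call(
                    Expression.Property(x, name), contains, Expression.Constant(keyword)));
            }
        }

        // With no keywords the predicate matches nothing; otherwise OR the calls together.
        Expression body = calls.Count == 0
            ? Expression.Constant(false)
            : calls.Aggregate((left, right) => Expression.OrElse(left, right));

        return Expression.Lambda<Func<Media, bool>>(body, x);
    }
}
```

The result would be passed straight to Where, for example repository.GetAll<Media>().Where(MediaSearch.BuildPredicate(new[] { "Apples", "Oranges", "Pears" }, "Title", "Description", "Tags")). Run against an in-memory list via AsQueryable(), Contains is case-sensitive, which may differ from how the database evaluates LIKE.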

A: 

I can't see how it could be significantly faster, to be honest. Admittedly a "contains" style of wildcarding is likely to be fairly slow to start with (compared with "starts with" for example).
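To make the "contains" versus "starts with" distinction concrete, here is a minimal sketch. The Media class is a stand-in for your entity, and WildcardDemo and its two methods are hypothetical names. I'm assuming your provider translates StartsWith into a pattern with only a trailing wildcard (LIKE 'Apples%'), which is typical but worth confirming in the generated SQL.

```csharp
using System.Linq;

// Stand-in for the Media entity (assumption: only Title is needed here).
public class Media
{
    public string Title { get; set; } = "";
}

// Hypothetical helper class for illustration only.
public static class WildcardDemo
{
    // The question shows this shape being translated to: Title like '%keyword%'.
    // The leading wildcard is what makes it expensive on large tables.
    public static IQueryable<Media> TitleContains(IQueryable<Media> source, string keyword)
    {
        return source.Where(x => x.Title.Contains(keyword));
    }

    // Assumed to translate to: Title like 'keyword%'.
    // A prefix match like this is the kind a database can usually serve from an index on Title.
    public static IQueryable<Media> TitleStartsWith(IQueryable<Media> source, string keyword)
    {
        return source.Where(x => x.Title.StartsWith(keyword));
    }
}
```

The two are not interchangeable, of course: "starts with" only helps if matching on a prefix is acceptable for your search feature.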

Have you looked at what the SQL Server execution plan is like? What's the actual performance like, with a realistic data set?

Jon Skeet
I just created 2000 records in a SQLite database and ran a query much like the one I posted. According to NHProf, the entire query took only 43 ms. Realistically, the client will have only several thousand records (not millions), so the performance is perfectly fine.
Daniel T.
@Daniel T: You ought to try it on SQL Server as well - although with only thousands of records I wouldn't expect it to be a problem.
Jon Skeet
+2  A: 

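The real performance hit is that this kind of query is tough to optimize. You are searching for substrings, and a LIKE pattern with a leading wildcard ('%Apples%') cannot use an ordinary index seek, so each of these predicates generally forces a scan.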

From a purely LINQ to SQL perspective there isn't much you can do. But if you can enable full-text search, you'll have much better tools at your disposal to speed up your query.

See this Stack Overflow post for more info.

roufamatic
Thanks for your answer. I ran a test like Jon Skeet mentioned and it turns out the execution time for my expected record count is negligible.
Daniel T.