views: 119

answers: 4

In our database we have a table with more than 100,000 entries, but most of the time we only need part of it. We do this with a very simple query:

items.AddRange(from i in this
    where i.ResultID == resultID && i.AgentID == parentAgentID
    orderby i.ChangeDate descending
    select i);

After this query we get a list with up to 500 items, but even from this result we only need the newest item and the one right after it. My coworker did this very simply with:

items[0];
items[1];

This works fine since the query result is already ordered by date, but the overall performance is very poor. It can take several seconds.

My idea was to add a .Take(2) at the end of the query, but my coworker said it would make no difference.

items.AddRange((from i in resultlist
    where i.ResultID == resultID && i.AgentID == parentAgentID
    orderby i.ChangeDate descending
    select i).Take(2));

We haven't tried this yet, and we are still looking for additional ways to speed things up. Database programming is not our strong suit, so any advice would be great.

Maybe we can even make some adjustments to the database itself? We use a SQL Server Compact database.

+2  A: 

Using Take(2) should indeed make a difference, if the optimiser is reasonably smart, and particularly if the ChangeDate column is indexed. (I don't know how much optimization SQL Compact edition does, but I'd still expect limiting the results to be helpful.)

However, you shouldn't trust me or anyone else to say so. See what query is being generated in each case, and run it against the SQL profiler. See what the execution plan is. Measure the performance with various samples. Measure, measure, measure.
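
One rough way to do that measuring, as a minimal sketch: time both variants against the same data. MyDataContext and its Items table are placeholders for whatever the real LINQ to SQL context and mapping are called.

using System;
using System.Diagnostics;
using System.Linq;

static class TakeTwoTiming
{
    // Runs both query shapes and prints how long each takes.
    public static void Run(MyDataContext db, int resultID, int parentAgentID)
    {
        var sw = Stopwatch.StartNew();
        var all = (from i in db.Items
                   where i.ResultID == resultID && i.AgentID == parentAgentID
                   orderby i.ChangeDate descending
                   select i).ToList();                  // materializes every matching row
        sw.Stop();
        Console.WriteLine("Full result: {0} rows in {1} ms", all.Count, sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        var topTwo = (from i in db.Items
                      where i.ResultID == resultID && i.AgentID == parentAgentID
                      orderby i.ChangeDate descending
                      select i).Take(2).ToList();       // lets the database return only two rows
        sw.Stop();
        Console.WriteLine("Take(2):     {0} rows in {1} ms", topTwo.Count, sw.ElapsedMilliseconds);
    }
}

Run it a few times and swap the order of the two blocks so caching doesn't skew the comparison.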

Jon Skeet
A: 

Adding .Take(2) will make a big difference. If you only need two items, you should definitely use it; it will almost certainly improve performance for you.

Add it and look at the SQL that is generated from it. The generated SQL will only fetch 2 records, which should save you time on the SQL side and also on the object instantiation side.
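
One way to look at that generated SQL, as a rough sketch; the connection string, table and column names here are placeholders for the real mapping:

using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

[Table(Name = "Items")]                          // placeholder mapping for the real table
public class Item
{
    [Column(IsPrimaryKey = true)] public int ItemID;
    [Column] public int ResultID;
    [Column] public int AgentID;
    [Column] public DateTime ChangeDate;
}

class ShowGeneratedSql
{
    static void Main()
    {
        using (var db = new DataContext("Data Source=MyDatabase.sdf"))   // placeholder connection string
        {
            db.Log = Console.Out;                // echoes each generated query as it executes

            var query = (from i in db.GetTable<Item>()
                         where i.ResultID == 1 && i.AgentID == 2
                         orderby i.ChangeDate descending
                         select i).Take(2);

            // Prints the SQL without executing the query; with Take(2) in place
            // it should only ask the database for two rows.
            Console.WriteLine(db.GetCommand(query).CommandText);
        }
    }
}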

Joseph
+1  A: 

The problem you might be having is that all of the data is being pulled down to your machine and the Take(2) is then applied in memory. The part that probably takes the most time is pulling all of that data to your application. If you want the database to do the work, make sure you don't enumerate or access the result set until you have finished composing your query.
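
To make that concrete, here is a rough sketch of the two placements of Take(2); MyDataContext, Items and Item are placeholder names for the real context and entity:

using System.Collections.Generic;
using System.Linq;

static class TakePlacement
{
    public static void Compare(MyDataContext db, int resultID, int parentAgentID)
    {
        // 1) Take(2) composed into the IQueryable: it becomes part of the
        //    generated SQL, so only two rows ever leave the database.
        var topTwo = (from i in db.Items
                      where i.ResultID == resultID && i.AgentID == parentAgentID
                      orderby i.ChangeDate descending
                      select i).Take(2).ToList();

        // 2) Results copied into a List first: every matching row (up to 500
        //    here) is pulled down and materialized, and Take(2) then runs in
        //    memory over objects that have already been created.
        var items = new List<Item>();
        items.AddRange(from i in db.Items
                       where i.ResultID == resultID && i.AgentID == parentAgentID
                       orderby i.ChangeDate descending
                       select i);
        var topTwoInMemory = items.Take(2).ToList();
    }
}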

Second, LINQ isn't fast at applying sorting and where clauses to large sets of data in application memory. It is often easier to write in LINQ, but it is almost always better to do as much of the sorting and filtering as possible in the database rather than manipulating in-memory sets of objects.

If you really care about performance in this scenario, don't use LINQ. Just make a loop.

http://ox.no/posts/linq-vs-loop-a-performance-test

I love using LINQ to SQL and LINQ, but they are not always the right tool for the job. If you have a lot of data and performance is critical, you don't want to use LINQ for in-memory sorting and where clauses.
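
For the in-memory case, the "just make a loop" idea amounts to something like this: a single pass that keeps track of the two newest matching items instead of sorting the whole list. The Item class here is a stand-in for the real entity.

using System;
using System.Collections.Generic;

public class Item
{
    public int ResultID;
    public int AgentID;
    public DateTime ChangeDate;
}

static class NewestTwo
{
    // Finds the newest and second-newest matching items in one pass.
    public static void Find(IEnumerable<Item> source, int resultID, int agentID,
                            out Item newest, out Item second)
    {
        newest = null;
        second = null;

        foreach (Item i in source)
        {
            if (i.ResultID != resultID || i.AgentID != agentID)
                continue;

            if (newest == null || i.ChangeDate > newest.ChangeDate)
            {
                second = newest;          // previous newest becomes the runner-up
                newest = i;
            }
            else if (second == null || i.ChangeDate > second.ChangeDate)
            {
                second = i;
            }
        }
    }
}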

Paul Mendoza
A: 

1 - Add an index that covers the fields you use in the query

2 - Make sure that fetching just the top 2 rows isn't paid for by running the query far more often

  • try to define query criteria that let you take a batch of records in one go

3 - Try to compile your LINQ query (see the sketch below)
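
For point 3, a compiled query would look roughly like this. This is only a sketch: MyDataContext, Items and Item are placeholders for the real context and mapping.

using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // Compiled once and reused on every call, so LINQ to SQL doesn't have to
    // re-translate the expression tree into SQL each time the query runs.
    public static readonly Func<MyDataContext, int, int, IQueryable<Item>> NewestTwoItems =
        CompiledQuery.Compile((MyDataContext db, int resultID, int agentID) =>
            (from i in db.Items
             where i.ResultID == resultID && i.AgentID == agentID
             orderby i.ChangeDate descending
             select i).Take(2));
}

// Usage: var newest = Queries.NewestTwoItems(db, resultID, parentAgentID).ToList();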

ZXX