views: 34
answers: 2
I wrote two ASP.NET applications using LINQ. One connects to MS SQL Server, the other to a proprietary in-memory structure. Both applications work with tables of 3 int fields containing 500,000 records (the memory structure is identical to the SQL Server table). The controls used are the standard GridView and ObjectDataSource. In each application I measure the average time needed to process a paging click.

  • The LINQ + MS SQL application takes 0.1 sec per page change.
  • The LINQ + memory structure application takes 0.8 sec per page change.

This is a shocking result. Why does the application that handles data in memory work 8 times slower than the one that uses the hard drive? Can anybody tell me why this happens?

+4  A: 

The primary factor will probably be algorithmic efficiency. LINQ-to-Objects works with IEnumerable<T> inputs and outputs, which are generally processed sequentially, whereas the database may have indexes that induce substantial speed-ups.
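
As a rough sketch of what that sequential processing means for the paging scenario in the question (this example is illustrative, not from the answer): a sorted paging query over a plain IEnumerable<T> has to pull and sort every element before it can return even a single page of rows.

using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Yields 500,000 values and counts how many the query actually pulls.
IEnumerable<int> Rows()
{
    for (int i = 0; i < 500000; i++)
    {
        pulled++;
        yield return i;
    }
}

// One "page change": only 10 rows end up on the page, but the OrderBy
// forces the whole sequence to be read and sorted before Skip/Take run.
var page = Rows().OrderByDescending(x => x).Skip(20).Take(10).ToList();
Console.WriteLine($"Rows on page: {page.Count}, rows enumerated: {pulled}");
// Prints: Rows on page: 10, rows enumerated: 500000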

Marcelo Cantos
A: 

I can think of at least three reasons:

  • indexes
  • caching
  • special optimizations (e.g. TOP N SORT)

Indexes

Many types of queries will run very fast on a correctly indexed database but very slowly if you iterate through a list in memory. For example, a lookup by ID (primary key) is almost instant in a database because the index is a B-tree of very small height. Finding the same element in an in-memory list requires scanning the entire list.
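
As a rough illustration (this snippet is not from the original post), a Dictionary<int, Foo> can stand in for a primary-key index: the lookup is effectively constant time, while finding the same row in a plain List<Foo> means scanning it.

using System;
using System.Collections.Generic;
using System.Linq;

// 500,000 rows keyed by Id.
var list = Enumerable.Range(0, 500000).Select(i => new Foo { Id = i }).ToList();

// No index: LINQ to Objects scans the list until it finds the match.
Foo byScan = list.First(f => f.Id == 499999);        // O(n)

// "Index": a dictionary keyed by Id behaves like a primary-key lookup.
Dictionary<int, Foo> byId = list.ToDictionary(f => f.Id);
Foo byIndex = byId[499999];                          // effectively O(1)

Console.WriteLine(byScan.Id == byIndex.Id);

public class Foo { public int Id { get; set; } }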

Caching

Your assumption is that the database always hits the disk. This is not always true. The database will try to hold as much data in memory as it can, so when you ask it for data it already has the answer ready for you. In particular it will hold commonly used indexes in memory and only hit the disk when necessary. The way the data is stored on disk and in memory is also carefully optimized to reduce disk seeks and page misses.

Optimizations

Even without indexes, the database still knows many tricks that can speed things up. For example, if you run the following query against SQL Server:

list.OrderBy(x => x.Value).Take(1)

it will be almost instant if there is an index on the ordered column, but even without an index it will use a special optimization called TOP N SORT that runs in linear time. Check the execution plan for your query to see whether this optimization is being used. Note that this optimization is not implemented in LINQ to Objects. We can see this by running the following code:

using System;
using System.Collections.Generic;
using System.Linq;

Random random = new Random();
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10000000; ++i)
{
    list.Add(new Foo { Id = random.Next() });
}

DateTime now = DateTime.UtcNow;
// OrderBy has to sort all ten million items just to return the first one.
Foo smallest = list.OrderBy(foo => foo.Id).First();
Console.WriteLine(DateTime.UtcNow - now);

// Row type used in this example.
public class Foo { public int Id { get; set; } }

This code takes about 30 seconds to execute, and the execution time grows worse than linearly as more items are added. Replacing the query with the following makes it run in less than one second:

int smallestId = list.Min(foo => foo.Id);

This is because in LINQ to Objects, OrderBy is implemented using an O(n log(n)) algorithm, but Min uses an O(n) algorithm. However, when executed against SQL Server, both of these queries produce the same SQL and both run in linear time - O(n).
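
For completeness, here is a sketch (not from the original answer) of what the O(n) approach looks like when you need the whole smallest element rather than just its Id; it reuses list and Foo from the example above and replaces the OrderBy line:

// One pass, no sorting: keep the smallest element seen so far.
Foo smallest = null;
foreach (Foo foo in list)
{
    if (smallest == null || foo.Id < smallest.Id)
    {
        smallest = foo;
    }
}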

So running a paging query like OrderBy(x => x.Something).Skip(50).Take(10) is faster in a database because far more effort has gone into making it fast. After all, the speed of this sort of query is a major selling point for databases.
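
For comparison, here is a minimal sketch (the table, column, and connection-string names are placeholders, not from the original post) of that paging expression running against SQL Server through LINQ to SQL. Because the source is IQueryable<Foo>, the whole OrderBy/Skip/Take chain is translated into a single paged query that runs on the server, and only the requested rows come back to the application.

using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// Hypothetical mapping for a table of three int columns.
[Table(Name = "Foos")]
public class Foo
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public int Value { get; set; }
    [Column] public int Other { get; set; }
}

class Program
{
    static void Main()
    {
        using (var db = new DataContext("<connection string>"))
        {
            // The paging happens on the server; only 10 rows are materialized.
            var page = db.GetTable<Foo>()
                         .OrderBy(x => x.Value)
                         .Skip(50)
                         .Take(10)
                         .ToList();
            Console.WriteLine(page.Count);
        }
    }
}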

Mark Byers