I can think of at least three reasons:
- indexes
- caching
- special optimizations (e.g. TOP N SORT)
Indexes
There are many types of queries that will run very fast if run on a database which is correctly indexed but very slow if you iterate through a list in memory. For example a lookup by ID (primary key) is almost instant in a database because the results are stored in a B-tree with very small height. To find the same element in a list in memory would require scanning the entire list.
Caching
Your assumption is that the database always hits the disk. This is not always true. The database will try to hold as much data in memory as it can, so when you ask it for data it already has the answer ready for you. In particular it will hold commonly used indexes in memory and only hit the disk when necessary. The way the data is stored on disk and in memory is also carefully optimized to reduce disk seeks and page misses.
Optimizations
Even without indexes the database still knows many tricks that could speed things up. For example if you do the following in SQL Server:
list.OrderBy(x => x.Value).Take(1)
it will be almost instant if there is an index on list, but even without the index it will use a special optimization called TOP N SORT
that runs in linear time. Check the execution plan for your query to see if this optimization is being used. Note that this optimization is not implemented for LINQ to Objects. We can see this by running this code:
Random random = new Random();
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10000000; ++i)
{
list.Add(new Foo { Id = random.Next() });
}
DateTime now = DateTime.UtcNow;
Foo smallest = list.OrderBy(foo => foo.Id).First();
Console.WriteLine(DateTime.UtcNow - now);
This code takes about 30 seconds to execute, and the execution time grows slower than linearly as more items are added. Replacing the query with this results in it taking less than one second:
int smallestId = list.Min(foo => foo.Id);
This is because in LINQ to objects OrderBy
is implemented using an O(n log(n))
algorithm but Min
uses a O(n)
algorithm. However when executed against SQL Server, both these queries will produce the same SQL and both are linear time - O(n)
.
So running a paging query like OrderBy(x => x.Something).Skip(50).Take(10)
is faster in a database because a lot more effort has gone into making sure that it is faster. After all, the speed of this sort of query is a major selling point for databases.