I find I'm confused about lazy loading, etc.

First, are these two statements equivalent:

(1) Lazy loading:
_flaggedDates = context.FlaggedDates
    .Include("scheduledSchools")
    .Include("interviews")
    .Include("partialDayAvailableBlocks")
    .Include("visit")
    .Include("events");

(2) Eager loading:
_flaggedDates = context.FlaggedDates;

In other words, in (1) the "Includes" cause the navigation collections/properties to be loaded along with the specific collection requested, regardless of the fact that you are using lazy loading ... right?

And in (2), the statement will load all the navigation entities even though you do not specifically request them, because you are using eager loading ... right?

Second: even if you are using eager loading, the data will not actually be downloaded from the database until you "enumerate the enumerable", as in the following code:

var dates = from d in _flaggedDates
            where d.dateID == 2
            select d;
foreach (FlaggedDate date in dates)
{
    // ... etc.
}

The data will not actually be downloaded ("enumerated") until the foreach loop ... right? In other words, the "var dates" line defines the query, but the query is not executed until the foreach loop.
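The deferred execution described here can be observed without a database. The following is a minimal sketch using LINQ to Objects rather than Entity Framework; the `log` list and the `dateIds` array are made up for illustration, and the assumption is only that an EF query defers in the same way until it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical data and log, used only to make the timing visible.
        var log = new List<string>();
        int[] dateIds = { 1, 2, 3 };

        // Defining the query runs nothing: the predicate has not been called yet.
        IEnumerable<int> query = dateIds.Where(id =>
        {
            log.Add("checked " + id);
            return id == 2;
        });
        Console.WriteLine(log.Count); // prints 0

        // Enumerating the query is what actually executes it.
        foreach (int id in query)
        {
            Console.WriteLine("got " + id); // prints "got 2"
        }
        Console.WriteLine(log.Count); // prints 3
    }
}
```

The predicate runs only once the foreach starts, which is the same point at which an EF query would be sent to the database.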

Given that (if my assumptions are correct), what's the real difference between eager loading and lazy loading?? It seems that in either case, the data does not appear until the enumeration. Am I missing something?

(My specific experience is with code-first, POCO development, by the way ... though the questions may apply more generally.)

+1  A: 

Your description of (1) is correct, but it is an example of Eager Loading rather than Lazy Loading.

Your description of (2) is incorrect. (2) is technically using no loading at all, but will use Lazy Loading if you try to access any non-scalar values on your FlaggedDates.

In either case, you are correct that no data will be loaded from your data store until you attempt to "do something" with the _flaggedDates. However, what happens is different in each case.

(1): Eager loading: as soon as you begin your foreach loop, every one of the objects that you have specified will get pulled from the database and built into a large in-memory object graph. This can be an expensive operation that pulls a great deal of data from your database. However, it all happens in one database round trip, with a single SQL query being executed.

(2): Lazy loading: When your foreach loop begins, it will only load the FlaggedDates objects. However, if you access related objects inside the loop, those objects will not have been loaded into memory yet. The first attempt to retrieve the scheduledSchools for a given FlaggedDate will result in either a new database round trip to retrieve the schools, or an exception being thrown because your context has already been disposed. Since you'd be accessing the scheduledSchools collection inside the loop, you would have a new database round trip for every FlaggedDate that was loaded when the loop began.
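To make the difference in round trips concrete, here is a rough simulation. It does not use Entity Framework at all: `FakeDb` and its methods are hypothetical stand-ins that simply count calls, on the assumption that each call corresponds to one SQL query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database that counts round trips. Not an EF API.
class FakeDb
{
    public int RoundTrips;

    private readonly Dictionary<int, string[]> _schoolsByDate = new Dictionary<int, string[]>
    {
        { 1, new[] { "North", "South" } },
        { 2, new[] { "East" } },
        { 3, new[] { "West" } },
    };

    // One trip that returns only the parent rows (what happens without Include).
    public List<int> LoadDateIds()
    {
        RoundTrips++;
        return _schoolsByDate.Keys.ToList();
    }

    // One extra trip per parent (what a lazy load of a collection triggers).
    public string[] LoadSchools(int dateId)
    {
        RoundTrips++;
        return _schoolsByDate[dateId];
    }

    // One trip that returns parents together with their children (Include style).
    public Dictionary<int, string[]> LoadDatesWithSchools()
    {
        RoundTrips++;
        return new Dictionary<int, string[]>(_schoolsByDate);
    }
}

class Program
{
    static void Main()
    {
        var lazyDb = new FakeDb();
        int lazySchoolCount = 0;
        foreach (int id in lazyDb.LoadDateIds())
        {
            lazySchoolCount += lazyDb.LoadSchools(id).Length; // extra trip per date
        }
        Console.WriteLine("lazy-style round trips: " + lazyDb.RoundTrips); // 4

        var eagerDb = new FakeDb();
        int eagerSchoolCount = 0;
        foreach (var pair in eagerDb.LoadDatesWithSchools())
        {
            eagerSchoolCount += pair.Value.Length; // already in memory
        }
        Console.WriteLine("eager-style round trips: " + eagerDb.RoundTrips); // 1
    }
}
```

With three dates, the lazy-style loop makes one trip for the dates plus one per date, while the eager-style loop makes a single, larger trip.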

Response to Comments

Disabling Lazy Loading is not the same as Enabling Eager Loading. In this example:

context.ContextOptions.LazyLoadingEnabled = false;
var schools = context.FlaggedDates.First().scheduledSchools;

The schools variable will contain an empty EntityCollection, because I didn't Include them in the original query (FlaggedDates.First()), and I disabled lazy loading so that they couldn't be loaded after the initial query had been executed.

You are correct that the where d.dateID == 2 would mean that only the objects related to that specific FlaggedDate object would be pulled in. However, depending on how many objects are related to that FlaggedDate, you could still end up with a lot of data going over the wire. This is due to the way Entity Framework builds its SQL query. SQL query results are always in a tabular format, meaning you must have the same number of columns for every row. For every scheduledSchool object, there needs to be at least one row in the result set, and since every row has to contain at least some value for every column, you end up with every scalar value on your FlaggedDate object being repeated. So if you have 10 scheduledSchools and 10 interviews associated with your FlaggedDate, you'll end up with 20 rows that each contain every scalar value on FlaggedDate. Half of the rows will have null values for all the ScheduledSchool columns, and the other half will have null values for all of the Interviews columns.

Where this gets really bad, though, is if you go "deep" in the data you're including. For example, if each ScheduledSchool had a students property, which you included as well, then suddenly you would have a row for each Student in each ScheduledSchool, and on each of those rows, every scalar value for the Student's ScheduledSchool would be included (even though only the first row's values end up getting used), along with every scalar value on the original FlaggedDate object. It can add up quickly.
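As a back-of-the-envelope check on how quickly this adds up, the sketch below just does the arithmetic from the description above. The figures of 5 scalar columns on FlaggedDate and 30 students per school are assumptions chosen for illustration, and the row model is the one described in this answer, not a measurement of real EF output.

```csharp
using System;

class Program
{
    // Rows for sibling Includes under the model described above:
    // each included collection contributes its own rows to the result set.
    static int RowsForSiblingIncludes(params int[] collectionSizes)
    {
        int rows = 0;
        foreach (int size in collectionSizes)
        {
            rows += size;
        }
        return rows;
    }

    static void Main()
    {
        int flaggedDateScalarColumns = 5; // assumed column count

        // 10 scheduledSchools and 10 interviews on one FlaggedDate.
        int rows = RowsForSiblingIncludes(10, 10);
        Console.WriteLine(rows);                            // 20 rows
        Console.WriteLine(rows * flaggedDateScalarColumns); // 100 FlaggedDate values sent

        // Going one level deeper: a row per student in each school,
        // plus the 10 interview rows.
        int studentsPerSchool = 30; // assumed
        int deepRows = 10 * studentsPerSchool + 10;
        Console.WriteLine(deepRows);                        // 310 rows
    }
}
```

Every one of those rows carries the FlaggedDate scalar values again, which is where the duplicate data comes from.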

It's difficult to explain in writing, but if you look at the actual data coming back from a query with multiple Includes, you will see that there is a lot of duplicate data. You can use LINQPad to see the SQL queries generated by your EF code.

StriplingWarrior
For (1) (MY (1)), I meant that Lazy Loading was enabled. The statement is an example of eager loading because of the Includes ... right?
Cynthia
For (2) (MY (2)) ... If Lazy Loading is DISABLED (in other words eager loading is in effect), you are saying the navigation properties will not be loaded along with FlaggedDates? Then what does eager loading mean?
Cynthia
Response to your edits: OK, I see what you are saying. But for your point #1: all the objects will be pulled in, but only the ones specified in your query, right? E.g., in my example I said where d.dateID == 2, so only the objects for the FlaggedDate object with dateID = 2 would have the data pulled in. Right? So it would not be so expensive as long as you have limited the scope in your query.
Cynthia
@Cynthia: I responded to your comments in the answer.
StriplingWarrior
OK, I think I'm getting it. Is this true: lazy loading = true: load navigation objects at time of enumeration (by means of generating extra SQL statements); lazy loading = false: you have to use INCLUDES on the original query or LOAD at time of enumeration in order to load navigation objects
Cynthia
So what I'm getting is that "eager loading" is just a word to describe the process of using INCLUDES to load navigation objects at time of enumeration.
Cynthia
Your last comment is correct. So you can use "eager loading" with lazy loading enabled by using "Include." If a property is not included in the original query and you try to use it, lazy loading will kick in to load it, unless you have lazy loading disabled, in which case it will happily tell you that the property is just empty, without any attempt to load values.
StriplingWarrior
A: 

No difference. This was not true in EF 1.0, which didn't support lazy loading (at least not automatically). In 1.0, you had to either modify the property to load automatically, or call the Load() method on the property reference.

One thing to keep in mind is that those Includes can go up in smoke if you query across multiple objects like so:

from d in ctx.ObjectDates.Include("MyObjectProperty")
from da in d.Days

ObjectDate.MyObjectProperty will not be automatically loaded.

Chris B. Behrens