I'm using EF for a set of objects and want to query one type of object with the matching entries from two other tables. This is to dump the query into an XML output. At first, used a join since in the original data each Employee always had 1+ instances of a Computer object (solution #2 below), but it's not necessarily true.
For purpose here, imagine:
- an Employee object,
- each Employee has an EmployeeType (a few fixed entries) and
- each Employee has zero or more Computer objects (typically 0-3).
- each Computer belongs to one Employee, not all Employee has a Computer.
- each Employee has a criteria on which the search is based (such as Division).
So I saw several possible solutions:
Use the Employee.Computer.Load() within a loop, but with 10,000+ rows, that causes an enormous performance penalty.
Use a join in the query, but that leaves out all Employees that do not have a Computer.
Use Linq to Entities,
but that seems to have the overhead of #1: when loading Computers it hits the database for EACH Employee.Use a second query (all Computers with matching Computer.Employee.Division), then in the Employee loop, add any Computer for the given employee. While implementing this, I found that just running the second query (with ToList()) the EF populates the correct Employee.Computer lists with the objects I need.
Here, #4 loads the data with only 2 database hits instead of 10k+, and the EntityFramework actually collates the objects and creates all of the relationships.
My Questions:
- With #4, is the fact that the EF populates the Employee.Computer list something on which I can I rely? If so, can you point me to the documentation?
- Is there a better way than #4?
UPDATE: Well, bugger. Sorry, but I simply blew it. I was focusing on the relationship with the "Computer" table and missed the fact that I had an explicit Employee.EmployeeTypeReference.Load() w/o first testing for null, so the "Computer" list was a complete non-issue.
I only found this when running some performance tests and adding Craig's solution to the mix. In truth, the records aren't "Employees" and "Computers" but abstractions, and I (conditionally) include every field in the XML output, but they're small: a Name, ID(PK) and ID(FK) plus an INT on the "Employee" table. So, my presumption was that the performance would be similar since the EF would create objects not much heavier than the projection.
Anyway, here are the results wher the "Elapsed" time was the difference before this query and after the resultant XML was created.
Case 1: Same as #2, but with Include() statements:
list = ve.Employee.Include("Computer").Include("EmployeeType").Where(e => e.Division.ID == divId).OrderBy(e => e.Name);
Elapsed: 4.96, 5.05
Case 2: Uses in-line Load():
list = ve.Employee.Where(e => e.Division.ID == divId).OrderBy(e => e.Name);
Elapsed: 74.62
Case 3: Same as #4, but with Include() statements:
list = from e in ve.Employee.Include("Computer").Include("EmployeeType") where e.Division.ID == divId orderby e.Name select e;
Elapsed: 4.91, 5.47
Case 4: Uses in-line Load():
list = from e in ve.Employee where e.Division.ID == divId orderby e.Name select e;
Elapsed: 74.20
Case 5: Use *Include("EmployeeType") and separate "Computer" query, let EF associate:
elist = ve.Employee.Include("EmployeeType").Where(te => te.Division.ID == divId).OrderBy(e => e.Name).ToList(); alist = ve.Alias.Where(a => a.Employee.Division.ID == divId).ToList();
Elapsed: 4.50, 4.02
Case 6: Craig's suggestion of projections:
elist = from te in ve.ThesaurusEntry where te.Division.ID==divID orderby te.Name select new { ID = te.ID, Name = te.Name, Type = te.EmployeeType.Name, Freq = te.Frequency, Aliases = from a in te.Alias select new { ID = a.ID, Name = a.Name } };
Elapsed: 0.73, 1.25
Conclusions
Load() is expensive, so use Include() or at least test with IsLoaded
Projection is a bit tedious, but significantly faster than EF fix-up. [with this limited testing on "narrow" tables]