views:

81

answers:

2

I'm using EF for a set of objects and want to query one type of object with the matching entries from two other tables. This is to dump the query into an XML output. At first, used a join since in the original data each Employee always had 1+ instances of a Computer object (solution #2 below), but it's not necessarily true.

For purpose here, imagine:

  • an Employee object,
  • each Employee has an EmployeeType (a few fixed entries) and
  • each Employee has zero or more Computer objects (typically 0-3).
  • each Computer belongs to one Employee, not all Employee has a Computer.
  • each Employee has a criteria on which the search is based (such as Division).

So I saw several possible solutions:

  1. Use the Employee.Computer.Load() within a loop, but with 10,000+ rows, that causes an enormous performance penalty.

  2. Use a join in the query, but that leaves out all Employees that do not have a Computer.

  3. Use Linq to Entities, but that seems to have the overhead of #1: when loading Computers it hits the database for EACH Employee.

  4. Use a second query (all Computers with matching Computer.Employee.Division), then in the Employee loop, add any Computer for the given employee. While implementing this, I found that just running the second query (with ToList()) the EF populates the correct Employee.Computer lists with the objects I need.

Here, #4 loads the data with only 2 database hits instead of 10k+, and the EntityFramework actually collates the objects and creates all of the relationships.

My Questions:

  • With #4, is the fact that the EF populates the Employee.Computer list something on which I can I rely? If so, can you point me to the documentation?
  • Is there a better way than #4?

UPDATE: Well, bugger. Sorry, but I simply blew it. I was focusing on the relationship with the "Computer" table and missed the fact that I had an explicit Employee.EmployeeTypeReference.Load() w/o first testing for null, so the "Computer" list was a complete non-issue.

I only found this when running some performance tests and adding Craig's solution to the mix. In truth, the records aren't "Employees" and "Computers" but abstractions, and I (conditionally) include every field in the XML output, but they're small: a Name, ID(PK) and ID(FK) plus an INT on the "Employee" table. So, my presumption was that the performance would be similar since the EF would create objects not much heavier than the projection.

Anyway, here are the results wher the "Elapsed" time was the difference before this query and after the resultant XML was created.

  • Case 1: Same as #2, but with Include() statements:

    list = ve.Employee.Include("Computer").Include("EmployeeType").Where(e => e.Division.ID == divId).OrderBy(e => e.Name);

    Elapsed: 4.96, 5.05

  • Case 2: Uses in-line Load():

    list = ve.Employee.Where(e => e.Division.ID == divId).OrderBy(e => e.Name);

    Elapsed: 74.62

  • Case 3: Same as #4, but with Include() statements:

    list = from e in ve.Employee.Include("Computer").Include("EmployeeType") where e.Division.ID == divId orderby e.Name select e;

    Elapsed: 4.91, 5.47

  • Case 4: Uses in-line Load():

    list = from e in ve.Employee where e.Division.ID == divId orderby e.Name select e;

    Elapsed: 74.20

  • Case 5: Use *Include("EmployeeType") and separate "Computer" query, let EF associate:

    elist = ve.Employee.Include("EmployeeType").Where(te => te.Division.ID == divId).OrderBy(e => e.Name).ToList(); alist = ve.Alias.Where(a => a.Employee.Division.ID == divId).ToList();

    Elapsed: 4.50, 4.02

  • Case 6: Craig's suggestion of projections:

    elist = from te in ve.ThesaurusEntry where te.Division.ID==divID orderby te.Name select new { ID = te.ID, Name = te.Name, Type = te.EmployeeType.Name, Freq = te.Frequency, Aliases = from a in te.Alias select new { ID = a.ID, Name = a.Name } };

    Elapsed: 0.73, 1.25

Conclusions

Load() is expensive, so use Include() or at least test with IsLoaded

Projection is a bit tedious, but significantly faster than EF fix-up. [with this limited testing on "narrow" tables]

+2  A: 

I believe that you can indicate that the relations can be pre-loaded

Dim employees = From emp in db.Employees.Include("Computer") _
                Select emp
Rob
Indeed, that was #3, the problem was elsewhere -- but I didn't explicitly state that I'd used *Include()*, sorry.
NVRAM
+1  A: 

Rob's solution will work (+1), but if you don't need all of the fields in, say, Employee and Computer, I'd strongly recommend projecting instead:

var q = from e in Context.Employees
        where e.Division.Id = divisionId
        select new
        {
            Name = e.Name,
            EmployeeType = e.EmployeeType.Description,
            ComputerIds = from c in e.Computers
                          select new 
                          {
                              Id = c.Id
                          }
        };

Here you get all you need in one query, but nothing more: All the fields you don't need won't be returned.

You could probably even select into XElements and just save the resulting tree rather than manually converting to XML. I haven't tried this, but it seems like it should work.

Regarding #4, yes, you can rely on this, but it's always good to check IsLoaded before calling Load().

Craig Stuntz
W.r.t. relying on #4 - do you know a reference? I've coded it so that it will *Load()* if not *IsLoaded*, so it won't break -- performance will just tank, but I'm curious.
NVRAM
Google for EF and "fix-up".
Craig Stuntz
Got it, thanks.
NVRAM