views:

25

answers:

1

Assuming a typical domain entity approach with SQL Server and a dbml/L2S DAL with a logic layer on top of that:

In situations where lazy loading is not an option, I have settled on a convention where getting a list of entities does not also get each item's child entities (no loading), but getting a single entity does (eager loading).

Since getting a single entity also gets children, it causes a cascading effect in which each child then gets its children too. This sounds bad, but as long as the model is not too deep, I usually don't see performance problems that outweigh the benefits of the ease of use.

So if I want to get a list in which each of the items is fully hydrated with children, I combine the GetList and GetItem methods. So I'll get a list and then loop through it getting each item with the full cascade. Even this is generally acceptable in many of the projects I've worked on - but I have recently encountered situations with larger models and/or more data in which it needs to be more efficient.

I've found that partitioning the loop and executing it on multiple threads yields excellent results. In my first experiment with a list of 50 items from one particular project, I did 5 threads of 10 items each and got a 3X improvement in time.

Of course, the mileage will vary depending on the project but all else being equal this is clearly a big opportunity. However, before I go further, I was wondering what others have done that have already been through this. What are some good approaches to parallelizing this type of thing?

A: 

Usually it is faster to make a single database call that returns a set of records.

This recordset can "hydrate" the top-level objects, then another recordset can load child objects. I'm not sure how your situation does not allow lazy-loading, but this method is essentially lazy-loading, and will surely be faster than making multiple calls to the database that returns a single record each time.

You could make asynchronous calls to the database so that multiple queries are running in parallel. If you combine this with the first strategy for each "layer" of the model, and write a somewhat more complex hydration function based on multiple-record return sets, you should see that the database handles concurrent connections very well (which is why you see a performance gain from using multiple threads).

But you don't need to explicitly create threads - check out the asynchronous methods of a SqlCommand.

Jeff Meatball Yang
Jeff, thanks for the reply. Not sure we're on the same page though. I AM doing a single database call that returns a set of records. The issue has to do with the details of each record and the details of each of those records, etc. It is like a pyramid.
MarkB
In practical terms, the only list I can get is the top one - the rest have to be one offs unless I drop back to doing joins in the database where many tables (aka entities) are merged into one de-normalized result. But if I did this in the DAL it would break the isolated mapping of each entity and bypass their logic layers which opens a whole different can of worms. This issue seems to be the basic tension\tradeoff of data driven vs. domain driven approaches.
MarkB