There are similar questions to this, but I don't think anyone has asked this particular question.
Scenario:
Customer - Order (where Order has a CustomerID) - OrderPart - Part
I want a query that returns a customer with all its orders and each order with its parts.
Now I have two main choices:
- Use a nested loop (which produces separate queries)
- Use data loading options (which produces a single query join)
The question:
Most advice and examples on ORMs suggest using option 2 and I can see why. However, option 2 will potentially be sending back a huge amount of duplicated data, eg:
Option 1 results (3 queries):
ID Name Country 1 Customer1 UK ID Name 1 Order1 2 Order2 ID Name 1 Part1 2 Part2 3 Part3
Option 2 results (1 query):
ID Name Country ID Name ID Name 1 Customer1 UK 1 Order1 1 Part1 1 Customer1 UK 1 Order1 2 Part2 1 Customer1 UK 1 Order1 3 Part3 1 Customer1 UK 2 Order2 1 Part1 1 Customer1 UK 2 Order2 2 Part2
Option 1 sends back 13 fields with 3 queries. Option 2 sends back 42 fields in 1 query. Now imagine Customer table has 30 fields and Orders have more complex sub joins, the data duplication can quickly become huge.
What impact on overall performance do the following things have:
- Overhead of making a database connection
- Time taken to send data (potentially across network if on different server)
- Bandwidth
Is option 2 always the best choice, option 1 the best choice or does it depend on the situation? If it depends, what criteria should you use to determine? Are any ORMs clever enough to work it out for themselves?