tags:

views:

8393

answers:

8

The problem is often mentioned in O/R-mapping discussions, and I understand that it has something do to with having to make a lot of database queries for something that seems simple in the object world.

Does anybody have a more detailed (but simple!) explanation of the problem?

+1  A: 

Suppose you have COMPANY and EMPLOYEE. COMPANY has many EMPLOYEES (i.e. EMPLOYEE has a field COMPANY_ID).

In some O/R configurations, when you have a mapped Company object and go to access its Employee objects, the O/R tool will do one select for every employee, wheras if you were just doing things in straight SQL, you could select * from employees where company_id = XX. Thus N (# of employees) plus 1 (company)

This is how the initial versions of EJB Entity Beans worked. I believe things like Hibernate have done away with this, but I'm not too sure. Most tools usually include info as to their strategy for mapping.

davetron5000
+34  A: 

Hi Lars. I'm not an expert, and the best guide is Java Persistence with Hibernate, chapter 13. But I can try to give a short example.

Let's say you have a collection of Car objects (database rows), and each Car has a collection of Wheel objects (database rows). In other words, Car:Wheel is a 1-to-many relationship.

Now, let's say you need to iterate through all the cars, and for each one, print out a list of the wheels. The naive O/R implementation would do the following:

SELECT * FROM Cars;

/* for each car */
SELECT * FROM Wheel WHERE CarId = ?

In other words, you have one select for the Cars, and then N additional selects, where N is the total number of cars.

This is bad :-). Hibernate (I'm not familiar with the other ORM frameworks) gives you several ways to handle it.

Matt Solnit
To clarify on the "This is bad" - you could get all the wheels with 1 select (`SELECT * from Wheel;`), instead of N+1. With a large N, the performance hit can be very significant.
tucuxi
+9  A: 
SELECT 
table1.*
, table2.*
INNER JOIN table2 ON table2.SomeFkId = table1.SomeId

That gets you a result set where child rows in table2 cause duplication by returning the table1 results for each child row in table2. O/R mappers should differentiate table1 instances based on a unique key field, then use all the table2 columns to populate child instances.

SELECT table1.*

SELECT table2.* WHERE SomeFkId = #

The N+1 is where the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned.

Consider:

class House
{
    int Id { get; set; }
    string Address { get; set; }
    Person[] Inhabitants { get; set; }
}

class Person
{
    string Name { get; set; }
    int HouseId { get; set; }
}

and tables with a similar structure. A single query for the address "22 Valley St" may return:

Id Address      Name HouseId
1  22 Valley St Dave 1
1  22 Valley St John 1
1  22 Valley St Mike 1

The O/RM should fill an instance of Home with ID=1, Address="22 Valley St" and then populate the Inhabitants array with People instances for Dave, John, and Mike with just one query.

A N+1 query for the same address used above would result in:

Id Address
1  22 Valley St

with a separate query like

SELECT * FROM Person WHERE HouseId = 1

and resulting in a separate data set like

Name    HouseId
Dave    1
John    1
Mike    1

and the final result being the same as above with the single query.

The advantages to single select is that you get all the data up front which may be what you ultimately desire. The advantages to N+1 is query complexity is reduced and you can use lazy loading where the child result sets are only loaded upon first request.

cfeduke
+4  A: 

Here's a good description of the problem - http://www.realsolve.co.uk/site/tech/hib-tip-pitfall.php?name=why-lazy

Now that you understand the problem it can typically be avoided by doing a join fetch in your query. This basically forces the fetch of the lazy loaded object so the data is retrieved in one query instead of n+1 queries. Hope this helps.

Joe Dean
+1  A: 

The supplied link has a very simply example of the n + 1 problem. If you apply it to Hibernate it's basically talking about the same thing. When you query for an object, the entity is loaded but any associations (unless configured otherwise) will be lazy loaded. Hence one query for the root objects and another query to load the associations for each of these. 100 objects returned means one initial query and then 100 additional queries to get the association for each, n + 1.

http://pramatr.com/2009/02/05/sql-n-1-selects-explained/

+1  A: 

Supplier with a one-to-many relationship with Product. One Supplier has (supplies) many Products.

***** Table: Supplier *****
+-----+-------------------+
| ID  |       NAME        |
+-----+-------------------+
|  1  |  Supplier Name 1  |
|  2  |  Supplier Name 2  |
|  3  |  Supplier Name 3  |
|  4  |  Supplier Name 4  |
+-----+-------------------+

***** Table: Product *****
+-----+-----------+--------------------+-------+------------+
| ID  |   NAME    |     DESCRIPTION    | PRICE | SUPPLIERID |
+-----+-----------+--------------------+-------+------------+
|1    | Product 1 | Name for Product 1 |  2.0  |     1      |
|2    | Product 2 | Name for Product 2 | 22.0  |     1      |
|3    | Product 3 | Name for Product 3 | 30.0  |     2      |
|4    | Product 4 | Name for Product 4 |  7.0  |     3      |
+-----+-----------+--------------------+-------+------------+

Factors:

  • Lazy mode for Supplier set to “true” (default)

  • Fetch mode used for querying on Product is Select

  • Fetch mode (default): Supplier information is accessed

  • Caching does not play a role for the first time the

  • Supplier is accessed

Fetch mode is Select Fetch (default)

// It takes Select fetch mode as a default
Query query = session.createQuery( "from Product p");
List list = query.list();
// Supplier is being accessed
displayProductsListWithSupplierName(results);

// It takes Select fetch mode as a default
Query query = session.createQuery( "from Product p");
List list = query.list();
// Supplier is being accessed
displayProductsListWithSupplierName(results);

select ... various field names ... from PRODUCT
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?

Result:

  • 1 select statement for Product
  • N select statements for Supplier

This is N+1 select problem!

Summy
The code block is repeated, is that intentional?
Abhijeet Kashnia
+1  A: 

In my opinion the article written in Hibernate Pitfall: Why Relationships Should Be Lazy is exactly opposite of real N+1 issue is.

If you need correct explanation please refer Hibernate - Chapter 19: Improving Performance - Fetching Strategies

Select fetching (the default) is extremely vulnerable to N+1 selects problems, so we might want to enable join fetching

Anoop Isaac
i read the hibernate page. It doesn't say what the *N+1 selects problem* actually *is*. But it says you can use joins to fix it.
Ian Boyd
+1  A: 

This problem doesn't just exist in ORMs. It can certainly be found in plenty of applications that don't use them. There is an explanation of one way to get around the issue in Scott Rudy's HP blog.

Scott Rudy