views:

1953

answers:

5

I'm seeing performance problems with retrieving multiple instances of objects that have many relationships with other objects. I'm using Spring and Hibernate's JPA implementation with MySQL. The issue is that when executing a JPA query, Hibernate does not automatically join to other tables. This results in n*r + 1 SQL queries, where n is the number of objects being retrieved and r is the number of relationships.

Example, a Person lives at an Address, has many Hobbies, and has visited many Countries:

@Entity
public class Person {
    @Id public Integer personId;    
    public String name;    
    @ManyToOne public Address address;    
    @ManyToMany public Set<Hobby> hobbies;    
    @ManyToMany public Set<Country> countriesVisited;
}

When I perform a JPA query to get all Persons named Bob, and there are 100 Bobs in the database:

SELECT p FROM Person p WHERE p.name='Bob'

Hibernate translates this to 301 SQL queries:

SELECT ... FROM Person WHERE name='Bob'
SELECT ... FROM Address WHERE personId=1
SELECT ... FROM Address WHERE personId=2
...
SELECT ... FROM Hobby WHERE personId=1
SELECT ... FROM Hobby WHERE personId=2
...
SELECT ... FROM Country WHERE personId=1
SELECT ... FROM Country WHERE personId=2
...

According to the Hibernate FAQ (here and here), the solution is to specify LEFT JOIN or LEFT OUTER JOIN (for many-to-many) in the query. So now my query looks like:

SELECT p, a, h, c FROM Person p
LEFT JOIN p.address a LEFT OUTER JOIN p.hobbies h LEFT OUTER JOIN p.countriesVisited c
WHERE p.name = 'Bob'

This works, but there appears to be a bug if there's more than one LEFT OUTER JOIN in which case Hibernate is incorrectly looking for a non-existent column:

could not read column value from result set: personId69_2_; Column 'personId69_2_' not found.

The bug behavior appears to be possibly addressed by Hibernate Core bug HHH-3636. Unfortunately the fix is not part of any released Hibernate JAR. I've ran my application against the snapshot build but the bug behavior is still present. I've also built my own Hibernate Core JAR from the latest code in the repository and the bug behavior is still present. So maybe HHH-3636 doesn't address this.

This Hibernate performance limitation is very frustrating. If I query for 1000 objects then 1000*r + 1 SQL queries are made to the database. In my case I have 8 relationships so I get 8001 SQL queries, which results in horrible performance. The official Hibernate solution to this is to left join all relationships. But this isn't possible with more than one many-to-many relationships due to the bug behavior. So I'm stuck with left joins for many-to-one relationships and n*r+1 queries due to the many-to-many relationships. I plan to submit the LEFT OUTER JOIN problem as a Hibernate bug, but in the meantime my customer needs an app that has reasonable performance. I currently use a combination of batch fetch (BatchSize), ehcache and custom in-memory caching but the performance is still pretty poor (it improved retrieving 5000 objects from 30 to 8 seconds). The bottom line is that too many SQL queries are hitting the database.

So, my questions, is it possible to use Hibernate in performance sensitive applications where tables have multiple relationships with each other? I would love to hear how successful Hibernate uses address performance. Should I be hand writing SQL (which somewhat defeats the purpose of using Hibernate)? Should I de-normalize my database schema to reduce the number of joined tables? Should I not be using Hibernate if I need fast query performance? Is there something faster?

+3  A: 

Have You tried the "join" fetch strategy for the collections?

Maurice Perry
A: 

If you need a feature of Hibernate and this feature is buggy you have two options: a) Submit a bugrequest and use a workaround (slow performance or handwritten sql) until the bug is fixed which will take a while b) Submit a bugrequest along with a bugfix and tests. (of course you could just use the bugfix and skip the bugrequest and test part).

Jens Schauder
+6  A: 

See my answer to your other question, if you read the whole of the FAQ you linked to:

Follow the best practices guide! Ensure that all and mappings specify lazy="true" in Hibernate2 (this is the new default in Hibernate3). Use HQL LEFT JOIN FETCH to specify which associations you need to be retrieved in the initial SQL SELECT.

A second way to avoid the n+1 selects problem is to use fetch="subselect" in Hibernate3.

If you are still unsure, refer to the Hibernate documentation and Hibernate in Action.

See the tips on improving performance. If you are not careful with joins, you will end up with Cartesian Product problems.

toolkit
Thanks, I'm investigating this right now
Steve Kuo
Subselect works great, thanks.
Steve Kuo
+1  A: 

Besides "fetch" strategy you might also try setting batch fetch size in hibernate properties, so it will run joining queries not one by one but in batches.

In your appContext.xml:

<bean id="sessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
    ...    
    <property name="hibernateProperties">
        <props>        
            ...
            <prop key="hibernate.default_batch_fetch_size">32</prop>
        </props>
    </property>
</bean>

So instead of:

SELECT ... FROM Hobby WHERE personId=1
SELECT ... FROM Hobby WHERE personId=2

You will get:

SELECT ... FROM Hobby WHERE personId in (1,2,...,32);
SELECT ... FROM Hobby WHERE personId in (33,34,...,64);
serg
A: 

nobody ever knows the answer for this question: is the "additional query being executed(instead of a single query) in a one-to-many scenario" a known bug?. The usual answer is to use fetch="join". But it still executes the additional query in one-to-many scenario. Is this a known bug? If so, why is this not clearly specified in the documentation itself? It seems hundreds of hibernate users are asking this question. Why is this not being put in the faq (say clearly it is a known bug or it is not)?