ansaurus

Question

Efficiency of Hibernate's table-per-subclass inheritance strategy

Answer 1

+3 A:

You'll find that Hibernate writes the query for an unknown animal type with a series of LEFT JOIN statements, one per subclass. So the query will slow as the number of subclasses increases, and will attempt to return an ever wider result set. So you are correct, it doesn't scale well with large class hierarchies.

With HQL, yes you can query the subclass directly, and access its properties. That will then be rendered with a single INNER JOIN.

I haven't tried this with multiple levels of inheritance. If the above hasn't put you off yet, suggest you try it and see - you can turn on SQL debug output to see what is being sent to the database, or simply profile your database.

David M 2010-01-07 12:43:47

Thanks - I considered testing myself but for some reason didn't think it would be worthwhile without a realistic body of data for performance testing, and I didn't want to *start* developing this way with my concerns. However you're right that a dummy test case will at least let me see the techniques used, and reduce this from a Hibernate performance question to a SQL performance question, which should be much easier to reason about.

Andrzej Doyle 2010-01-07 14:18:23

Referring to your third paragraph: as the SQL I posted shows, the multiple levels of inheritance don't seem to change much. So far as I can tell, the only difference is the order of the clauses in the `case` statement used to generate a synthetic discriminator. So certainly from a performance perspective this should be identical. (Of course there will be more tables to join when concretely instantiating a subclass but that doesn't concern me too much).

Andrzej Doyle 2010-01-07 18:16:51

Marking this as accepted since on reflection it clearly answers the question I asked, even though it doesn't provide the magic bullet I was hoping for.

Andrzej Doyle 2010-01-08 15:42:21

Thanks. Good question, and great to get some post question feedback from real research!

David M 2010-01-08 15:57:55

Answer 2

+1 A:

After David M's helpful answer I decided to throw together a skeleton test.

I created an abstract superclass, ADTestA, and 25 concrete subclasses in a three-level hierarchy (I expect you can guess their names). Each class had a single integer field with a name corresponding to its letter - so for example, class ADTestG has a single int field g in addition to the b field it inherits from its immediate parent ADTestB, and the pkey and a fields from the top-level abstract superclass.

Issuing the HQL query from ADTestA where pkey=1 resulted in the following SQL:

select adtesta0_.pkey as pkey0_, adtesta0_.a as a0_, adtesta0_1_.b as b1_,
       adtesta0_2_.c as c2_, adtesta0_3_.d as d3_, adtesta0_4_.e as e4_,
       adtesta0_5_.f as f5_, adtesta0_6_.g as g6_, adtesta0_7_.h as h7_,
       adtesta0_8_.i as i8_, adtesta0_9_.j as j9_, adtesta0_10_.k as k10_,
       adtesta0_11_.l as l11_, adtesta0_12_.m as m12_, adtesta0_13_.n as n13_,
       adtesta0_14_.o as o14_, adtesta0_15_.p as p15_, adtesta0_16_.q as q16_,
       adtesta0_17_.r as r17_, adtesta0_18_.s as s18_, adtesta0_19_.t as t19_,
       adtesta0_20_.u as u20_, adtesta0_21_.v as v21_, adtesta0_22_.w as w22_,
       adtesta0_23_.x as x23_, adtesta0_24_.y as y24_, adtesta0_25_.z as z25_,
       case
           when adtesta0_6_.pkey is not null then 6
           when adtesta0_7_.pkey is not null then 7
           when adtesta0_8_.pkey is not null then 8
           when adtesta0_9_.pkey is not null then 9
           when adtesta0_10_.pkey is not null then 10
           when adtesta0_11_.pkey is not null then 11
           when adtesta0_12_.pkey is not null then 12
           when adtesta0_13_.pkey is not null then 13
           when adtesta0_14_.pkey is not null then 14
           when adtesta0_15_.pkey is not null then 15
           when adtesta0_16_.pkey is not null then 16
           when adtesta0_17_.pkey is not null then 17
           when adtesta0_18_.pkey is not null then 18
           when adtesta0_19_.pkey is not null then 19
           when adtesta0_20_.pkey is not null then 20
           when adtesta0_21_.pkey is not null then 21
           when adtesta0_22_.pkey is not null then 22
           when adtesta0_23_.pkey is not null then 23
           when adtesta0_24_.pkey is not null then 24
           when adtesta0_25_.pkey is not null then 25
           when adtesta0_1_.pkey is not null then 1
           when adtesta0_2_.pkey is not null then 2
           when adtesta0_3_.pkey is not null then 3
           when adtesta0_4_.pkey is not null then 4
           when adtesta0_5_.pkey is not null then 5
           when adtesta0_.pkey is not null then 0
       end as clazz_
from ADTestA adtesta0_
           left outer join ADTestB adtesta0_1_ on adtesta0_.pkey=adtesta0_1_.pkey
           left outer join ADTestC adtesta0_2_ on adtesta0_.pkey=adtesta0_2_.pkey
           left outer join ADTestD adtesta0_3_ on adtesta0_.pkey=adtesta0_3_.pkey
           left outer join ADTestE adtesta0_4_ on adtesta0_.pkey=adtesta0_4_.pkey
           left outer join ADTestF adtesta0_5_ on adtesta0_.pkey=adtesta0_5_.pkey
           left outer join ADTestG adtesta0_6_ on adtesta0_.pkey=adtesta0_6_.pkey
           left outer join ADTestH adtesta0_7_ on adtesta0_.pkey=adtesta0_7_.pkey
           left outer join ADTestI adtesta0_8_ on adtesta0_.pkey=adtesta0_8_.pkey
           left outer join ADTestJ adtesta0_9_ on adtesta0_.pkey=adtesta0_9_.pkey
           left outer join ADTestK adtesta0_10_ on adtesta0_.pkey=adtesta0_10_.pkey
           left outer join ADTestL adtesta0_11_ on adtesta0_.pkey=adtesta0_11_.pkey
           left outer join ADTestM adtesta0_12_ on adtesta0_.pkey=adtesta0_12_.pkey
           left outer join ADTestN adtesta0_13_ on adtesta0_.pkey=adtesta0_13_.pkey
           left outer join ADTestO adtesta0_14_ on adtesta0_.pkey=adtesta0_14_.pkey
           left outer join ADTestP adtesta0_15_ on adtesta0_.pkey=adtesta0_15_.pkey
           left outer join ADTestQ adtesta0_16_ on adtesta0_.pkey=adtesta0_16_.pkey
           left outer join ADTestR adtesta0_17_ on adtesta0_.pkey=adtesta0_17_.pkey
           left outer join ADTestS adtesta0_18_ on adtesta0_.pkey=adtesta0_18_.pkey
           left outer join ADTestT adtesta0_19_ on adtesta0_.pkey=adtesta0_19_.pkey
           left outer join ADTestU adtesta0_20_ on adtesta0_.pkey=adtesta0_20_.pkey
           left outer join ADTestV adtesta0_21_ on adtesta0_.pkey=adtesta0_21_.pkey
           left outer join ADTestW adtesta0_22_ on adtesta0_.pkey=adtesta0_22_.pkey
           left outer join ADTestX adtesta0_23_ on adtesta0_.pkey=adtesta0_23_.pkey
           left outer join ADTestY adtesta0_24_ on adtesta0_.pkey=adtesta0_24_.pkey
           left outer join ADTestZ adtesta0_25_ on adtesta0_.pkey=adtesta0_25_.pkey
 where adtesta0_.pkey=1

That's not very pretty, and does correspond to the effective simulation of the per-hierarchy table that I hoped could be avoided.

So it looks like these kind of queries are going to be very expensive. I'll have a think about how often they would be needed (compared, say, to knowing that I want an instance of ADTestP and asking for one of those right off the bat which only joins in the required parent tables). I have a feeling however that this will be unavoidable with references from othe entities; in other words, a one-to-one mapping from a field of type ADTestA is always going to involve exactly this kind of lookup.

(On the other hand the alternative strategies aren't shining beacons of hope either; going the table-per-hierarchy route and having literally hundreds of columns in a single table doesn't sound very efficient either...)

Andrzej Doyle 2010-01-07 15:23:21

+1 for feeding back - thanks.

David M 2010-01-07 16:44:33

Answer 3

A:

As long as you access your DB only through Hibernate and you either dont have important data or are ready to write a small migration script, you should be able to make the decision about table per subclass / hierarchy quite late in your development process. That's the beauty of an ORM, it abstract the database structure...

On the other hand, I'm a big fan of "prefer composition over inheritance" (http://stackoverflow.com/questions/49002/prefer-composition-over-inheritance) and I am quite dubious that a model with 70 classes over 4-5 levels cannot be simplified ... but I'll let you think for yourself over that one, after all I dont know what probem you are trying to sovle.

Guillaume 2010-01-07 15:48:50

Unfortunately the tables will *primarily* be accessed through Hibernate, but the design needs to be at least sane for other non-Java tools to access occasionally. And I may be confusing the relevance of concerns here, but the class hierarchy is being refactored at the same time that a new table is introduced, with DB performance being one of the primary motivators. I agree with the abstraction issue to some extent but as Joel wrote abstractions leak; if it's not possible to do this in a performant way in Hibernate, I ideally need to discover that as early as possible.

Andrzej Doyle 2010-01-07 16:41:23

Oh - and I did consider writing a disclaimer about the number of subclasses being legitimate because I thought someone would highlight it as a smell. :-) Continuing the Animal analogy, I'm writing code for something akin to a zoo or vet so we legitimately have an `Animal` > `Vertebrate` > `Mammal` > `Carnivore` > `Canine` > `Dog` type hierarchy, for many different animals (and yes, there's useful behaviour at each level of the hierarchy). The Java code is much nicer and cleaner with this class structure than with the almost-flat one it replaces.

Andrzej Doyle 2010-01-07 16:50:18

ansaurus

tags:

views:

answers:

Efficiency of Hibernate's table-per-subclass inheritance strategy

related questions