views:

1653

answers:

8

What are the best practices for modeling inheritance in databases?

What are the trade-offs (e.g. queriability)?

(I'm most interested in SQL Server and .NET, but I also want to understand how other platforms address this issue.)

+2  A: 

You would normalize of your database and that would actually mirror your inheritance. It might have performance degradance, but that's how it is with normalizing. You probably will have to use good common sense to find the balance.

Per Hornshøj-Schierbeck
why do people believe that normalizing a database degrades performance? do people also think that the DRY principle degrades code performance? where does this misperception come from?
Steven A. Lowe
Possibly because denormalising can improve performance, hence normalising degrades it, relatively speaking. Can't say I agree with it, but that's probably how it came about.
Matthew Scharley
At the start, normalisation might have a small effect on performance, but over time, as the number of rows increases, efficient JOINs will begin to outperform the bulkier tables. Of course, normalisation has other, greater benefits - consistency and lack of redundancy, etc.
Rob
+5  A: 
mattlant
I would also consider Target per Concrete class to not really model inheritance well and so i did not show.
mattlant
Could you add a reference where the illustration is from?
chryss
+22  A: 

There are several ways to model inheritance in a database. Which you choose depends on your needs. Here are a few options:

Table-Per-Type (TPT)

Each class has its own table. The base class has all the base class elements in it, and each class which derives from it has its own table, with a primary key which is also a foreign key to the base class table; the derived table's class contains only the different elements.

So for example:

class Person {
    public int ID;
    public string FirstName;
    public string LastName;
}

class Employee : Person {
    public DateTime StartDate;
}

Would result in tables like:

table Person
------------
int id (PK)
string firstname
string lastname

table Employee
--------------
int id (PK, FK)
datetime startdate

Table-Per-Hierarchy (TPH)

There is a single table which represents all the inheritance hierarchy, which means several of the columns will probably be sparse. A discriminator column is added which tells the system what type of row this is.

Given the classes above, you end up with this table:

table Person
------------
int id (PK)
int rowtype (0 = "Person", 1 = "Employee")
string firstname
string lastname
datetime startdate

For any rows which are rowtype 0 (Person), the startdate will always be null.

Table-Per-Concrete (TPC)

Each class has its own fully formed table with no references off to any other tables.

Given the classes above, you end up with these tables:

table Person
------------
int id (PK)
string firstname
string lastname

table Employee
--------------
int id (PK)
string firstname
string lastname
datetime startdate
Brad Wilson
'Which you choose depends on your needs' - please elaborate, as I think the reasons for choices form the core of the question.
Alex
As with most things in a database, you're trading off storage cost vs. performance. TPH stores everything in one table, so selects are fast. TPT is more compact than TPC, but more expensive because of the joins. Finding things in TPC means checking multiple tables. Each of these is a tradeoff, and your decision has to be based on what's most important to your application.
Brad Wilson
+1  A: 

repeat of similar thread answer

in O-R mapping, inheritance maps to a parent table where the parent and child tables use the same identifier

for example

create table Object (
    Id int NOT NULL --primary key, auto-increment
    Name varchar(32)
)
create table SubObject (
    Id int NOT NULL  --primary key and also foreign key to Object
    Description varchar(32)
)

SubObject has a foreign-key relationship to Object. when you create a SubObject row, you must first create an Object row and use the Id in both rows

EDIT: if you're looking to model behavior also, you would need a Type table that listed the inheritance relationships between tables, and specified the assembly and class name that implemented each table's behavior

seems like overkill, but that all depends on what you want to use it for!

Steven A. Lowe
That discussion ended up being about adding a couple columns to every table, not about modeling inheritance. I think the title of that discussion should be changed to better reflect the nature of the question and discussion.
Even Mien
+1  A: 

Using SQL ALchemy (Python ORM), you can do two types of inheritance.

The one I've had experience is using a singe-table, and having a discriminant column. For instances, a Sheep database (no joke!) stored all Sheep in the one table, and Rams and Ewes were handled using a gender column in that table.

Thus, you can query for all Sheep, and get all Sheep. Or you can query by Ram only, and it will only get Rams. You can also do things like have a relation that can only be a Ram (ie, the Sire of a Sheep), and so on.

Matthew Schinckel
+1  A: 

Note that some database engines already provides inheritance mechanisms natively like Postgres. Look at the documentation.

For an example, you would query the Person/Employee system described in a response above like this:

  /* This shows the first name of all persons or employees */
  SELECT firstname FROM Person ; 

  /* This shows the start date of all employees only */
  SELECT startdate FROM Employee ;

In that is your database's choice, you don't need to be particularly smart !

Pierre
+5  A: 

Proper database design is nothing like proper object design.

If you are planning to use the database for anything other than simply serializing your objects (such as reports, querying, multi-application use, business intelligence, etc.) then I do not recommend any kind of a simple mapping from objects to tables.

Many people think of a row in a database table as an entity (I spent many years thinking in those terms), but a row is not an entity. It is a proposition. A database relation (i.e., table) represents some statement of fact about the world. The presence of the row indicates the fact is true (and conversely, it's absence indicates the fact is false).

With this understanding, you can see that a single type in an object-oriented program may be stored across a dozen different relations. And a variety of types (united by inheritance, association, aggregation, or completely unaffiliated) may be partially stored in a single relation.

It is best to ask yourself, what facts do you want to store, what questions are you going to want answers to, what reports do you want to generate.

Once the proper DB design is created, then it is a simple matter to create queries/views that allow you to serialize your objects to those relations.

Jeffrey L Whitledge
+2  A: 

Short answer: you don't.

If you need to serialize your objects, use an ORM, or even better something like activerecord or prevaylence.

If you need to store data, store it in a relational manner (being careful about what you are storing, and paying attention to what Jeffrey L Whitledge just said), not one affected by your object design.

Marcin
+1 Attempting to model inheritance in a database is a waste of good, relational resources.
Daniel Spiewak