ansaurus

Question

Designing SQL database to represent OO class hierarchy

Answer 1

+3 A:

In general I prefer option "B" (i.e. one table for the base class and one table for each "concrete" subclass).

Of course this has a couple of drawbacks: first of all you have to join at least 2 tables whenever you have to read a full instance of a subclass. Also, the "base" table will be constantly accessed by anyone who has to operate on any kind of note.

But this is usually acceptable unless you have extreme cases (billions of rows, very quick response times required and so on).
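To make the join cost concrete, here is a minimal sketch of option "B" using Python's built-in `sqlite3` module. The `t_text_note` table and all column names are hypothetical, chosen only for illustration; only `t_note(id)` comes from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE t_note (              -- base class: columns shared by every note
        id      INTEGER PRIMARY KEY,
        created TEXT NOT NULL
    );
    CREATE TABLE t_text_note (         -- hypothetical concrete subclass
        id   INTEGER PRIMARY KEY REFERENCES t_note(id),
        body TEXT NOT NULL
    );
""")

# Writing one object touches both tables, so do it in a single transaction.
with conn:
    cur = conn.execute("INSERT INTO t_note (created) VALUES (?)", ("2010-08-05",))
    conn.execute(
        "INSERT INTO t_text_note (id, body) VALUES (?, ?)",
        (cur.lastrowid, "buy milk"),
    )

# Reading a full instance of the subclass needs the base table as well.
full_note = conn.execute("""
    SELECT n.id, n.created, t.body
    FROM t_note AS n
    JOIN t_text_note AS t ON t.id = n.id
""").fetchone()
print(full_note)
```

The subclass table reuses the base table's primary key as its own, so the join is always on the primary key of both tables.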

There is a third possible option: map each subclass to a distinct table. This helps with partitioning your objects but, in general, costs more in development effort.

See this for a complete discussion.

(Regarding your "C" solution, using VARIANT: I can't comment on the merits/demerits, because it looks like a proprietary solution. What is it? Transact-SQL? I am not familiar with it.)

p.marino 2010-08-05 09:49:43

`sql_variant` is a data type of Microsoft SQL Server. I'd prefer not to use it.

dalle 2010-08-05 09:58:10

I'm leaning towards option B. Mapping each subclass to a distinct table sounds interesting. One drawback is that it will prevent other tables in the database from referencing any note (i.e. REFERENCES t_note(id)). Another drawback is that it, as you say, costs more in development effort.

dalle 2010-08-05 10:05:26

Very good article. Thanks.

dalle 2010-08-05 11:08:50

+1 The most *logical* choice is B. It is the most logical because conceptually it is the most correct approach; the rest are acceptable as physical designs, when you denormalize the database design to achieve better performance in certain scenarios (in other words, the fact that you are mapping objects does not really make it a specific problem; it falls under general database denormalization).

Unreason 2010-08-05 12:10:30

Answer 2

A:

I'd gravitate towards option A myself.

It also depends a bit on your usage scenarios. For example, will you need to do lots of searches across all types of notes? If yes, then you might be better off with option A.

You can always store them as option A (one big table) and create views for the different sub-notes if you so please. That way, you can still have a logical separation while having good searchability.
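A minimal sketch of that idea, using Python's built-in `sqlite3` module. The discriminator column `note_type`, the two note kinds, and the view names are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_note (
        id        INTEGER PRIMARY KEY,
        note_type TEXT NOT NULL CHECK (note_type IN ('text', 'image')),
        created   TEXT NOT NULL,
        body      TEXT,   -- only set for text notes
        image_url TEXT    -- only set for image notes
    );

    -- One view per sub-note gives the logical separation.
    CREATE VIEW v_text_note AS
        SELECT id, created, body FROM t_note WHERE note_type = 'text';
    CREATE VIEW v_image_note AS
        SELECT id, created, image_url FROM t_note WHERE note_type = 'image';

    INSERT INTO t_note (note_type, created, body)
        VALUES ('text', '2010-08-05', 'buy milk');
    INSERT INTO t_note (note_type, created, image_url)
        VALUES ('image', '2010-08-06', 'milk.png');
""")

# Per-subclass access goes through a view ...
text_notes = conn.execute("SELECT id, body FROM v_text_note").fetchall()

# ... while a search across all note types is a plain single-table query.
recent_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t_note WHERE created >= '2010-08-05' ORDER BY id"
    )
]
print(text_notes, recent_ids)
```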

Generally speaking (this might be close to a religious discussion, so beware), I believe that a relational database should be a relational database and not try to mimic an OO structure. Let your classes do the OO stuff, let the db be relational. There are specific OO databases available if you want to extend this to your datastore. It does mean that you have to cross the 'Object-relational impedance mismatch' as they call it, but again there are ORMs for that specific purpose.

Sam 2010-08-05 09:58:09

Answer 3

+2 A:

Your 'B' option as described is pretty much an implementation of the 'Object Subclass Hierarchy' (Kung, 1990 http://portal.acm.org/citation.cfm?id=79213).

As such, it's a well established and understood method. It works quite well. It's also extensible through multiple levels of inheritance, should you need it.
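As a rough illustration of the multi-level case, here is a sketch using Python's built-in `sqlite3` module. It assumes a hypothetical three-level chain (`t_note`, then `t_text_note`, then `t_rich_text_note`); none of the subclass or column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE t_note (
        id      INTEGER PRIMARY KEY,
        created TEXT NOT NULL
    );
    CREATE TABLE t_text_note (           -- level 2: extends t_note
        id   INTEGER PRIMARY KEY REFERENCES t_note(id),
        body TEXT NOT NULL
    );
    CREATE TABLE t_rich_text_note (      -- level 3: extends t_text_note
        id     INTEGER PRIMARY KEY REFERENCES t_text_note(id),
        markup TEXT NOT NULL
    );
    INSERT INTO t_note           VALUES (1, '2010-08-05');
    INSERT INTO t_text_note      VALUES (1, 'buy milk');
    INSERT INTO t_rich_text_note VALUES (1, 'html');
""")

# Each extra level of inheritance adds one more join on the shared key.
rich = conn.execute("""
    SELECT n.id, n.created, t.body, r.markup
    FROM t_rich_text_note AS r
    JOIN t_text_note      AS t ON t.id = r.id
    JOIN t_note           AS n ON n.id = t.id
""").fetchone()

# A row at the deepest level cannot exist without its parent rows.
try:
    conn.execute("INSERT INTO t_rich_text_note VALUES (2, 'html')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(rich, orphan_rejected)
```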

Of course you lose some of the benefits of encapsulation and information hiding, if you don't restrict who can access the data through the DBMS interface.

You can however access it from multiple systems, and even languages, simultaneously (e.g. Java, C++, C#). (This was the subject of my Master's dissertation :)

Ragster 2010-08-05 10:00:24

Answer 4

A:

There's a series of patterns collectively known as "Crossing Chasms" I've used for many years. Don't let the references to Smalltalk throw you - it's applicable to any object oriented language. Try the following references:

A Pattern Language for Relational Databases and Smalltalk
Crossing Chasms - The Static Patterns
Crossing Chasms - The Architectural Patterns

Share and enjoy.

Bob Jarvis 2010-08-05 11:46:13

Answer 5

+1 A:

You've hit the 3 most-commonly-accepted ways of modeling objects into a relational database. All 3 are acceptable, and each has its own pros and cons. Unfortunately, that means there's no cut-n-dry "right" answer. I've implemented each of those at different times, and here are a couple of notes/caveats to keep in mind:

Option A has the drawback that, when you add a new subclass, you must modify an existing table (this may be less palatable to you than adding a new table). It also has the drawback that many columns will contain NULLs. However, modern DBs seem MUCH better at managing space than older DBs, so I've never been too worried about nulls. One benefit is that none of your search or retrieve operations will require JOINs or UNIONs, which means potentially better performance and simpler SQL.

Option B has the drawback that, if you add a new property to your superclass, you need to add a new column to each and every subclass's table. Also, if you want to do a heterogeneous search (all subclasses at once), you must do so using a UNION or JOIN (potentially slower performance and/or more complex SQL).

Option C has the drawback that all retrieval operations (even for just one subclass) will involve a JOIN, as will most searches. Also, all inserts will involve multiple tables, making for somewhat more complex SQL, and will necessitate use of transactions. This option seems to be the most "pure" from a data-normalization standpoint, but I rarely use it because the JOIN-for-every-operation drawback usually makes one of the other options more palatable.
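For the heterogeneous search mentioned under Option B above (where each subclass table carries its own copy of the superclass columns), a minimal sketch with Python's built-in `sqlite3` module might look like this. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each subclass table repeats the superclass column `created`.
    CREATE TABLE t_text_note (
        id        INTEGER PRIMARY KEY,
        created   TEXT NOT NULL,
        body      TEXT NOT NULL
    );
    CREATE TABLE t_image_note (
        id        INTEGER PRIMARY KEY,
        created   TEXT NOT NULL,
        image_url TEXT NOT NULL
    );
    INSERT INTO t_text_note  VALUES (1, '2010-08-05', 'buy milk');
    INSERT INTO t_image_note VALUES (1, '2010-08-06', 'milk.png');
""")

# Searching all subclasses at once means one SELECT per table, glued together.
# Note that ids are only unique per table, so the type tag is needed to
# tell the two rows with id = 1 apart.
all_notes = conn.execute("""
    SELECT 'text'  AS note_type, id, created FROM t_text_note
    UNION ALL
    SELECT 'image' AS note_type, id, created FROM t_image_note
    ORDER BY created
""").fetchall()
print(all_notes)
```

Every new subclass adds another branch to this query, which is where the "more complex SQL" comes from.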

mikemanne 2010-08-06 20:16:49
