views:

97

answers:

5

I'm in the process of converting a class hierarchy to be stored in an SQL database.

Original pseudo code:

abstract class Note
{
   int id;
   string message;
};

class TimeNote : public Note
{
   time_t time;
};

class TimeRangeNote : public Note
{
   time_t begin;
   time_t end;
};

class EventNote : public Note
{
   int event_id;
};

// More classes deriving from Note excluded.

Currently I'm having a couple of ideas how to store this in a database.

A. Store all notes in a single wide table

The table would contain all information needed by all classes deriving from Note.

CREATE TABLE t_note(
   id INTEGER PRIMARY KEY,
   message TEXT,
   time DATETIME,
   begin DATETIME,
   end DATETIME,
   event_id INTEGER
);

Future classes deriving from Note need to add new columns to this table.

B. Map each class to a table

CREATE TABLE t_note(
   id INTEGER PRIMARY KEY,
   message TEXT
);

CREATE TABLE t_timenote(
   note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
   time DATETIME
);

CREATE TABLE t_timerangenote(
   note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
   begin DATETIME,
   end DATETIME
);

CREATE TABLE t_eventnote(
   note_id INTEGER PRIMARY KEY REFERENCES t_note(id),
   event_id INTEGER
);

Future classes deriving from Note need to create a new table.

C. Use database normalization and VARIANT/SQL_VARIANT

CREATE TABLE t_note(
   id INTEGER PRIMARY KEY,
   message TEXT
);

CREATE TABLE t_notedata(
   note_id INTEGER REFERENCES t_note(id),
   variable_id TEXT, -- or "variable_id INTEGER REFERENCES t_variable(id)".
                     -- where t_variable has information of each variable.
   value VARIANT
);

Future classes deriving from Note need to add new variable_id.

D. Map each concrete class to a table (newly added based on current answers)

CREATE TABLE t_timenote(
   id INTEGER PRIMARY KEY,
   message TEXT,
   time DATETIME
);

CREATE TABLE t_timerangenote(
   id INTEGER PRIMARY KEY,
   message TEXT,
   begin DATETIME,
   end DATETIME
);

CREATE TABLE t_eventnote(
   id INTEGER PRIMARY KEY,
   message TEXT,
   event_id INTEGER
);

Future classes deriving from Note need to create a new table.


What would be the most logical representation in SQL be?
Are there any better options?

+3  A: 

In general I prefer obtion "B" (i.e. one table for base class and one table for each "concrete" subclass).

Of course this has a couple of drawbacks: first of all you have to join at least 2 tables whenever you have to read a full instance of a subclass. Also, the "base" table will be constantly accessed by anyone who has to operate on any kind of note.

But this is usually acceptable unless you have extreme cases (billions of rows, very quick response times required and so on).

There is a third possible option: map each subclass to a distinct table. This helps partitioning your objects but costs more in development effort, in general.

See this for a complete discussion.

(Regarding your "C" solution, using VARIANT: I can't comment on the merits/demerits, because it looks like a proprietary solution - what is it ? Transact-SQL? and I am not familiar with it).

p.marino
`sql_variant` is a data type of Microsoft SQL Server. I'd prefer not to use it.
dalle
I'm leaning towards option B.Mapping each subclass to a distinct table sounds interesting. One drawback is that it will prevent other tables in the database from referencing to any note (i.e. REFERENCES t_note(id)). Another drawback is that it, as you say, costs more in development effort.
dalle
Very good article. Thanks.
dalle
+1 Most *logical* thing is B. Most logical because conceptually it is the most correct approach; the rest is acceptable as physical design, when you denormalize database design to achieve better performance in certain scenarios (in another words, the fact that you are mapping objects does not really make it a specific problem, it falls under general database denormalization).
Unreason
A: 

I'd grativate towards option A myself.

It also depends a bit on your usage scenarios, for example will you need to do lots of searches across all types of notes? If yes, then you might be better off with option A.

You can always store them as option A (one big table) and create Views for the different sub-notes if you so please. That way, you can still have a logical seperation while having good searchability.

Generally speaking, but this might be close to a religious discussion so beware, I believe that a relational database should be a relational database and not try to mimic an OO structure. Let your classes do the OO stuff, let the db be relational. There are specific OO databases available if you want to extend this to your datastore. It does mean that you have to cross the 'Object-relational impedance mismatch' as they call it, but again there are ORM mappers for that specific purpose.

Sam
+2  A: 

Your 'B' option as described is pretty much an implementation of the 'Object Subclass Heirarchy' (Kung, 1990 http://portal.acm.org/citation.cfm?id=79213)

As such, it's a well established and understood method. It works quite well. It's also extensible through multiple levels of inheritance, should you need it.

Of course you lose some of the benefits of encapsulation and information hiding, if you don't restrict who can access the data theough the DBMS interface.

You can however access it from multiple systems, and even languages, simultaneously (e.g Java, C++, C#) (This was the subject of my Masters dissertation :)

Ragster
A: 

There's a series of patterns collectively known as "Crossing Chasms" I've used for many years. Don't let the references to Smalltalk throw you - it's applicable to any object oriented language. Try the following references:

A Pattern Language for Relational Databases and Smalltalk
Crossing Chasms - The Static Patterns
Crossing Chasms - The Architectural Patterns

Share and enjoy.

Bob Jarvis
+1  A: 

You've hit the 3 most-commonly-accepted ways of modeling objects into a relational database. All 3 are acceptable, and each has their own pros and cons. Unfortunately, that means there's no cut-n-dry "right" answer. I've implemented each of those at different times, and here's a couple notes/caveats to keep in mind:

Option A has the drawback that, when you add a new subclass, you must modify an existing table (this may be less palatable to you than adding a new table). It also has the drawback that many columns will contain NULLs. However, modern DBs seem MUCH better at managing space than older DBs, so I've never been too worried about nulls. One benefit is that none of your search or retrieve operations will require JOINs or UNIONs, which means potentially better performance and simpler SQL.

Option B has the drawback that, if you add a new property to your superclass, you need to add a new column to each and every subclass's table. Also, if you want to do a heterogeneous search (all subclasses at once), you must do so using a UNION or JOIN (potentially slower performance and/or more complex sql).

Option C has the drawback that all retrieval operations (even for just one subclass) will involve a JOIN, as will most searches. Also, all inserts will involve multiple tables, making for somewhat more complex SQL, and will necessitate use of transactions. This option seems to be the most "pure" from a data-normalization standpoint, but I rarely use it because the JOIN-for-every-operation drawback usually makes one of the other options more palatable.

mikemanne