views:

422

answers:

5

I am designing a new laboratory database with MANY types of my main entities.

The table for each entity will hold fields common to ALL types of that entity (entity_id, created_on, created_by, etc). I will then use concrete inheritance (separate table for each unique set of attributes) to store all remaining fields.

I believe that this is the best design for the standard types of data which come through the laboratory daily. However, we often have a special samples which often are accompanied by specific values the originator wants stored.

Question: How should I model special (non-standard) types of entities?

Option 1: Use entity-value for special fields
One table (entity_id, attribute_name, numerical_value) would hold all data for any special entity.
+ Fewer tables.
- Cannot enforce requiring a particular attribute.
- Must convert (pivot) rows to columns which is inefficient.

Option 2: Strict concrete inheritance.
Create separate table for each separate special case.
+ Follows in accordance with all other rules
- Overhead of many tables with only a few rows.

Option 3: Concrete inheritance with special tables under a different user.
Put all special tables under a different user.
+ Keeps all special and standard tables separate.
+ Easier to search for common standard table in a list without searching through all special tables.
- Overhead of many tables with only a few rows.

+7  A: 

Actually the design you described (common table plus subtype-specific tables) is called Class Table Inheritance.

Concrete Table Inheritance would have all the common attributes duplicated in the subtype tables, and you'd have no supertype table as you do now.

I'm strongly against EAV. I consider it an SQL antipattern. It may seem like an elegant solution because it requires fewer tables, but you're setting yourself up for a lot of headache later. You identified a couple of the disadvantages, but there are many others. IMHO, EAV is used appropriately only if you absolutely must not create a new table when you introduce a new subtype, or if you have an unbounded number of subtypes (e.g. users can define new attributes ad hoc).

You have many subtypes, but still a finite number of them, so if I were doing this project I'd stick with Class Table Inheritance. You may have few rows of each subtype, but at least you have some assurance that all rows in each subtype have the same columns, you can use NOT NULL if you need to, you can use SQL data types, you can use referential integrity constraints, etc. From a relational perspective, it's a better design than EAV.

One more option that you didn't mention is called Serialized LOB. That is, add a BLOB column for a semi-structured collection of custom attributes. Store XML, YAML, JSON, or your own DSL in that column. You won't be able to parse individual attributes out of that BLOB easily with SQL, you'll have to fetch the whole BLOB back into your application and extract individual attributes in code. So in some ways it's less convenient. But if that satisfies your usage of the data, then there's nothing wrong with that.

Bill Karwin
Actually, my design is a hybrid of class and concrete. It uses class level for fields common to all types and concrete level for all remaining fields (many of which are common to several types).
Steven
I might not be following what you're saying, but it sounds like you're describing Class Table Inheritance exactly.
Bill Karwin
My design would have exactly two levels. The main level (one table) holds fields common to ALL types. The second level (many tables) has one table for each type, period. Even though several types may have some fields in common, I have no third (or higher) level, which would be required for the inheritance to be fully class-level.
Steven
Aha, I see. So whereas in a true OO hierarchy you *should* have more than two levels, but you're compromising this in the DB design for the sake of simplicity. I see now why you consider this a hybrid, thanks.
Bill Karwin
If new subtypes are constantly being introduced or if existing subtypes structures are changed often, class table inheritance is probably inappropriate.
hythlodayr
+1  A: 

Hi Steven,

I think it depends mostly on how you want to use this data.

First of all, I don't really see the benefit of option 3 over option 2. I think separating the special tables in another schema will make your application harder to maintain, especially if later on commonalities are found between 'special values'.

As another option I would say: - Store the special values in an XML fragment (or blob). Most databases have ability to query on XML structures these days, so without the need for many extra tables, you would keep your flexibility for a small performance hit.

If you put all the special values in one table, you get a very sparse table. Most normal DBMSes cannot handle this very well, but there are some implementations that specialize in this. You could benefit from that.

Do you often need to query the key-value pairs? if you basically access that table through it's entry_id, I think having a key-value table is not a bad design. An extra index on the kay column might even help you when you do need to query for special values. If you build an application layer on top of your database, the key-value table will map on a Map or Hash structure, which can also easily be used.

It also depends on the different types of values you want to store. If there are many different types, that need to be easily accessed (instead of being serialized/deserialized to XML/Character-String) you might want to store the type in a separate column, but that will usually lead to a very complicated design.

Hope this helps (a little bit).

-Maarten

Maarten Winkels
Why shouldn't Oracle be able to deal with sparsely filled tables? According to this site http://highscalability.com/stack-overflow-architecture they can.
tuinstoel
+1  A: 

http://en.wikipedia.org/wiki/Entity-Attribute-Value%5Fmodel

Suggest you read about the problems with entity value tables before deciding to use them.

HLGEM
+1  A: 

Oracle can deal with sparsely filled tables quite well. I think you can use a similar approach as company salesforce uses. They use tables with a lot of columns, they create columns when needed. You can index those columns much better than an eav model.

So it is flexible but it performs better than an eav model.

Read: Ask Tom 1, Ask Tom 2, High Scalabilty and SalesForce.

tuinstoel
+1  A: 

The "Option 1" patterns is also called the "Universal Relation" At first look it seems like a short cut to not doing potentially difficult data modeling. It trades effortless data modeling for not being able to do simple select, update, delete without dramatically more effort than it would take on more usual looking data model with multiple tables.

MatthewMartin