views:

159

answers:

6

I am trying to design a schema where the columns of a table are not fixed. Ex: I have an Employee table where the columns of the table are not fixed and vary (attributes of Employee are not fixed and vary). Frequent addition of a new attribute / column is requirement.

  1. Nullable columns in the Employee table itself i.e. no normalization

  2. Instead of adding nullable columns, separate those columns out in their individual tables ex: if Address is a column to be added then create table Address[EmployeeId, AddressValue].

  3. Create tables ExtensionColumnName [EmployeeId, ColumnName] and ExtensionColumnValue [EmployeeId, ColumnValue]. ExtensionColumnName would have ColumnName as "Address" and ExtensionColumnValue would have ColumnValue as address value.

    Employee table
    EmployeeId
    Name

    ExtensionColumnName table
    ColumnNameId
    EmployeeId
    ColumnName

    ExtensionColumnValue table
    EmployeeId
    ColumnNameId
    ColumnValue

There is a drawback is the first two ways as the schema changes with every new attribute. Note that adding a new attribute is frequent and a requirement.

I am not sure if this is the good or bad design. If someone had a similar decision to make, please give an insight on things like foreign keys / data integrity, indexing, performance, reporting etc.

A: 

Combine your ExtensionColumn tables into one

Property:
    EmployeeID foreign key
    PropertyName string
    PropertyValue string

If you use a monotonic sequence for assigning primary keys in all your object tables then a single property table can hold properties for all objects.

Jim Garrison
+2  A: 

I recommend using a combination of numbers two and three. Where possible, model tables for standard associations like addresses. This is the most ideal approach...

But for constantly changing values that can't be summarized into logical groupings like that, use two tables in addition to the EMPLOYEES table:

  • EMPLOYEE_ATTRIBUTE_TYPE_CODES (two columns, employee_attribute_type_code and DESCRIPTION)
  • EMPLOYEE_ATTRIBUTES (three columns: employee_id foreign key to EMPLOYEES, employee_attribute_type_code foreign key to EMPLOYEE_ATTRIBUTE_TYPE_CODES, and VALUE)

In EMPLOYEE_ATTRIBUTES, set the primary key to be made of:

  • employee_id
  • employee_attribute_type_code

This will stop duplicate attributes to the same employee.

OMG Ponies
OMG Ponies, I came up with the schema eventually. I am interested in issues with such schema. For ex: if the VALUE column in EMPLOYEE_ATTRIBUTES is an ID (primary key of some other table), then it becomes an issue. That can be solved with having a separate meta table denoting that such attribute is a lookup and corresponsing lookup table name. This would involve some dynamic sql or reflection and I lose type safety.
hIpPy
A: 

There is a pattern, called observation pattern.

For explanation, see these questions/answers: one, two, three.

In general, looks like this:

alt text

For example, subjects employee, company and animal can all have observation Name (trait), subjects employee and animal can have observation Weight (measurement) and subject beer bottle can have observations Label (trait) and Volume (measurement). It all fits in the model.

Damir Sudarevic
+1  A: 

It might be useful to look at the current crop of NoSQL databases which allow you to store arbitrary sets of key-value pairs per record.

I would recommend you look at couchdb, mongodb, lucene, etc ...

If the schema changes often in an SQL database this ends up in a nightmare, especially with reporting.

Putting everything in (rowId, key, value) triads is flexible, but slower because of the huge number of records.

The way the ERP vendors do it is just make their schema of the fields they're sure of and add a largisch number of "flexfields" (i.e. 20 numbers, 20 strings, etc) in fixed named columns and use a lookup table to see which flexcolumn corresponds to what. This allows some flexibility for the future while essentially having a static schema.

Peter Tillemans
OMG Ponies
I was using the term "Big Boys" rather loosely with the tongue in cheek while thinking about to 10000+ table model in Oracle EBS. Every other table is filled with those placeholder columns. Personally I find it very ugly, but apparently they have bee surviving on that for 20+ years. I guess it is one of those ugly, pragmatic solutions which work ok in most cases.
Peter Tillemans
A: 

If, as you say, new attributes will be added frequently, an EAV data model may work well for you.

Joe Stefanelli
A: 

I would use a combination of 1 and 2. If you are adding attributes frequently, I don't think you have a handle on the data requirements.

I supect some of the attributes being added belong in a another table. If you keep adding attribututes like java certified, asp certified, ..., then you need a certification table. This can be relationship to a certifications code table listing available certifications.

Attributes like manager may be either an attribute or relationship table. If you have multiple relationships between employees, then consider a relationship table with a releation type. Organizations with a matrix management structure will require a releationship table.

Addresses and phone numbers often go in separate tables. An address key like employee_id, address_type would be appropriate. If history is desired add a start_date column to the key.

If you are keeping history I recommend using start_date and end_date columns on the appropriate columns. I try to use a relationship where the record is active when 'start_date <= date-being-considered < end_date' is true. Attributes like weight, eye color, etc.

BillThor