views:

40

answers:

4

Hello:

I have a collection of physical parameters associated with different items. For example:

Item, p1, p2, p3
a,     1,  2,  3
b,     4,  5,  6
[...]

where px stands for parameter x.

I could go ahead and store the database exactly as presented; the schema would be

CREATE TABLE t1 (item TEXT PRIMARY KEY, p1 FLOAT, p2 FLOAT, p3 FLOAT);

I could retrieve the parameter p1 for all the items with the statement:

SELECT p1 FROM t1;

A second alternative is to have an schema like:

CREATE TABLE t1 (id INT PRIMARY KEY, item TEXT, par TEXT, val FLOAT)

This seems much simpler if you have many parameters (as I do). However, the parameter retrieval seems very awkward:

SELECT val FROM t1 WHERE par == 'p1'

What do you advice? Should go for the "pivoted" (first) version or the id, par, val (second) version?

Many thanks.

EDIT

For reference, I found the following persistence pattern in the SQLAlchemy examples site (the vertical mapping):

"""Mapping a vertical table as a dictionary.

This example illustrates accessing and modifying a "vertical" (or
"properties", or pivoted) table via a dict-like interface.  These are tables
that store free-form object properties as rows instead of columns.  For
example, instead of::

  # A regular ("horizontal") table has columns for 'species' and 'size'
  Table('animal', metadata,
        Column('id', Integer, primary_key=True),
        Column('species', Unicode),
        Column('size', Unicode))

A vertical table models this as two tables: one table for the base or parent
entity, and another related table holding key/value pairs::

  Table('animal', metadata,
        Column('id', Integer, primary_key=True))

  # The properties table will have one row for a 'species' value, and
  # another row for the 'size' value.
  Table('properties', metadata
        Column('animal_id', Integer, ForeignKey('animal.id'),
               primary_key=True),
        Column('key', UnicodeText),
        Column('value', UnicodeText))

Because the key/value pairs in a vertical scheme are not fixed in advance,
accessing them like a Python dict can be very convenient.  The example below
can be used with many common vertical schemas as-is or with minor adaptations.

"""
+1  A: 

It's a trade-off of performance vs. extensibility. If you never intend to add any more px columns, I think you are safe with using the first approach, however, if you anticipate that there will be more px columns in the future, you might want to do the name/value approach.

The name-value approach can get nasty for performance if you end up with a lot of data. Your queries will be faster using the static column approach.

You could also do a hybrid where you start with the static column approach, and add support for an "extension" table that keys off the Item, so that you can add additional properties in the future.

Andy White
They mentioned "many parameters (as I do)"
Dustin Laine
Yep, so it's a trade-off. If they want better performance, sometimes you have to de-normalize.
Andy White
+1  A: 

The normalized route would be to place the pX values in a table reference by ID.

ID     Item
1      a 
2      b 
3      c 


ID     Item    P
1      1       1
2      1       2
3      1       3
Dustin Laine
+1  A: 

If you are assuming that px will grow beyond three values (p1, p2 and p3) onto p4 and so on then the first approach is going to fail, and keeping adding on columns for p4, p5 and so on seems a flawed approach.

To be honest the approach that attracts me would be to separate the item and the parameters into different tables and then use a link entity to join them:

Item
  |
-----
| | |
ItemParameter

| | |
-----
  |
Parameter

So that an item can have many parameters and a parameter can exist for many items.

So Item a can have Parameters p1, p2 and p3

(Item)
a
(ItemParameter)
a p1
a p2
a p2
(Parameter)
p1
p2
p3

Or Item b can have Parameters p1, p2, p6, p10 and p19

(Item)
b
(ItemParameter)
b p1
b p2
b p6
b p10
b p19
(Parameter)
p1
p2
p6
p10
p19

and so on

amelvin
+1  A: 

In addition to the flexibility of the second approach, a further advantage is that parameters can then be rows in a parameter table, storing data about that parameter as part of the database, rather than as columns of the schema. It also leads naturally to an RDF triple representation of the data.

BTW you don't need the added key field, make item and par a joint primary key

CREATE TABLE t1 ( item TEXT, par TEXT, val FLOAT, PRIMARY KEY (item, par))

One limitation of the second approach is that the datatype of value must be the same for all parameters - OK if all floats but for generality this might have to be string with attendant loss of validation and the need for programatic data conversion.

Query speed will be affected, but you can get all the parameters for a term with a query like

SELECT par,value FROM t1 WHERE item='qitem'

which is easier to transform to a presentation format than the alternative.

Chris Wallace
Your definition of Primary Key will be really helpful. Thanks.
Arrieta