I was wondering about this. Let's say I need to store, in a data warehouse, the data for several measures over time:

t |  x'
-------
1 |  20
2 |  50
3 |  30


t |  x''
-------
3 |  23
4 |  56
6 |  28

and so on...

t |  x''n
-------
5 |  35
6 |  92
7 |  23

If I need to build a large fact table that combines the data above in ways not yet defined, which is more efficient (in whatever sense): one large table that stores everything, or separate tables like the ones depicted above?

t |  x' |  x''
----------------
1 |  20 |
2 |  50 |
3 |  30 | 23   ...
4 |     | 56
5 |     |
6 |     | 28
7 |     |
+1  A: 

Do you need to use/display all the results at once? If so, it would be more efficient to grab them all together, and for this I would go with a single table :-)

IrishChieftain
+3  A: 

Use one fact table. The time is a dimension of the fact table. If you have overlaps as you've shown, that means you need another dimension.
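
As a rough illustration of this advice (not necessarily the schema the answer had in mind), the sketch below uses Python's built-in sqlite3 module with the x' and x'' values from the question. The names `dim_measure` and `fact_reading`, and the choice of SQLite, are assumptions made purely for the example. The extra dimension is what lets t = 3 carry a value for both x' and x''.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical dimension table: one row per measure (x', x'', ...).
    CREATE TABLE dim_measure (
        measure_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    -- Hypothetical fact table: one row per (time, measure) pair.
    CREATE TABLE fact_reading (
        t          INTEGER NOT NULL,
        measure_id INTEGER NOT NULL REFERENCES dim_measure(measure_id),
        value      INTEGER NOT NULL,
        PRIMARY KEY (t, measure_id)
    );
""")
conn.executemany(
    "INSERT INTO dim_measure VALUES (?, ?)",
    [(1, "x'"), (2, "x''")],
)
conn.executemany(
    "INSERT INTO fact_reading VALUES (?, ?, ?)",
    [(1, 1, 20), (2, 1, 50), (3, 1, 30), (3, 2, 23), (4, 2, 56), (6, 2, 28)],
)

# Both measures have a reading at t = 3; the measure dimension keeps them apart.
rows_at_t3 = conn.execute(
    """
    SELECT m.name, f.value
    FROM fact_reading AS f
    JOIN dim_measure AS m ON m.measure_id = f.measure_id
    WHERE f.t = 3
    ORDER BY m.measure_id
    """
).fetchall()
print(rows_at_t3)
```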

Bill Karwin
+5  A: 

If you're talking about having a dynamic number of columns (i.e. adding a new column each time you add another x''n), this is really not the relational database way of doing things. Adding a column to a large table is a very inefficient operation. Depending on your RDBMS, it may actually copy and recreate the entire table when you do that. Dynamically creating new tables is faster, but still not as fast as inserting rows, which is where relational databases really perform at their best. Basically, what I'm saying is that you want your database schema to be static (or rarely changing). All the dynamic operations should be purely row-based.

Perhaps what you really want is one table something like this:

t  | x  | prime
---+----+------
1  | 20 | 1
2  | 50 | 1
3  | 30 | 1
3  | 23 | 2
4  | 56 | 2
6  | 28 | 2
5  | 35 | 3
6  | 92 | 3
7  | 23 | 3
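
A minimal sketch of how such a table could be used, assuming SQLite via Python's sqlite3 module and a hypothetical table name `measurements`; the rows are the values from the three tables in the question. It suggests that the wide layout from the question does not have to be stored: it can be derived on demand with conditional aggregation, so the schema stays fixed while the data grows by rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Long layout: one row per (time, value, series); "measurements" is a made-up name.
conn.execute("CREATE TABLE measurements (t INTEGER, x INTEGER, prime INTEGER)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        (1, 20, 1), (2, 50, 1), (3, 30, 1),  # x'
        (3, 23, 2), (4, 56, 2), (6, 28, 2),  # x''
        (5, 35, 3), (6, 92, 3), (7, 23, 3),  # third series from the question
    ],
)

# Pivot to one column per series; missing readings come back as NULL (None).
wide = conn.execute(
    """
    SELECT t,
           MAX(CASE WHEN prime = 1 THEN x END) AS x1,
           MAX(CASE WHEN prime = 2 THEN x END) AS x2,
           MAX(CASE WHEN prime = 3 THEN x END) AS x3
    FROM measurements
    GROUP BY t
    ORDER BY t
    """
).fetchall()

for row in wide:
    print(row)
```

The pivot query does need one CASE expression per series, but that lives in the query rather than in the table definition, so adding a series never requires an ALTER TABLE.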

Be sure to create indexes on the columns that will appear in WHERE clauses in your queries (or maybe some strategic compound indexes, depending on how exactly you will query the table). Also, it's good practice to have a primary key column as the first column of every table, because it gives you a unique handle on each individual row if you need to update or delete it. I've left the primary key out of the sample above to keep the illustration simple.
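
To make the indexing and primary key advice concrete, here is a hedged sketch, again assuming SQLite through Python's sqlite3 module. The table name `measurements`, the index name `idx_measurements_prime_t`, the example query, and the corrected value 29 are all hypothetical; which columns deserve an index depends on the queries you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (
        id    INTEGER PRIMARY KEY,  -- surrogate key: unique handle for each row
        t     INTEGER NOT NULL,
        x     INTEGER NOT NULL,
        prime INTEGER NOT NULL
    );
    -- Compound index for queries that filter on a series and a time range.
    CREATE INDEX idx_measurements_prime_t ON measurements (prime, t);
""")
conn.executemany(
    "INSERT INTO measurements (t, x, prime) VALUES (?, ?, ?)",
    [(3, 23, 2), (4, 56, 2), (6, 28, 2), (5, 35, 3)],
)

# Ask SQLite how it would run a typical filtered query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT x FROM measurements WHERE prime = ? AND t BETWEEN ? AND ?",
    (2, 3, 5),
).fetchall()
plan_details = [row[-1] for row in plan]  # last column holds the description
print(plan_details)

# Look a row up once, then address it through its primary key.
row_id = conn.execute(
    "SELECT id FROM measurements WHERE prime = ? AND t = ?", (2, 6)
).fetchone()[0]
conn.execute("UPDATE measurements SET x = ? WHERE id = ?", (29, row_id))  # 29 is a made-up correction
updated_x = conn.execute(
    "SELECT x FROM measurements WHERE id = ?", (row_id,)
).fetchone()[0]
print(row_id, updated_x)
```

The exact wording of the query plan differs between SQLite versions, so check it on your own system rather than relying on a particular string.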

Asaph