I am working on an application which accepts any uploaded CSV data, stores it alongside other datasets which have been uploaded previously, and then produces output (CSV or HTML) based on the user selecting which columns/values they want returned. The database will be automatically expanded to handle new/different columns and datatypes as required. This is in preference to an entity-attribute-value model.

Example - uploading these 2 sets to a blank database:

dataset A:

name  | dept  | age   
------+-------+------
Bob   | Sales | 24
Tim   | IT    | 32

dataset B:

name  | dept  | age  | salary
------+-------+------+--------
Bob   | Sales | 24   | £20,000
Tim   | IT    | 32   | £20,000

The application will programmatically alter the 'data' table, so that importing dataset A results in 3 newly created columns (name, dept, age), and importing dataset B results in 1 newly created column (salary). For the moment, forget about whether the recordsets should be combined or not, and about the lack of normalisation.
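A minimal sketch of that "widen the table per upload" step, using Python's sqlite3 in memory. The table name data comes from the question; the function name, TEXT-only typing and lack of name sanitisation are simplifying assumptions, not a real implementation:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY)")

def import_csv(conn, text):
    """Add any columns the upload introduces, then insert its rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    existing = {r[1] for r in conn.execute("PRAGMA table_info(data)")}
    for col in rows[0].keys():
        if col not in existing:
            # Real code would sanitise/validate the column name and
            # infer a datatype; TEXT keeps the sketch simple.
            conn.execute(f'ALTER TABLE data ADD COLUMN "{col}" TEXT')
    cols = list(rows[0].keys())
    collist = ",".join(f'"{c}"' for c in cols)
    marks = ",".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO data ({collist}) VALUES ({marks})",
                     [[r[c] for c in cols] for r in rows])

dataset_a = "name,dept,age\nBob,Sales,24\nTim,IT,32\n"
dataset_b = "name,dept,age,salary\nBob,Sales,24,£20000\nTim,IT,32,£20000\n"
import_csv(conn, dataset_a)   # creates name, dept, age
import_csv(conn, dataset_b)   # creates only salary
print([r[1] for r in conn.execute("PRAGMA table_info(data)")])
# → ['id', 'name', 'dept', 'age', 'salary']
```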

The issue I have is that some columns will also have lookup values - let's say that the Dept column will at some point in the future have associated values which give the address and phone numbers of that department. The same could be true for the Salary column, looking up tax groupings etc.

The number of columns in this big table should not become too high (a few hundred) but will be high enough to want the user to administer the lookup table structure and values through an admin panel rather than have to involve developers each time.

The question is whether to use individual lookup tables for each column (value, description), or a combined lookup table which references the column (column, value, description). Normally I would opt for individual lookup tables, but here the application will need to create them automatically (e.g. lookup_dept, lookup_salary) and then add a new join into the master SQL statement. This would be done at the request of the user rather than when the column's added (to avoid hundreds of empty tables).
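The individual-table route might look like the sketch below. The lookup_dept naming convention and the (value, description) shape are from the question; the schema details and sample data are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [("Bob", "Sales"), ("Tim", "IT")])

def create_lookup(conn, column):
    # Created only when the user requests it, avoiding hundreds of
    # empty tables. Real code would validate the column name.
    conn.execute(f"CREATE TABLE lookup_{column} "
                 f"(value TEXT PRIMARY KEY, description TEXT)")

create_lookup(conn, "dept")
conn.execute("INSERT INTO lookup_dept VALUES ('Sales', '1st floor, x1234')")

# The master SQL statement gains one LEFT JOIN per activated lookup.
sql = """SELECT d.name, d.dept, l.description
         FROM data d
         LEFT JOIN lookup_dept l ON l.value = d.dept
         ORDER BY d.name"""
for row in conn.execute(sql):
    print(row)
# → ('Bob', 'Sales', '1st floor, x1234')
# → ('Tim', 'IT', None)
```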

The combined lookup table on the other hand would need to be joined multiple times onto the data table, selecting on the column name each time.
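For comparison, the combined table needs one aliased join per looked-up column, each filtered on the column name. A sketch along those lines (schema and sample values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, dept TEXT, salary TEXT)")
conn.execute("INSERT INTO data VALUES ('Bob', 'Sales', '20000')")

# One table for every column's lookups, keyed by (column_name, value).
conn.execute("""CREATE TABLE lookup (
                  column_name TEXT, value TEXT, description TEXT,
                  PRIMARY KEY (column_name, value))""")
conn.executemany("INSERT INTO lookup VALUES (?, ?, ?)", [
    ("dept", "Sales", "1st floor, x1234"),
    ("salary", "20000", "Tax band A"),
])

# Each lookup needs its own aliased, filtered join.
sql = """SELECT d.name, ld.description, ls.description
         FROM data d
         LEFT JOIN lookup ld ON ld.column_name = 'dept'
                            AND ld.value = d.dept
         LEFT JOIN lookup ls ON ls.column_name = 'salary'
                            AND ls.value = d.salary"""
print(conn.execute(sql).fetchone())
# → ('Bob', '1st floor, x1234', 'Tax band A')
```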

Individual lookups seems to make sense to me but I may be barking up completely the wrong tree.

A: 

I would agree that individual tables are preferable. They are more scalable and better for query optimisation. Also, if in future the users want more columns on a particular lookup, you can add them.

Yes, the application will have to create tables and constraints automatically: I wouldn't normally do this, but then this application is already altering existing tables and adding columns to them, which I wouldn't normally do either!

Tony Andrews
A: 

Ah, the "One True Lookup Table" idea. One of the rare times I agree with Mr Celko. A Google search for the phrase turns up plenty of discussion too.

Individual tables every time. It's "correct" in the database sense.

My reason (no normalisation pedants please): each table should store one kind of entity only, e.g. fruit names, car makes, phone brands. To mix them in one table is nonsense. I could have a phone brand called "Apple". Er... wait a minute...

gbn
A: 

You said,

This is in preference to an entity-attribute-value model.

But it looks to me like that is exactly what you need.

Consider using an RDF triplestore, and query it with SPARQL.

Forget SQL, this is a job for RDF.
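To give a rough feel for the triple model without bringing in an RDF stack, here is a sketch that stores each cell as a (row_id, attribute, value) triple in SQLite, the same shape a triplestore queries with SPARQL. All names are illustrative, and this is essentially the EAV model the questioner rejected, shown for comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (row_id INTEGER, attribute TEXT, value TEXT)")
rows = [
    {"name": "Bob", "dept": "Sales", "age": "24", "salary": "£20,000"},
    {"name": "Tim", "dept": "IT", "age": "32", "salary": "£20,000"},
]
# New columns need no DDL at all: an unseen attribute is just a new triple.
for i, row in enumerate(rows):
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)",
                     [(i, k, v) for k, v in row.items()])

# Pivoting back to columns needs one self-join per attribute,
# which is exactly the pain the EAV critics point at.
sql = """SELECT n.value, s.value
         FROM triples n
         JOIN triples s ON s.row_id = n.row_id
         WHERE n.attribute = 'name' AND s.attribute = 'salary'
         ORDER BY n.value"""
print(conn.execute(sql).fetchall())
# → [('Bob', '£20,000'), ('Tim', '£20,000')]
```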

Fenugreek Femtosecond