views:

204

answers:

4

Database design question for y'all. I have a form (like, the paper kind) that has several entry points for data. This form has changed, and is expected to change over years. It is being turned into a computer app, so that we can, among other things, quit wasting paper. (And minor things, like have all the data in one central store that can be queried, etc.) I'd like to store all of the forms data in a database, and have it be pretty agnostic as to the changes.

Originally, I was just considering each field to be a string -- and I had a table something like this:

FormId int (FK)
FieldName nvarchar(64)
FieldValue nvarchar(128)

...something like that. It was actually a bit more 3NFy in that FieldName was in another table, associated with an artificial key, so that the field names weren't duplicated all over the place.

However, I'd like to extend this to numeric and drop-down data. I could just store numeric data as strings, but that seems like a pretty crappy idea. Same with drop downs.

I could stop using a table, and actually use columns on the main form table (the one that FormId above references), but that means adding a column for each new item as they come along, and older forms would just be null. (And, unless I stored it, I wouldn't know when that column was created. With the string table above, it's implicit.)

I could extend the table above to something like:

FormId int (FK)
FieldName nvarchar(64)
FieldValueType int -- enum as to which of the columns below are valid (or just let nulls imply that)
FieldValue nvarchar(128)
FieldValueInt int

Combos would have to be in a OTLT (one true lookup table), which I have reservations about, but perhaps it's needed here?

Any advice on StackOverflow? I'm using MSSQL, but this is really a more general question.

+2  A: 

Use Nulls. Proper database design is a complicated subject; you may do well to pick up a good reference and do some research on the whole thing (I gather this is a good book on the topic). In general, it sounds like you would be well served by starting with a single table that encapsulates all the fields in your form, and then putting it through the normalization process. And yes, use nulls and do NOT use an int to enumerate which columns are set to valid values; that is exactly what nulls are for.

McWafflestix
+2  A: 

You could have a separate table for each datatype.

I.e. to fetch an entire form you'd do an N-way join using the form id where N is the number of distinct datatypes you support (+ perhaps extras depending on the info you want - e.g. dropdown values would probably be stored in another table / your fieldname lookup / etc.)

But the design should probably also depend on how you intend to use the data, which you've said nothing about. And it would also depend on just how fast the rate of change is for these forms . . .

Jordan Stewart
+1  A: 

By creating a table with a description of your forms, you are actually defining a metadata structure. That's daunting. You would need a lot of the infrastructure needed for proper table description. I think the vendors of your database system spent a lot of effort in doing all that.

At first I thought - what a nice idea! Build your own compatibility-aware table description system!

But then I thought - I'm too stupid to do that on my own. There must be a database system capable of doing that.

So I conclude, not being a db expert, define proper defaults for 'new fields' in new form versions. Handle the compatibility issue in your business logic.

xtofl
+1  A: 

I would strongly advise against having a "generic table" like you describe.

You are essentially reinventing the relational database, which is not a good idea: Queries and updates will be very painful with your structure, and you will not be able to use the more advanced features like foreign keys and triggers, should you need them.

Just make a table(s) with columns for the data fields, and if a form does not have a field, let it be null.

Or, probably even better, have a "base table" (field that are in every form), and give names/version numbers to updated forms, and have a new table for the new columns that this version adds, then use a synthetic PK to join these new tables to your base table.

I.e.:

base table: id(numeric,PK), name, birthday, town

addresstable1: street, number, postal code, country, base_table_id (foreign key)

addresstable2: po box no, po box code, base_table_id (FK)

and so on.

That way you avoid loads of null fields; your tables are not so wide (always desirable), and your records are implicitly versioned, because the list of tables that have a record belonging to a record in your base table tells you which fields the original form had, hence what kind of form was used originally.

sleske