views: 94

answers: 3

I am looking at a problem that involves users uploading lists of records with varying field structures into an application. The second part of this is to also let the users specify their own fields for capturing information.

This is a step beyond anything I've done up to this point, where I would have designed a static RDBMS structure myself. In some respects all records will be treated the same, so there will be some common fields required for each, and almost all queries will be run on these common fields.

My first thought would be to dynamically generate a new table for each import and another for each data capture field spec. Then have a master table with a GUID for every record in the application, along with the common fields, plus columns specifying the name of the table the data was imported into and the name of the table holding the data capture fields.

Further information (metadata?) about the fields in the dynamically generated tables could be stored in XML or in a 'property' table.
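
A rough sketch of what this first thought might look like (all table and column names here are illustrative, not taken from the question):

```sql
-- Master table: one row per record in the whole application, holding the common
-- fields plus the names of the dynamically generated tables that hold the rest.
CREATE TABLE master_record (
    record_guid        CHAR(36) PRIMARY KEY,  -- GUID for every record
    common_field_1     VARCHAR(100),          -- the shared fields almost all queries hit
    common_field_2     DATE,
    import_table_name  VARCHAR(128),          -- dynamically generated import table
    capture_table_name VARCHAR(128)           -- dynamically generated data capture table
);

-- "Property" table describing the fields of each dynamically generated table.
CREATE TABLE field_property (
    table_name  VARCHAR(128),
    column_name VARCHAR(128),
    data_type   VARCHAR(30),
    label       VARCHAR(200),
    PRIMARY KEY (table_name, column_name)
);
```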

This would mean that as users log into the application I would be dynamically choosing which table of data to present to them, and there would be a large number of tables in the database if the application were not only multi-user but multi-tenant.

My question is: are there other methods of solving this kind of variable field issue, or am I going down an ill-advised path here?

I believe that EAV would require me to have a table defining the fields for each import / data capture spec, and then another table holding the import - field - value data, which seems impractical.
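
For comparison, the EAV layout being ruled out here would look roughly like this (illustrative names only):

```sql
-- One row per field defined for an import / data capture spec.
CREATE TABLE import_field (
    field_id   INT PRIMARY KEY,
    import_id  INT,
    field_name VARCHAR(128),
    field_type VARCHAR(30)
);

-- One row per record per field: the import - field - value triples.
CREATE TABLE import_field_value (
    record_id INT,
    field_id  INT REFERENCES import_field (field_id),
    value     VARCHAR(4000),  -- everything stored as text and cast on the way out
    PRIMARY KEY (record_id, field_id)
);
```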

A: 

What kind is each field? Could the type of field be different for each record?

I am working on a program now that does this, sort of. The way we handle it is basically a record table that points to a recordfield table. The recordfield table contains all of the fields along with the name of the actual field in the database (the column name). We then have a recorddata table, which is where all the data goes for each record; it also stores a record_id telling it which record it belongs to.

The idea is that if each column for the record is the same type, we don't need to add new columns to the table, and if a record has more fields, or fields of a different type, we add columns to the data table as appropriate.
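
A rough sketch of that layout (the answer only names the tables, so the columns below are assumptions):

```sql
CREATE TABLE record (
    record_id   INT PRIMARY KEY,
    record_type VARCHAR(30)
);

-- Metadata: which logical field maps to which physical column in recorddata.
CREATE TABLE recordfield (
    record_type VARCHAR(30),
    field_name  VARCHAR(128),   -- the user-visible field name
    column_name VARCHAR(128),   -- the actual column in recorddata, e.g. 'field_1_t'
    field_type  CHAR(1),        -- 't' = text, 'i' = integer, ...
    PRIMARY KEY (record_type, field_name)
);

-- One row per record; columns are added as new fields are defined.
CREATE TABLE recorddata (
    record_id INT REFERENCES record (record_id),
    field_1_t VARCHAR(4000)
);
```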

I think this is what you are talking about... correct me if I'm wrong.

Earlz
Yes, I add a column to the recorddata table dynamically. So far it's working well for us. We have a number of different field types, each identified by one character, so before adding a field the table would look like `record_id,field_1_t` (t for text), and after an add it would look like `record_id,field_1_t,field_2_t` or `record_id,field_1_t,field_1_i` (i for integer).
Earlz
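
A sketch of the dynamic column add described in this comment (the record type and field names are made up):

```sql
-- Adding a second text field:
ALTER TABLE recorddata ADD field_2_t VARCHAR(4000);
INSERT INTO recordfield (record_type, field_name, column_name, field_type)
VALUES ('import', 'Notes', 'field_2_t', 't');

-- Or adding the first integer field:
ALTER TABLE recorddata ADD field_1_i INT;
INSERT INTO recordfield (record_type, field_name, column_name, field_type)
VALUES ('import', 'Quantity', 'field_1_i', 'i');
```
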
Note that the `recordfield` table is what holds the metadata for the `recorddata` table.
Earlz
We actually have different "record" types, and each type gets its own table, so there is some hope of never reaching more than 50-80 columns per table... It is still in development, so I can't say what kind of usage it will support at the moment, as none of us are quite sure right now (though it seems promising that it should be pretty speedy).
Earlz
You answered my comment before I posted it. Is there any performance penalty to having lots of fields in a table if you only return those you need? I don't know if this would work in my case, though, as I wouldn't be able to control the growth in the number of fields.
g_g
@your first comment: yes. @second: I believe most database systems do have a hard limit on the number of columns. It depends on how many fields each record has... If you are just going to have around 10-30, then usually there will be enough overlap that your table won't grow to more than 50 or so columns, but if each record will have more than 50 or 80 or some high number, I wouldn't suggest using this method.
Earlz
A: 

I hate storing XML in the database, but this is a perfect example of when it makes sense. Store the user imports in XML initially. As your data schema matures, you can later decide which tables to persist for your larger clients. When the users pick which fields they want to query, that's when you come back and build a solid schema.
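
One way this could look, assuming SQL Server's xml column type (the table and element names are illustrative):

```sql
-- Staging table: keep each upload as-is until the schema matures.
CREATE TABLE user_import (
    import_id   INT IDENTITY PRIMARY KEY,
    user_id     INT,
    imported_at DATETIME DEFAULT GETDATE(),
    payload     XML   -- the uploaded records, untouched
);

-- Later, once users pick a field they want to query, pull it out of the XML:
SELECT import_id,
       payload.value('(/records/record/CustomerName)[1]', 'VARCHAR(100)') AS customer_name
FROM   user_import;
```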

Brent Ozar
A: 

I think that one additional table for each type of user-defined field, attached to the table the user can add fields to, is a good way to go.

Say you load your records into user_records(id); that table would have an id column which is a foreign key in the user-defined field tables. User-defined string fields would go in user_records_string(id, name), where id is a foreign key to user_records(id) and name is either the string name of the field or a foreign key to a list of user-defined string fields.
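
A sketch of that layout; note the value column is an addition for illustration, since the answer only lists (id, name):

```sql
CREATE TABLE user_records (
    id INT PRIMARY KEY
    -- ... plus the common fields shared by every import
);

-- One such table per user-defined field type (string, integer, date, ...).
CREATE TABLE user_records_string (
    id    INT REFERENCES user_records (id),
    name  VARCHAR(128),   -- or a foreign key to a list of user-defined string fields
    value VARCHAR(4000)
);
```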

Searching on them requires joining them into the base table, probably with a sub-select that filters down to one field based on the user metadata, so that the right field can be added to the query.
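
For example, a query pulling one hypothetical user-defined field back onto the base table might look like:

```sql
SELECT r.id,
       f.value AS invoice_number        -- 'invoice_number' is a made-up user-defined field
FROM   user_records r
LEFT JOIN (
    SELECT id, value
    FROM   user_records_string
    WHERE  name = 'invoice_number'      -- filter down to one field using the metadata
) f ON f.id = r.id;
```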

To simulate the user creating multiple tables, you can have a foreign key in the user_records table that points at a table list, and filter on that when querying for a single table.

This would allow your schema to be static while allowing the user to arbitrarily add fields and tables.

Sarah Happy
Am I right in thinking the `user_records` table has the common fields that all imports will have, and `user_records_string` would have, say, 5 records for each user_records record if there were 5 user-defined fields of type string?
g_g
That is how I was thinking, yes.
Sarah Happy