
I'm looking for a scalable, flexible and fast database design for a 'build your own form' style website, e.g. Wufoo.

Rules:

  1. User has only 1 Form they can build
  2. User can create their own fields or choose from 'standard' fields
  3. A user's single form can have as many fields as the user wants
  4. Values can be siblings of other values, e.g. a photo value could have name, location, width and height as sibling values

Special Rules:

  1. User can submit their form a maximum of 5 times a day
  2. The date of each value is important
  3. Flexibility to report on values (for a single user, across all users, for 1 field or many fields) is very important for data visualization; most reports will be chronological, e.g. all photos for July 2009 for all users.
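Rule 1 implies a per-user, per-day counter, which the 'values' table stores as date_group. A minimal sketch, assuming the application first looks up the date_group values that user has already used on that day (the query in the docstring is hypothetical) and then takes the lowest free one:

```python
MAX_SUBMISSIONS_PER_DAY = 5  # from the "maximum of 5 times a day" rule


def next_date_group(used_groups):
    """Return the lowest free date_group (1-5) for one user on one day.

    `used_groups` is the set of date_group values already stored for that
    user and date, e.g. from a hypothetical query such as
    SELECT DISTINCT date_group FROM "values" WHERE uid = ? AND date = ?
    """
    for group in range(1, MAX_SUBMISSIONS_PER_DAY + 1):
        if group not in used_groups:
            return group
    raise ValueError("daily submission limit reached")


print(next_date_group({1, 2}))  # 3
```

The lookup and the insert should run in one transaction (or be backed by a unique constraint on uid, date and date_group); otherwise two concurrent submissions could pick the same group.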

Table "users"
  uid

Table "field_user" - assigns a field to a user's form
  fid
  uid
  weight - int - used to order the fields on the user's form

Table "fields"
  fid
  creator_uid - int - the field's 'creator'
  label - varchar - e.g. Email
  value_type - varchar - determines which column of the 'values' table is filled in (e.g. if 'int', values of this field are stored in values.type_int and all other type_x columns are NULL)
  field_type - varchar - e.g. 'email' - used for special conditions such as validation rules
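To show how field_user.weight and the fields table fit together, here is a sketch that lists one user's form fields in display order. It uses Python's built-in sqlite3 module so it is self-contained; the column types are SQLite approximations, and the sample rows, the `form_fields` helper and the convention that creator_uid 0 marks a 'standard' field are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fields (
    fid         INTEGER PRIMARY KEY,
    creator_uid INTEGER,
    label       TEXT,
    value_type  TEXT,
    field_type  TEXT
);
CREATE TABLE field_user (
    fid    INTEGER REFERENCES fields(fid),
    uid    INTEGER,
    weight INTEGER,
    PRIMARY KEY (uid, fid)  -- a field appears at most once on a user's form
);
""")

# Hypothetical data: a 'standard' field (creator_uid 0 is an assumed
# convention) and a custom field created by user 7.
conn.executemany("INSERT INTO fields VALUES (?, ?, ?, ?, ?)", [
    (1, 0, "Email", "varchar", "email"),
    (2, 7, "Shoe size", "int", "number"),
])
conn.executemany("INSERT INTO field_user VALUES (?, ?, ?)", [
    (1, 7, 20),
    (2, 7, 10),
])


def form_fields(uid):
    """Return (label, field_type) pairs for a user's form, ordered by weight."""
    return conn.execute(
        """SELECT f.label, f.field_type
             FROM field_user fu JOIN fields f ON f.fid = fu.fid
            WHERE fu.uid = ?
            ORDER BY fu.weight""",
        (uid,),
    ).fetchall()


print(form_fields(7))  # [('Shoe size', 'number'), ('Email', 'email')]
```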

Table "values"
  vid
  parent_vid
  fid
  uid
  date - date
  date_group - int - value 1-5 (user may submit max of 5 forms per day)
  type_varchar - varchar
  type_text - text
  type_int - int
  type_float - float
  type_bool - bool
  type_date - date
  type_timestamp - timestamp
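A sketch of the 'values' table and the write path that the value_type column implies, again using sqlite3. The `save_value` helper and the sample fields are assumptions for illustration; note that "values" is an SQL keyword and has to be quoted. The width row shows parent_vid tying a sibling value to its photo, with every unused type_x column left NULL.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# "values" is an SQL keyword, so the table name is quoted everywhere.
conn.executescript("""
CREATE TABLE fields (fid INTEGER PRIMARY KEY, value_type TEXT NOT NULL);
CREATE TABLE "values" (
    vid            INTEGER PRIMARY KEY,
    parent_vid     INTEGER REFERENCES "values"(vid),
    fid            INTEGER NOT NULL REFERENCES fields(fid),
    uid            INTEGER NOT NULL,
    date           TEXT    NOT NULL,
    date_group     INTEGER NOT NULL CHECK (date_group BETWEEN 1 AND 5),
    type_varchar   TEXT,
    type_text      TEXT,
    type_int       INTEGER,
    type_float     REAL,
    type_bool      INTEGER,
    type_date      TEXT,
    type_timestamp TEXT
);
""")

VALUE_TYPES = {"varchar", "text", "int", "float", "bool", "date", "timestamp"}


def save_value(fid, uid, day, date_group, value, parent_vid=None):
    """Write `value` into the type_x column named by fields.value_type.

    All other type_x columns stay NULL, as described in the question.
    """
    (value_type,) = conn.execute(
        "SELECT value_type FROM fields WHERE fid = ?", (fid,)
    ).fetchone()
    if value_type not in VALUE_TYPES:
        raise ValueError(f"unknown value_type {value_type!r}")
    column = "type_" + value_type  # safe to interpolate: whitelisted above
    cur = conn.execute(
        f'INSERT INTO "values" (parent_vid, fid, uid, date, date_group, {column}) '
        "VALUES (?, ?, ?, ?, ?, ?)",
        (parent_vid, fid, uid, day.isoformat(), date_group, value),
    )
    return cur.lastrowid


# Hypothetical fields: 1 = photo (file name), 2 = photo width in pixels.
conn.executemany("INSERT INTO fields VALUES (?, ?)", [(1, "varchar"), (2, "int")])
photo = save_value(1, 7, date(2009, 7, 24), 1, "beach.jpg")
width = save_value(2, 7, date(2009, 7, 24), 1, 640, parent_vid=photo)
print(conn.execute(
    'SELECT type_varchar, type_int FROM "values" WHERE vid = ?', (width,)
).fetchone())  # (None, 640)
```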

I understand that with this approach each record in the 'values' table will hold only 1 piece of data, with the other type_x fields containing NULLs... but as I understand it, this design will be the 'fastest' solution (fewer queries, fewer join tables).

A:

At OSCON yesterday, Josh Berkus gave a good tutorial on DB design, and he spent a good fraction of it mercilessly tearing into such "EAV"il tables; you should be able to find his slides on the OSCON site soon, and eventually the audio recording of his whole tutorial online (the latter will probably take a while).

You'll need a join per attribute (multiple instances of the values table, one per attribute you're fetching or updating), so I don't know what you mean by "less join tables". Joining many instances of the same table isn't a particularly fast operation, and your design makes indices nearly infeasible and unusable.
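To make the join cost concrete: rebuilding one submission as a single row needs one alias of the values table per attribute. A minimal sqlite3 sketch with a trimmed-down table and two hypothetical fields (email and age):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down version of the question's "values" table: only the columns
# this illustration needs.
conn.executescript("""
CREATE TABLE "values" (
    vid          INTEGER PRIMARY KEY,
    uid          INTEGER,
    date         TEXT,
    date_group   INTEGER,
    fid          INTEGER,
    type_varchar TEXT,
    type_int     INTEGER
);
""")
EMAIL, AGE = 1, 2  # hypothetical field ids
conn.executemany(
    'INSERT INTO "values" (uid, date, date_group, fid, type_varchar, type_int) '
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (7, "2009-07-24", 1, EMAIL, "a@example.com", None),
        (7, "2009-07-24", 1, AGE, None, 30),
    ],
)

# One alias of "values" per attribute: two attributes -> two instances,
# ten attributes -> ten instances joined on (uid, date, date_group).
rows = conn.execute(
    """SELECT e.uid, e.date, e.date_group, e.type_varchar AS email, a.type_int AS age
         FROM "values" e
         JOIN "values" a
           ON a.uid = e.uid AND a.date = e.date
          AND a.date_group = e.date_group AND a.fid = ?
        WHERE e.fid = ?""",
    (AGE, EMAIL),
).fetchall()
print(rows)  # [(7, '2009-07-24', 1, 'a@example.com', 30)]
```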

As at least a minor improvement, use separate per-type tables for your attributes' values (some indexing might then be applicable, though with MySQL's limitation of one index per table per query even that's somewhat dubious).
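A sketch of the per-type layout, assuming hypothetical `values_int` / `values_varchar` tables and index names. It uses SQLite, whose query planner differs from MySQL's, so it only shows that a typed (fid, value) index becomes possible, not how MySQL would use it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per value type instead of one wide table of mostly-NULL columns.
conn.executescript("""
CREATE TABLE values_int (
    vid   INTEGER PRIMARY KEY,
    fid   INTEGER NOT NULL,
    uid   INTEGER NOT NULL,
    date  TEXT    NOT NULL,
    value INTEGER NOT NULL
);
CREATE TABLE values_varchar (
    vid   INTEGER PRIMARY KEY,
    fid   INTEGER NOT NULL,
    uid   INTEGER NOT NULL,
    date  TEXT    NOT NULL,
    value TEXT    NOT NULL
);
-- With a single typed value column, a (fid, value) index is meaningful.
CREATE INDEX values_int_fid_value ON values_int (fid, value);
CREATE INDEX values_varchar_fid_value ON values_varchar (fid, value);
""")

conn.executemany(
    "INSERT INTO values_int (fid, uid, date, value) VALUES (?, ?, ?, ?)",
    [(2, 7, "2009-07-24", 30), (2, 8, "2009-07-25", 45), (3, 7, "2009-07-24", 99)],
)

# A range query over one field's integer values.
query = "SELECT uid, value FROM values_int WHERE fid = ? AND value BETWEEN ? AND ?"
print(conn.execute(query, (2, 25, 40)).fetchall())  # [(7, 30)]

# Ask SQLite how it would run the query (it should report the index).
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (2, 25, 40)).fetchall()
print(plan)
```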

Alex Martelli
Thank you for the links to OSCON and EAV. Did Josh suggest an alternative database design for EAV'il database scenarios?
Chris Jacob
There's no really good way to achieve a "totally flexible schema" in SQL. See http://www.pgexperts.com/presentations.html for links to slides of many of pgexperts' presentations, including the OSCON 09 one (the "database that works" one should also have something). EBlob is essentially the alternative, and of course it has its own tons of problems (different from EAVil's).
Alex Martelli
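As a rough illustration of the EBlob alternative, assuming it means one row per entity with all attributes serialized into a single blob (JSON here; the `submissions` table and `submit` helper are hypothetical). Sibling values nest naturally, but filtering on a field inside the blob happens in application code:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "entity + blob" table: one row per form submission, with all
# of that submission's field values serialized into one JSON document.
conn.execute("""
CREATE TABLE submissions (
    sid        INTEGER PRIMARY KEY,
    uid        INTEGER NOT NULL,
    date       TEXT    NOT NULL,
    date_group INTEGER NOT NULL CHECK (date_group BETWEEN 1 AND 5),
    data       TEXT    NOT NULL  -- JSON: {field label: value, ...}
)""")


def submit(uid, day, date_group, values):
    conn.execute(
        "INSERT INTO submissions (uid, date, date_group, data) VALUES (?, ?, ?, ?)",
        (uid, day, date_group, json.dumps(values)),
    )


submit(7, "2009-07-24", 1, {"Email": "a@example.com",
                            "photo": {"name": "beach.jpg", "width": 640}})
submit(8, "2009-07-30", 1, {"Email": "b@example.com"})

# The date filter runs in SQL; the "has a photo" filter needs every
# candidate blob to be decoded in Python.
july_photos = []
for (data,) in conn.execute(
    "SELECT data FROM submissions WHERE date BETWEEN '2009-07-01' AND '2009-07-31'"
):
    doc = json.loads(data)
    if "photo" in doc:
        july_photos.append(doc["photo"])
print(july_photos)  # [{'name': 'beach.jpg', 'width': 640}]
```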
A:

You should really look into schema-free databases like CouchDB; problems like this are exactly what these databases are designed to solve.

THC4k
Good point... I could be trying to squeeze a square peg into a round hole by using MySQL, but it's my DB of choice. CouchDB does look very interesting though, thanks for the tip ;-)
Chris Jacob
A: 

You know, CREATE TABLE, ALTER TABLE, ADD COLUMN, etc. are operations you can perform at run time in many modern RDBMS implementations. Why be EAVil, especially if you are using dynamic SQL?
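A minimal sketch of the run-time DDL approach with sqlite3: one hypothetical table per user form, plus ALTER TABLE ... ADD COLUMN when the user adds a field. Identifiers cannot be bound as query parameters, so this sketch whitelists field names with a regular expression and prefixes them with `f_` to avoid clashing with SQL keywords; the naming scheme and type mapping are assumptions.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed whitelist for user-chosen field names: DDL identifiers cannot be
# passed as bound parameters, so they are validated before interpolation.
FIELD_NAME = re.compile(r"[a-z][a-z0-9_]{0,30}")
SQL_TYPES = {"varchar": "TEXT", "int": "INTEGER", "float": "REAL",
             "bool": "INTEGER", "date": "TEXT"}


def table_name(uid):
    return f"form_{int(uid)}"  # int() keeps the uid from injecting SQL


def create_form_table(uid):
    """Create a per-user table (hypothetical naming: form_<uid>)."""
    conn.execute(
        f"CREATE TABLE {table_name(uid)} ("
        "sid INTEGER PRIMARY KEY, date TEXT NOT NULL, "
        "date_group INTEGER NOT NULL CHECK (date_group BETWEEN 1 AND 5))"
    )


def add_field(uid, name, value_type):
    """Add a real, typed column when the user adds a field to their form."""
    if not FIELD_NAME.fullmatch(name):
        raise ValueError(f"invalid field name {name!r}")
    # The f_ prefix keeps names like "date" or "order" from clashing with
    # existing columns or SQL keywords.
    conn.execute(
        f"ALTER TABLE {table_name(uid)} "
        f"ADD COLUMN f_{name} {SQL_TYPES[value_type]}"
    )


def columns(uid):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table_name(uid)})")]


create_form_table(7)
add_field(7, "email", "varchar")
add_field(7, "shoe_size", "int")
print(columns(7))  # ['sid', 'date', 'date_group', 'f_email', 'f_shoe_size']
```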

It's not for the fainthearted. I recall an implementation at Boeing which resulted in 70,000 tables in a database.

Obviously there are pitfalls in dynamic table creation, but they also exist for EAV tables: things like two attributes for the same key expressing the same fact, or transitive dependencies and other normalization gotchas. So why not at least leverage the power of the RDBMS on your behalf?
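One concrete thing the RDBMS does for you once attributes are real columns is enforce per-column types and CHECK constraints. A sketch comparing a generic EAV table with a hypothetical per-form table (the table names and the constraints chosen are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic EAV storage: one untyped value column shared by every attribute.
CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT);
-- Real table: each attribute is a column with its own type and rules.
CREATE TABLE form_7 (
    sid       INTEGER PRIMARY KEY,
    email     TEXT NOT NULL CHECK (email LIKE '%_@_%'),
    shoe_size INTEGER CHECK (shoe_size BETWEEN 1 AND 60)
);
""")


def accepted(sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# The EAV table cannot tell shoe_size apart from any other attribute,
# so it stores a negative shoe size without complaint.
print(accepted("INSERT INTO eav VALUES (?, ?, ?)", (1, "shoe_size", "-3")))  # True
# The real table's CHECK constraint rejects the same value...
print(accepted("INSERT INTO form_7 (email, shoe_size) VALUES (?, ?)",
               ("a@example.com", -3)))  # False
# ...and accepts a sensible one.
print(accepted("INSERT INTO form_7 (email, shoe_size) VALUES (?, ?)",
               ("a@example.com", 42)))  # True
```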