views:

56

answers:

5

15 ECTS credits worth of database design down the bin.. I really can't come up with the best design solution for my problem.

Which is this: Basically I'm making a tool that gathers a lot of information concerning the user. At the most the user would fill in 50 fields of data, ranging from simple checkboxes to text input. I'm designing the db right now (with mySql) and can't decide whether or not to use a single User table with all of those fields, or to have a table for each category of input.

One example would be "type of payment". This one has three options and if I went with the "table" way I would add a table paymentType and give it binary fields for each payment type. Then I would need and id table to identify which paymentType the user has chosen whereas if I use a single user table, the data would already be there.

The site will probably see a lot of users (tv, internet and radio marketing) so I'm concerned which alternative would be the best.

I'll be happy to provide more details if you need more to base a decision.

Thanks for reading.

+2  A: 

Read this article "Database Normalization Basics", and come back here if you still have questions. It should help a lot.

The most fundamental idea behind these decisions, as you will see in this article, is that each table should represent one and only one "thing", and each field should relate directly and only to that thing.

In your payment types example, it probably makes sense to break it out into a separate table if you anticipate the need to store additional information about each payment type.

JohnFx
Thanks, I've read through that and will probably do my best to normalize to 3NF.
Phil
+1  A: 

If you are building a form with variable inputs, I wouldn't recommend building it as one table. This is inflexible and dirty.

Normalization is the key, though if you end up with a key/value setup, or effectively a scalar type implementation across many tables and can't cache:

a) the form definition from table data and
b) the joined result of storage (either a caching view or otherwise)
c) or don't build in proper sharding

Then you may hit a performance boundary.

In this KVP setup, you might want to look at something like CouchDB or a less table-driven storage format.

You may also want to look at trickier setups such as serialized object storage and cache-tables if your internal data is heavily relative to other data already in the database

Aiden Bell
A: 

50 columns is a lot. Have you considered a table that stores values like a property sheet? This would only be useful if you didn't need to regularly query the values it contains.

INSERT INTO UserProperty(UserID, Name, Value)
     VALUES(1, 'PaymentType', 'Visa')

INSERT INTO UserProperty(UserID, Name, Value)
     VALUES(1, 'TrafficSource', 'TV')
BC
What type is Value? varchar()? Blob? ... Unless done carefully this can get grim quite quickly when lots of data is chucked in.
Aiden Bell
It doesn't matter what the type is, but strings seems like a good choice. An implementation like this would have to be fault tolerant about what the table contained. But with 50 fields, i imagine that not all of this data needs to be available in tabular form.
BC
Good idea.. although wouldn't this allocate an awful lot of space if I want to make sure Value can be both 2 and 255 characters?
Phil
No, a varchar field is only as large as it needs to be, up to the maximum.
BC
varchars are inflexible and not a replacement for multiple data storage, nor should be treated like a scalar; this negates the benefits of SQL native data types and means you may as well use something like CouchDB or a persistent object store
Aiden Bell
+1  A: 

Create your "Type of Payment" table; there's no real question there. That's proper normalization and the power behind using relational databases. One of the many reasons to do so is the ability to update a Type of Payment record and not have to touch the related data in your users table. Your join between the two tables will allow your app to see the updated type of payment info by changing it in just the 1 place.

Regarding your other fields, they may not be as clear cut. The question to ask yourself about each field is "does this field relate only to a user or does it have meaning and possible use in its own right?". If you can never imagine a field having meaning outside of the context of a user you're safe leaving it as a field on the user table, otherwise do the primary key-foreign key relationship and put the information in its own table.

Tahbaza
Thanks for your input. It's difficult to see the meaning of the fields right now and therefore I think the safer way would be to create tables where I'm in doubt.
Phil
A: 

I think I figured out a great way of solving this. Thanks to a friend of mine for suggesting this!

I have three tables, Field {IdField, FieldName, FieldType}, FieldInput {IdInput, IdField, IdUser} and User { IdUser, UserName... etc }

This way it becomes very easy to see what a user has answered, the solution is somewhat scalable and it provides a good overview. I will constrain the alternatives in another layer, farther away from the db. I believe it's a tradeoff worth doing.

Any suggestions or critics to this solution?

Phil