views:

224

answers:

5

Hi guys, im in the beginning of the complete restyle of an my web application, and i have some doubt about a good database-design that can be reliable, query-performance, and in the same time fully customizable by the users (users wont customize the database structure, but the funcionality of the application).

So, my actual situation is, for example, a simple user's table:

id | name | surname | nickname | email       | phone
1  | foo  | bar     | foobar   | [email protected] | 99999

Thats it.

But, lets say that one of my customer would like to have 2 email addresses, or phone numbers, for one specific user.

Untill now, i used to solve that problem simply adding columns in the users table:

id | name | surname | nickname | email       | phone | email_two    | phone_two
1  | foo  | bar     | foobar   | [email protected] | 99999 | [email protected]  | 999998

But i cant use that way with the new application's version.. i'll like to be drinking mojito after that, dont like costumer's call to edit the structure :)

So, i thought a solution where people can define customs field, simply with another table:

id | table_refer | type_field | id_object | value
1  | users       | phone      | 1         | 999998
2  | users       | email      | 1         | [email protected]

keeping the users table unaltered.

But this way have 2 problems:

  1. For what i know, there is no possibility to use foreigns key in that way, that if i delete 1 user automatically the foreign key delete in cascade all the row in the second table that have the 'table_refer' value=users and the id_object=users.id. Sure, i'll can use some triggers function, but i'll lose some of the reliability.
  2. When i'll need to query the database, fore retrieve the users that match '[email protected]', i'll have to check all the... hem.. option_table as well, and that will make my code complex and less-reliable and messy with many joins.. assuming that the users table wont be the only one 'extended' by the 'option_table', seem to be a gray view.

My goal is to let my customers adding as many custom fields as they need, for almost all the object in the application (users, items, invoices, print views, photos, news, etc...), assuming that most of those table would be partitioned (splitted in 2 table, with a 3 table and inheritance gerarchy).

You think my way can be good, do you know some other better, or am i in a big mistake? Please, every suggest is gold now!

EDIT:

What i'm lookin for could be simplifyed ith the 'articles-custom-fields' in wordpress blogs. My goal is to let the user to define new fields that he needs, for example, if my users table is the one above, and a customer need a field that i havent prevent, like the web-site url, he must be able to add it dinamically, without edit the database structure, but just the data.

I think that the 2° table (maibe 1 for each object) can be a good solution, but i am still waiting for better ways!

A: 

if you do it this way all of your queries will have to join to and use table_refer column, which will kill performance, and make simple queries hard, and hard queries very difficult.

if your want multiple e-mails, split the email out to another table so you can have many rows.

KM
And how does 'split the email out to another table' be different from my solution? If you meaning to have 1 table for emails, 1 for phones, etc.. i must stop you: i dont want to have 1000 sub-tables for every item ;)
DaNieL
there has to be a reasonable limit on what you will allow multiple values for. Will you need multiple rows and new tables for name, and/or surname, and or nickname? I don't think so. Possibly make a Contact table, with a type column: "e"mail or "p"hone and a multipurpose Value field
KM
Surely, for not all the fields there ill be the 'duplicate' option.. man dont focus on the email-phone example, it is just an example. For the users table, the field 'multiple' would be not more than 5 i guess... but what happen if an user need a field that i havent put?
DaNieL
Just edited the question.
DaNieL
the app has to know what is permitted to have multiples, if that is the case just split out those tables from the git-go, optimize your queries for those tables and forget about making an all powerfull stores anything table that will zap performance and make writing your queries hard
KM
+4  A: 

As I said in my Answer to a similar question, "Database Design is Hard." You are going to have to make the decision about which is better for you, normalizing the tables and bringing phone numbers and e-mail addresses into their own tables, with the associated JOIN-ing to retrieve the data, and the extra effort of referential integrity, or having some number n e-mail and phone fields in your table, and the "data-messiness" that that entails.

Database design is always a series of tradeoffs. You need to look at all angles, maybe bodge up some prototypes and do some profiling, etc. There is no "One True Answer™".

Adrien
+1: Database design is hard. If you want to give them unlimited customization, you have to give them the code.
S.Lott
Seems to be true. Only good advises can be here... but not answers.
Jet
I agree with you that there no 'one true answer', but maybe there will be a best answer... and that is why i am here: looking for other ideas about that kind of design
DaNieL
@S.Lott: i cant give them the code, the application will be used by users that are able just to surf the web, not to costum theyre database/code.
DaNieL
A: 

You could design your application to request additional data (like emails list for the user) on demand, using AJAX etc. In those highly customizable and rich applications usually you have no need to display all the data - only a single category.

To store custom records you can create table field_types(id, name, datatype) and a table custom_fields(user_id, field_type_id, value), and then select smth like this:

SELECT * FROM custom_fields WHERE user_id=XXX AND field_type_id IN (X,Y,Z).

so now you can retrieve data in 1 fast query, split fields to categories and parse their values by their respective datatypes with your code without performance issues.

Jet
So, 1 table with the custom fields for evrery object? This could be.. would be easyer even the partitioning, i guess.
DaNieL
It depends on data. But in most cases table with custom fields should be better.
Jet
A: 

I'm not sure about the specifics of postgresql, but if you want highly customisable data structures in a DB that you don't really want to search on, the serializing the data to a LOB is an option.

In fact this is the way ASP.NET works by default with Personalization, which is per user settings.

I don't recommend this approach if you wish to search the fields for any reason.

RichardOD
A: 

Your proposed model is composed of two database patterns: an entity-attribute-value table and a polymorphic association.

Entity-attribute-value has some pretty big issues both in the performance and data integrity department. If you don't need to access the additional attributes in queries, then you can serialize the attribute value mapping to a text field in some standard serialization (JSON, XML). Not "pure" from the database design standpoint, but possibly a good pragmatic choice, given that you are aware of the tradeoffs. On postgres you can also use the hstore contrib module to store key-value pairs to make it usable in queries, if the limitation of string only values is acceptable.

For polymorphic association, you can get referential integrity by introducing an association table:

users                attrib_assocs       custom_attribs
-----                -------------       --------------
attrib_assoc_id -->  id             <--  assoc_id
...                  entity_type         field
                                         value

To get slightly more integrity, also add the entity_type to the primary key and corresponding foreign keys and a check constraint on users table that the entity_type equals 'user'.

Ants Aasma
WTF, a text field with the values in a json array? No mate, thanks for the answer but as i specified before i'll need to query the additional fields as well as the normal fields...
DaNieL
If the ability to query truly schemafree data is a requirement, then EAV is the pattern to go for. Depending on the performance requirements you still may need to denormalize the data, e.g. use a trigger to keep an aggregated form of the data on the owner row. My experience with a few hundred million row EAV table is that there's almost two orders of magnitude performance difference between joining the data and having it available on the owner row. With Postgres you can create GIN indexes over the aggregated form to get similar speedups for querying.
Ants Aasma