views:

558

answers:

6

Hello!

I am writing a addressbook module for my software right now. I have the database set up so far that it supports a very flexible address-book configuration.

I can create n-entries for every type I want. Type means here data like 'email', 'address', 'telephone' etc.

I have a table named 'contact_profiles'.

This only has two columns:

id           Primary key
date_created DATETIME

And then there is a table called contact_attributes. This one is a little more complex:

id       PK
#profile (Foreign key to contact_profiles.id)
type     VARCHAR describing the type of the entry (name, email, phone, fax, website, ...) I should probably change this to a SET later.
value    Text (containing the value for the attribute).

I can now link to these profiles, for example from my user's table. But from here I run into problems.

At the moment I would have to create a JOIN for each value that I want to retrieve. Is there a possibility to somehow create a View, that gives me a result with the type's as columns?

So right now I would get something like

#profile type    value
1        email   [email protected]
1        name    Sebastian Hoitz
1        website domain.tld

But it would be nice to get a result like this:

#profile email           name            website
1        [email protected] Sebastian Hoitz domain.tld

The reason I do not want to create the table layout like this initially is, that there might always be things to add and I want to be able to have multiple attributes of the same type.

So do you know if there is any possibility to convert this dynamically?

If you need a better description please let me know.

Thank you!

+1  A: 

If you are limiting yourself to displaying a single email, name, website, etc. for each person in this query, I'd use subqueries:

SELECT cp.ID profile
  ,cp.Name
  ,(SELECT value FROM contact_attributes WHERE type = 'email' and profile = cp.id) email
  ,(SELECT value FROM contact_attributes WHERE type = 'website' and profile = cp.id) website
  ,(SELECT value FROM contact_attributes WHERE type = 'phone' and profile = cp.id) phone
FROM contact_profiles cp

If you're using SQL Server, you could also look at PIVOT.

If you want to show multiple emails, phones, etc., then consider that each profile must have the same number of them or you'll have blanks.

I'd also factor out the type column. Create a table called contact_attribute_types which would hold "email", "website", etc. Then you'd store the contact_attribute_types.id integer value in the contact_attributes table.

Michael Haren
Good point to factor out the type. I forgot that one! Thanks :)
Sebastian Hoitz
A: 

You will need to generate a query like:

select #profile,
       max(case when type='email' then value end) as email,
       max(case when type='name' then value end) as name,
       max(case when type='website' then value end) as website
from mytable
group by #profile

However, that will only show one value for each type per #profile. Your DBMS may have a function you can use instead of MAX to concatenate all the values as a comma-separated string, or you may be able to write one.

This kind of data model is generally best avoided for the reasons you have already mentioned!

Tony Andrews
But is there any alternative to using this data model when you want to have this kind of flexibility in data you can enter?
Sebastian Hoitz
Tony's solution also assumes NULL sorts below any non-NULL value. This is not true in all SQL implementations.
Bill Karwin
So maybe use MIN instead of MAX?
Tony Andrews
+3  A: 

You have reinvented a database design called Entity-Attribute-Value. This design has a lot of weaknesses, including the weakness you've discovered: it's very hard to reproduce a query result in a conventional format, with one column per attribute.

Here's an example of what you must do:

SELECT c.id, c.date_created,
 c1.value AS name,
 c2.value AS email,
 c3.value AS phone,
 c4.value AS fax,
 c5.value AS website
FROM contact_profiles c
 LEFT OUTER JOIN contact_attributes c1
  ON (c.id = c1.profile AND c1.type = 'name')
 LEFT OUTER JOIN contact_attributes c1
  ON (c.id = c1.profile AND c1.type = 'email')
 LEFT OUTER JOIN contact_attributes c1
  ON (c.id = c1.profile AND c1.type = 'phone')
 LEFT OUTER JOIN contact_attributes c1
  ON (c.id = c1.profile AND c1.type = 'fax')
 LEFT OUTER JOIN contact_attributes c1
  ON (c.id = c1.profile AND c1.type = 'website');

You must add another LEFT OUTER JOIN for every attribute. You must know the attributes at the time you write the query. You must use LEFT OUTER JOIN and not INNER JOIN because there's no way to make an attribute mandatory (the equivalent of simply declaring a column NOT NULL).

It's far more efficient to retrieve the attributes as they are stored, and then write application code to loop through the result set, building an object or associative array with an entry for each attribute. You don't need to know all the attributes this way, and you don't have to execute an n-way join.

SELECT * FROM contact_profiles c
  LEFT OUTER JOIN contact_attributes ca ON (c.id = ca.profile);

You asked in a comment what to do if you need this level of flexibility, if not use the EAV design? SQL is not the correct solution if you truly need unlimited metadata flexibility. Here are some alternatives:

  • Store a TEXT BLOB, containing all the attributes structured in XML or YAML format.
  • Use a semantic data modeling solution like Sesame, in which any entity can have dynamic attributes.
  • Abandon databases and use flat files.

EAV and any of these alternative solutions is a lot of work. You should consider very carefully if you truly need this degree of flexibility in your data model, because it's hugely more simple if you can treat the metadata structure as relatively unchanging.

Bill Karwin
Thanks, that gives the problem a name! :)I would love to use a local associative array, but what if I have a list of entries where I would like to add the contact information? Should I create a temporary array for all list entries I'm going to show?
Sebastian Hoitz
If you need to update entries, you need to do them one at a time. Load from the database into array, change attributes, then save to the database. If you do this, you also need to track attribute deletions; you can't just unset an array element.
Bill Karwin
A: 

You create a view for each contact type

When you want all the information you pull from the entire table, when you want a subset of a specific contact type, you pull from the view.

I'd create a stored procedure that takes the intent {all, phone, email, address} as one of the parameters and then derive the data. All my app code would call this stored procedure to get the data. Also, when a new type is added (which should be very infrequently, you create another view and modify only this sproc).

I've implemented a similar design for multiple small/med size systems and have had no issues.

Am I missing something? This seems trivial?

EDIT:

I see what I was missing... You are trying to be normalized and denormalized at the same time. I'm not sure what the rest of your business rules are for pulling records. You could have profiles with multiple or null values for phone/email/addresses etc. I would keep your data format the same and again use a sproc to create the specific view you want. As your business needs change, you leave your data alone and just create another sproc to access it.

mson
A: 

There is no one right answer for this question, as one would need to know, for your specific organization or application, how many of those contact methods the business wants to collect, how current they want the information to be, and how much flexibility they are willing to invest in.

Of course, many of here could make some good guesses as to what the average business would want to do, but the real answer is to find out what your project, what your users, are interested in.

BTW, all architecture questions about "best"-ness require this sort of cost, benefit, and risk analysis.

Karen Lopez
A: 

Now that the approach of document-oriented databases is getting more and more popular, one could use one of them to store all this information in one entry - and therefor deleting all those extra joins and queries.

Sebastian Hoitz