views:

140

answers:

7

I need to give users the ability to optionally add metadata to documents. Another way to state this is the fact that users need to add at least 5 categories to a document.

Basically what I want to do is dynamically add metadata (or categories) to a document on an ad hoc basis. Here are the options that I have thought of:

Option 1: Should I do this by dynamically creating new table columns in the database?

Option 2: Should I define 5 columns called attirbute1,attirbute2,attirbute3,attirbute4,attirbute5 and then only use and show them if the user requires the attributes.

Option 3: Should I create a metadata table that keeps track of the columns and the data associated with them?

What do you think is the best way to achieve this? Can you think of any other ways to easily add this functionality. The problem is that the functionality needs to be very generic.

+5  A: 

Each document has a unique DocumentID.

Just add another table that has three columns:

DocumentID
MetaName
MetaData

Then they can add as many pieces of metadata to a given document. If all their documents use the same metanames then it's trivial to search the metadata.

Adam Davis
This is the route I would use. We use it here at work quite a bit.
Joe Philllips
How would I go about implementing the view for the tables\columns associated with a document? Could someone please give an example?Thanks
David Smit
I asked here http://stackoverflow.com/questions/219559/what-sql-query-or-view-will-show-dynamic-columns and there were a few suggestions. When I did it before I had to use two queries, the second was formed fromt eh results of the first (and there are php/mysql examples on the net of this)
Adam Davis
A: 

All of these options are feasible, and there is not one correct answer. You should weigh your various options and choose the best solution for your situation.

Option 1: Feasible but could really get out of hand as your user count increases. May also have an impact on storage costs.

Option 2: Probably the fastest solution to implement but the least robust solution and a higher maintenance cost. If you need to go to 6 columns, you will have to add another column, etc..

Option 3: Probably the most robust solution is to have a metadata table that captures this information and then build your tables\columns dynamically based on this metadata. This solution will also probably take the longest and cost the most $$.

SaaS Developer
+1  A: 

This sounds a lot like tagging. You can probably modify acts-as-taggable to do what you need.

John Topley
A: 

I'd certainly go with option 3: have a table called DocumentCategories, which stores the category each document belongs to. Not only is it "more relational", it also will help if your requirements ever change: what if you decide you want 6 categories tomorrow?

Plus, it gives you more options for querying: what if you want to see how many times each category was used, or to select documents by category? With option 3, that's just a join, and it's both fast and easy to write. Options 1 and 2 make doing simple things like that very complicated.

Jacob B
A: 

I would go with option 3.

Dynamically changing the data structure will become very difficult to maintain, and could introduce some interesting bugs.

Having a number of columns that may or may not be needed will still add complexity becuase you will need to check if each column is used. Plus, you will still be restrained to 5 columns.

Option 3, though is very flexible, and allows for growth. Just have a foreign key referring to the document, a column for the name, and a column for the value.

pkaeding
A: 

I would go with three tables.

Document (The document)

Category (The defined categories)

DocumentCategory (A link table connecting documents to categories)

The only downside is that your dbms might not support a good way to constrain this design to require atleast five entries in DocumentCategories per Document, but you could enforce this at the application level.

John Nilsson
+1  A: 

One thing you don't make clear is if the number and/or name of attributes is changeable or if it's the same for all documents. This would alter my recommendation somewhat.

Presuming you have an object handle that uniquely describes a given document, I propose a table that manages the meta-data. If you have an unknown number of attributes and unknown names of those attributes, I recommend something like this:

create table DocMetaData
(
   DocumentHandle        varchar NOT NULL,
   MetaDataName          varchar NOT NULL,
   MetaDataText          varchar NOT NULL
);

You then insert into this table when you have meta-data using the name that's most appropriate. If there's no row, there's no meta-data. If there is meta-data, you clearly have a name for that meta-data, and the data itself. You can include nullability on the metadata itself if you need to, though I'd probably just make it an empty text (something like, default ''), rather than nulls because you get odd behaviors (don't get your row!) if you select a column and it's not there and you didn't explicitly ask for the null value rows. Remember, this design doesn't spell out unique so you've got optionality, only store when you have data...

Of course, if all the possible meta-data attributes are known, you could just spell them out!

Dynamic table creation is a ROYAL pain - I would not do it here.

Richard T