Let's say you're collecting insider info on upcoming superhero movie releases, and your main Movie table looks something like this:
Table 1
Title         | Director   | Leading Male   | Leading Female | Villain
--------------+------------+----------------+----------------+----------------
Green Lantern | Kubrick    | Robert Redford | Miley Cyrus    | Hugh Grant
The Tick      | Mel Gibson | Kevin Sorbo    | Linda Hunt     | Anthony Hopkins
This should work well in general, allowing easy queries as well as comparisons between rows.
However, you'd like to track the source of each data fact, as well as the name of the journalist who discovered it. This seems to suggest some sort of EAV (entity-attribute-value) table, like this:
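As a rough illustration of why Table 1 is pleasant to query, here is a minimal sketch using Python's built-in sqlite3 module, with an in-memory SQLite database standing in for SQL Server (an assumption; the question targets SQL Server). Column names drop their spaces (LeadingMale, LeadingFemale) purely to avoid quoting; they are hypothetical.

```python
import sqlite3

# Hypothetical in-memory copy of Table 1 (SQLite standing in for SQL Server):
# one row per movie, one column per fact. Spaces are dropped from column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Movie (
           Title TEXT PRIMARY KEY,
           Director TEXT,
           LeadingMale TEXT,
           LeadingFemale TEXT,
           Villain TEXT
       )"""
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?)",
    [
        ("Green Lantern", "Kubrick", "Robert Redford", "Miley Cyrus", "Hugh Grant"),
        ("The Tick", "Mel Gibson", "Kevin Sorbo", "Linda Hunt", "Anthony Hopkins"),
    ],
)

# All the basic data for one movie comes back as a single row.
green_lantern = conn.execute(
    "SELECT * FROM Movie WHERE Title = ?", ("Green Lantern",)
).fetchone()

# Comparing movies is a plain query over ordinary columns.
villains = conn.execute("SELECT Title, Villain FROM Movie ORDER BY Title").fetchall()
```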
Table 2
Movie         | Attribute      | Value          | Source        | Journalist
--------------+----------------+----------------+---------------+-----------
Green Lantern | Director       | Kubrick        | CHUD          | Sarah
Green Lantern | Leading Male   | Robert Redford | CHUD          | James
Green Lantern | Leading Female | Miley Cyrus    | Dark Horizons | James
Green Lantern | Villain        | Hugh Grant     | CHUD          | Sarah
The Tick      | Director       | Mel Gibson     | Yahoo         | Cameron
...
This easily captures the metadata we wanted, but it makes queries harder. Even retrieving all the basic data for a single movie takes more work: you have to deal with four rows here to get the four important tidbits of information on Green Lantern, whereas in Table 1 it is a single, nicely encapsulated row.
So my question is: in light of the complications I just described, and because I know EAV tables are generally to be avoided, is EAV still the best solution? It does seem like the only reasonable way to represent this data. The only other alternative I see is to use Table 1 in conjunction with another table that holds only the metadata, like this:
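To make the extra work concrete, here is a hedged sketch of Table 2 in an in-memory SQLite database via Python's sqlite3 module (SQLite standing in for SQL Server; the table name MovieFact is hypothetical). Getting one movie's basic data means fetching several rows and folding them back into a record yourself, while the per-fact metadata is a single lookup.

```python
import sqlite3

# Hypothetical in-memory copy of Table 2 (EAV): one row per (movie, attribute).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE MovieFact (
           Movie TEXT,
           Attribute TEXT,
           Value TEXT,
           Source TEXT,
           Journalist TEXT,
           PRIMARY KEY (Movie, Attribute)
       )"""
)
conn.executemany(
    "INSERT INTO MovieFact VALUES (?, ?, ?, ?, ?)",
    [
        ("Green Lantern", "Director", "Kubrick", "CHUD", "Sarah"),
        ("Green Lantern", "Leading Male", "Robert Redford", "CHUD", "James"),
        ("Green Lantern", "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
        ("Green Lantern", "Villain", "Hugh Grant", "CHUD", "Sarah"),
        ("The Tick", "Director", "Mel Gibson", "Yahoo", "Cameron"),
    ],
)

# The basic data for one movie is spread over several rows...
rows = conn.execute(
    "SELECT Attribute, Value FROM MovieFact WHERE Movie = ?", ("Green Lantern",)
).fetchall()

# ...and has to be folded back into a single record by the caller.
green_lantern = dict(rows)

# The metadata for one fact, on the other hand, is a single lookup.
villain_source = conn.execute(
    "SELECT Source, Journalist FROM MovieFact WHERE Movie = ? AND Attribute = ?",
    ("Green Lantern", "Villain"),
).fetchone()
```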
Table 3
Movie         | Attribute      | Source        | Journalist
--------------+----------------+---------------+-----------
Green Lantern | Director       | CHUD          | Sarah
Green Lantern | Leading Male   | CHUD          | James
Green Lantern | Leading Female | Dark Horizons | James
Green Lantern | Villain        | CHUD          | Sarah
The Tick      | Director       | Yahoo         | Cameron
...
But this is very dangerous: if someone renames a column in Table 1, say "Villain" to "Primary Villain," the rows in Table 3 will still say "Villain," and the related data will be silently decoupled. This could be mitigated if the "Attribute" column were linked to another table that served as an enumeration of the columns of Table 1. Of course, the DBA would then be responsible for keeping this enumeration table in sync with the actual columns of Table 1. It might be possible to improve this even further by using, instead of a hand-maintained enumeration table, a SQL Server system view that lists the names of the columns in Table 1 (such as sys.columns or INFORMATION_SCHEMA.COLUMNS). However, I'm not sure whether you can define relationships that involve system views.
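A minimal sketch of the enumeration-table idea, again using Python's sqlite3 module with SQLite standing in for SQL Server. The names MovieAttribute and MovieFactSource are hypothetical, and the ON UPDATE CASCADE clause is an assumption about how one might wire the foreign key, not something from the question. The constraint rejects attribute names missing from the enumeration, and the cascade lets a rename in the enumeration flow through to the metadata rows. Nothing here ties the enumeration to Table 1's real columns, so that sync is still manual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical enumeration table listing the fact columns of Table 1.
conn.execute("CREATE TABLE MovieAttribute (Name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO MovieAttribute VALUES (?)",
    [("Director",), ("Leading Male",), ("Leading Female",), ("Villain",)],
)

# Table 3: metadata only, with Attribute constrained to the enumeration.
# ON UPDATE CASCADE (an assumption) propagates renames to existing rows.
conn.execute(
    """CREATE TABLE MovieFactSource (
           Movie TEXT,
           Attribute TEXT REFERENCES MovieAttribute(Name) ON UPDATE CASCADE,
           Source TEXT,
           Journalist TEXT,
           PRIMARY KEY (Movie, Attribute)
       )"""
)
conn.execute(
    "INSERT INTO MovieFactSource VALUES (?, ?, ?, ?)",
    ("Green Lantern", "Villain", "CHUD", "Sarah"),
)

# Renaming the attribute in the enumeration updates the metadata row too.
conn.execute("UPDATE MovieAttribute SET Name = 'Primary Villain' WHERE Name = 'Villain'")
renamed = conn.execute(
    "SELECT Attribute FROM MovieFactSource WHERE Movie = 'Green Lantern'"
).fetchone()

# The stale name is no longer in the enumeration, so it is rejected.
try:
    conn.execute(
        "INSERT INTO MovieFactSource VALUES (?, ?, ?, ?)",
        ("The Tick", "Villain", "Yahoo", "Cameron"),
    )
    stale_name_rejected = False
except sqlite3.IntegrityError:
    stale_name_rejected = True
```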
What do you suggest? Is the EAV the only way to go?
And what if there were only one metadata column (just "Source," without "Journalist")? Is it still necessary to go the EAV route? You could have columns "Director," "Director_Source," "Leading Male," "Leading Male_Source," etc., but that gets ugly very quickly. Is there some better solution I'm not thinking of?
If any point is unclear, please comment and I'll add more detail as necessary. Oh yeah, and the movie data I used is fabricated :)
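A hedged sketch of the companion-column layout, again in SQLite through Python's sqlite3 module as a stand-in for SQL Server (column names are hypothetical, with spaces dropped). It shows how the column count grows: 1 + 4 × (1 + number of metadata fields), so 9 columns with just Source and 13 once Journalist is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide layout: each fact column gets a companion *_Source column.
conn.execute(
    """CREATE TABLE Movie (
           Title TEXT PRIMARY KEY,
           Director TEXT, Director_Source TEXT,
           LeadingMale TEXT, LeadingMale_Source TEXT,
           LeadingFemale TEXT, LeadingFemale_Source TEXT,
           Villain TEXT, Villain_Source TEXT
       )"""
)
conn.execute(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Green Lantern", "Kubrick", "CHUD", "Robert Redford", "CHUD",
     "Miley Cyrus", "Dark Horizons", "Hugh Grant", "CHUD"),
)

# A fact and its source are read together from the same row.
row = conn.execute(
    "SELECT Villain, Villain_Source FROM Movie WHERE Title = ?", ("Green Lantern",)
).fetchone()

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
cols = [info[1] for info in conn.execute("PRAGMA table_info(Movie)")]
```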
Edit: To restate my primary question concisely: I would like to have the simplicity and truly relational design of Table 1, which describes a movie entry well, while still storing the metadata on the attributes in a safe and accessible manner. Is this possible? Or is EAV the only way?
Edit 2: After doing some more web research, I have yet to find a discussion of EAV that centers on the desire to store metadata about columns. The primary reason given for implementing EAV is almost always dynamic and unpredictable columns, which is not the case in my example: there are always the same four columns (director, leading male, leading female, villain). However, I want to store certain facts (source and journalist) about each column for each row. EAV would facilitate this, but I would like to avoid resorting to it.
Update
Using the Table 2 design, but with the column "Movie" renamed to "Name" and the whole table called "Movie," here is the PIVOT query in SQL Server 2008 that gets back Table 1:
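-- Rebuild Table 1 (one row per movie) from the EAV rows in Movie.
SELECT Name, [Director], [Leading Male], [Leading Female], [Villain]
FROM (SELECT Name, Attribute, Value FROM Movie) AS src
PIVOT
(
    -- PIVOT requires an aggregate; with at most one row per (Name, Attribute),
    -- MAX simply returns that row's Value.
    MAX(Value)
    FOR Attribute IN ([Director], [Leading Male], [Leading Female], [Villain])
) AS PivotTable;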
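SQLite has no PIVOT operator, so as a hedged, portable counterpart here is the conditional-aggregation form of the same query (one MAX(CASE ...) per attribute, grouped by movie), run through Python's sqlite3 module against the renamed Table 2 described above. The sample rows are the ones from Table 2; "The Tick" only has a Director row, so its other columns come back as NULL (None).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table 2 after the rename: table Movie, entity column Name.
conn.execute(
    "CREATE TABLE Movie (Name TEXT, Attribute TEXT, Value TEXT, Source TEXT, Journalist TEXT)"
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?)",
    [
        ("Green Lantern", "Director", "Kubrick", "CHUD", "Sarah"),
        ("Green Lantern", "Leading Male", "Robert Redford", "CHUD", "James"),
        ("Green Lantern", "Leading Female", "Miley Cyrus", "Dark Horizons", "James"),
        ("Green Lantern", "Villain", "Hugh Grant", "CHUD", "Sarah"),
        ("The Tick", "Director", "Mel Gibson", "Yahoo", "Cameron"),
    ],
)

# Conditional aggregation: each CASE keeps only one attribute's Value,
# and MAX collapses the group to that single value (or NULL if absent).
pivot_sql = """
SELECT Name,
       MAX(CASE WHEN Attribute = 'Director'       THEN Value END) AS Director,
       MAX(CASE WHEN Attribute = 'Leading Male'   THEN Value END) AS "Leading Male",
       MAX(CASE WHEN Attribute = 'Leading Female' THEN Value END) AS "Leading Female",
       MAX(CASE WHEN Attribute = 'Villain'        THEN Value END) AS Villain
FROM Movie
GROUP BY Name
ORDER BY Name
"""
table1 = conn.execute(pivot_sql).fetchall()
```

The same pattern can also surface the metadata next to each value, for example by adding MAX(CASE WHEN Attribute = 'Villain' THEN Source END) AS Villain_Source to the SELECT list.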