views:

85

answers:

4

Can I have an "identity" (unique, non-repeating) column span multiple tables? For example, let's say I have two tables: Books and Authors.

Authors
  AuthorID
  AuthorName
Books
  BookID
  BookTitle

The BookID column and the AuthorID column are identity columns. I want the identity part to span both columns. So, if there is an AuthorID with a value of 123, then there cannot be a BookID with a value of 123. And vice versa.

I hope that makes sense.

Is this possible?

Thanks.

Why do I want to do this? I am writing an APS.NET MVC app. I am creating a comment section. Authors can have comments. Books can have comments. I want to be able to pass an entity ID (a book ID or an author ID) to an action and have the action pull up all the corresponding comments. The action won't care if it's a book or an author or whatever. Sound reasonable?

A: 

As a suggestion - try to use table like ComentId, EntityId, isBook, Comment for comments. isBook is boolean type and not much place to get. Your concept is not good from relational point of view.

Artic
+3  A: 

Even if you could put the identity sequence across multiple tables, your comment table is not going to be able to reference both columns in a single foreign key.

The best way to do this, in terms of relational database design theory, would be to create two comment tables. But obviously, you want to avoid that, probably for code-reuse reasons.

The most straightforward pragmatic approach would be to put two foreign key columns on the comment table, and just make one null and the other not null for each comment.

Another approach, which might be the best compromise, is this. You refer in your question to an "entity ID". So make an Entity table! Then the authors and books and comments can all refer to that table.

Edited to add:

Philip Kelley, Ray, and (I think) Artic have all suggested modifying the comment table by adding an entity_id, which can refer to either the book_id or the author_id, and a flag of some sort (char(1), tinyint, and boolean, respectively) that indicates which of these is being referred to.

This is not a good solution for many reasons, both pragmatic (including data integrity, reporting, efficiency) and theoretical.

The first and most obvious problem is the data integrity problem. A relational database system should always be responsible for maintaining the integrity of its own data, and there are natural and preferred ways that the DB is designed to do this. One of the most important of these mechanisms is the foreign key system. If the comment.entity_id column is to reference both book.book_id and author.author_id, then a foreign key cannot be created for this column.

Sure, you could put a check in your DML (insert, update, delete) stored procedures to verify the references, but that would quickly turn into a big mess, as all DML operations on all three tables would be involved.

And that leads us to the efficiency problem. Whenever a query is run against the comment table, it will require joins to either the author or book table or both. The query plan generation system will not have foreign keys available to optimize with, so its performance could very well be degraded.

Then there are problems with this scheme in reporting. Any report generating system is going to have trouble with this sort of system. Sure this won't be a problem for expert programmers, but any user ad-hoc reports are going to have to mock up the logic behind when the event_id means this or that, and it could be a pretty bad deal. Maybe you won't ever use report generating tools on this database. But then again, nobody knows where a database is going to be ultimately used. Why not work with the system to allow for anything?

And that leads us to the theoretical problems.

In relational database theory, each row (a.k.a. "tuple") in each table ("relation variable") represents a proposition about the real world. Designing a table is to decide the form of that proposition. Let's look at a few examples of how this might work.

comment (comment_id int, comment_type char(1), entity_id int, 
         user_id int, comment_text nvarchar(max), comment_date datetime)
/* comment_id identifies a comment (comment_text) that a user (user_id) 
   has made about a book (entity_id if comment_type = 'B') or author 
   (entity_id if comment_type = 'A') at a particular date and 
   time (comment_date).*/

Here it is clear that the column (or "attribute") called entity_id is doing double-duty. It doesn't really represent anything, except with reference to another column. This is workable, but unsatisfactory.

comment (comment_id int, book_id int, author_id int, user_id int, 
         comment_text nvarchar(max), comment_date datetime)
/* comment_id identifies a comment (comment_text) that a user (user_id) 
   has made about a book (book_id if not null) or author (author_id if 
   not null) at a particular date and time (comment_date). */

This buys us the foreign keys that are the biggest omission from the first version. But this still isn't terribly satisfactory, unless a single comment can refer to both a book and an author (which might be reasonable). Nullable columns are a warning sign that something is wrong with the design, and that may be the case here as well. A check constraint may be necessary to avoid a comment that refers to nothing at all, or to both a book and an author if that is not to be allowed.

From a theoretical perspective (and thus, my perspective :)) there is a clear best option:

book_comment (book_comment_id int, book_id int, user_id int, 
              comment_text nvarchar(max), comment_date datetime)
/* book_comment_id identifies a comment (comment_text) that a 
   user (user_id) has made about a book (book_id) at a particular 
   date and time (comment_date). */

author_comment (author_comment_id int, author_id int, user_id int, 
                comment_text nvarchar(max), comment_date datetime)
/* author_comment_id identifies a comment (comment_text) that a 
   user (user_id) has made about an author (author_id) at a particular 
   date and time (comment_date). */

This last option would provide the best efficiency, data integrity, and ease of reporting. And the only expense would be that the DML stored procedures would need to put the comments into the right tables, which is not a big deal, since they had to know what the comments were referring to anyway.

If your plan was to retrive all comments for a book or author at once, then you can easily create a view on top of these tables that reproduces the other designs, if that's what you want to do.

create view comments as 
select 
    book_comment_id as comment_id, 
    book_id as entity_id, 
    comment_text,
    'B' as comment_type
from book_comment
union
select 
    author_comment_id as comment_id, 
    author_id as entity_id, 
    comment_text,
     'A' as comment_type 
from author_comment
Jeffrey L Whitledge
Hi Jeffrey,Why do you think it is a bad idea to use a "commenter type ID" along with the "Entity ID"?Why are your two latter suggestions better?I'm still learning!Thanks.
johnnycakes
OK. I addressed your question in an edit to the answer.
Jeffrey L Whitledge
This, IMO, is the correct answer. Using an EAV structure for this type of solution is the wrong answer and will turn ugly in reporting. Adding another table really does not cost much but provides many benefits including the ability for author comments to later have attributes that book comments do not.
Thomas
A discriminator column with multiple references is not EAV as @Thomas suggests, it's more like multiple-table inheritance (except it's not being used for inheritance in this particular question, so it's... weird). Nevertheless, I would say that given the question specifications, this is the best answer. If app performance is an issue (i.e. you need to be able to traverse the relationship from a `comment` to a `book` or `author` without joining to all the mapping tables), this could still be *denormalized* with a discriminator column in the `comment` table without sacrificing RI.
Aaronaught
Thanks for the explanation. What about the end of your original post where you suggest "You refer in your question to an "entity ID". So make an Entity table! Then the authors and books and comments can all refer to that table." Do you mean have a table where the "discriminator" column would exist. i.e. It would just have two columns: EntityID, EntityType. Then my BookID (in the Books table) would be the same as my EntityID, and I can just reference the EntityID in my Comments table. (cont. below)
johnnycakes
I understand the benefits of having different comments tables for different entities, but I'm trying to make this scalable for new entities. For example, later I may want to add "Book Stores" to my schema. I'd like to be able to just "turn on" comments for the new "Book Store" entity instead of having to create a new comments table. And "Comments" is just one example. I could do the same for Ratings, Reviews, Tags, etc. I'm pretty new to this type of programming, so I'm sorry if any of my explanations are unclear. I appreciate your efforts here. Thanks.
johnnycakes
The entity table option would be just as you describe, except that I wouldn't necessarily put the descriminator column in that either, since it would be redundant with the existance of a row in another table. It might be a good idea in this case, though, if there are lots of types of entities to search for. If you expand the system as you say with more entity types and more things referencing the entities, then "an entity" becomes more meaningful as a concept, and a table to hold such things becomes worthwhile. It sounds like that is the design you should persue.
Jeffrey L Whitledge
So basically I just need a table with one (identity) column that generates the ID value that each Entity will use? Seems to make sense to me; I just don't think I've ever had a one column table before.
johnnycakes
+2  A: 

The short answer is: No, you can't do that (at least in MS SQL Server through 2008).

You could make a new table, "CommentableEntity", plug your identity column in there, then define foreign keys in Authors and Books to reference it as a parent table, and then do one of a number of tricks to ensure that a given ID value is not assigned to both tables... but this is a poor idea, because the data model you built would imply that Authors and Books are related kinds of data, and they really aren't.

You could have a separate table, Comments, have the identity column in there, and park a CommentId column in both Authors and Books. However, that would limit each book and author to only one comment.

Me, I'd probably add a column like "CommentorType" to the Comments table and a put a flag in there indicating source of comment ("A" for author, "B" for book). Build a primary key on "CommentorId + CommentorType", and it should work well enough -- and it'd be trivial to add further types of commentors as the system expands.

Philip Kelley
You can't have a single foreign key point to two different tables, even with a flag handy.
Jeffrey L Whitledge
Your suggestion ("I'd probably add a column like "CommentorType" to the Comments") is the path I was headed down before I decided to post and make sure there wasn't an easier way to do it.Thanks.
johnnycakes
But adding commentor type to the comments is not a good solution! Don't do this! You will regret it! OK, I feel better now. Moving on.
Jeffrey L Whitledge
A: 

SQL server does not support this. You could roll your own with an id table, but that would be more work than it is worth.

I suggest your comment table look like this:

comment_id int identity
comment_type tinyint
entity_id int

comment_type specifies if the comment belongs to a book, an author, or something else you add in the future. entity_id is the id of the book, author, whatever. In this scheme, it doesn't matter if book or author ids overlap.

Or, if you can switch to oracle, use a sequence :)

Ray