So, I've been reading up on identifying vs. non-identifying relationships for my database design, and a number of the answers on SO seem contradictory to me. Here are the two questions I am looking at:

  1. What's the Difference Between Identifying and Non-Identifying Relationships
  2. Trouble Deciding on Identifying or Non-Identifying Relationship

Looking at the top answers from each question, I appear to get two different ideas of what an identifying relationship is.

The first question's response says that an identifying relationship "describes a situation in which the existence of a row in the child table depends on a row in the parent table." The example given is, "An author can write many books (1-to-n relationship), but a book cannot exist without an author." That makes sense to me.

However, when I read the response to question two, I get confused, as it says, "if a child identifies its parent, it is an identifying relationship." The answer then goes on to give examples: an SSN identifies a Person, but an address does not (because many people can live at one address). To me, this sounds more like the decision between a primary key and a non-key attribute.

My own gut feeling (and additional research on other sites) points to the first question and its response being correct. However, I wanted to verify before I continued forward as I don't want to learn something wrong as I am working to understand database design. Thanks in advance.

+1  A: 

Yes, go with the first one, but I don't think the second one contradicts the first. It's just worded in a somewhat confusing way.

UPDATE:

Just checked: the second question's answer is wrong in some of its assumptions. Book-author is not necessarily a 1:n relation; it could be m:n. In a relational database, an m:n relation is implemented with an intersection table, and you get identifying relationships between the intersection table and the other two tables.

praksant
+3  A: 

The technical definition of an identifying relationship is that a child's foreign key is part of its primary key.

CREATE TABLE AuthoredBook (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors(author_id),
  FOREIGN KEY (book_id) REFERENCES Books(book_id)
);

See? book_id is a foreign key, but it's also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.

A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.

CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

It may be hard to understand this because it's such common practice these days to use only a serial surrogate key instead of a compound primary key:

CREATE TABLE Comments (
  comment_id SERIAL PRIMARY KEY,
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

This can obscure cases where the tables have an identifying relationship.

I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person's primary key.
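As a rough sketch of that last point (not part of the original answer; the Persons table and its column names are assumed for illustration), the SSN can be modeled as an optional, unique attribute next to a separate primary key:

```sql
-- Hypothetical table: person_id is the key, ssn is only an attribute.
CREATE TABLE Persons (
  person_id INT NOT NULL PRIMARY KEY,
  full_name VARCHAR(100) NOT NULL,
  -- NULL allows people without an SSN; UNIQUE keeps two people
  -- from sharing one. Because ssn is not part of the key, it can
  -- be updated when someone is issued a new number.
  ssn CHAR(9) NULL UNIQUE
);
```

In MySQL and PostgreSQL a UNIQUE column accepts any number of NULLs, so several people without an SSN can coexist; other products may treat NULLs in unique constraints differently.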

Bill Karwin
+1  A: 

Identifying / non-identifying relationships are concepts in ER modelling - a relationship being an identifying one if it is represented by a foreign key that is part of the referencing table's primary key. This is usually of very little importance in relational modelling terms because primary keys in the relational model and in SQL databases do not have any special significance or function as they do in an ER model.

For example, suppose your table enforces two candidate keys, A and B. Suppose A is also a foreign key in that table. The relationship thus represented is deemed to be "identifying" if A is designated to be the "primary" key, but it is non-identifying if B is the primary key. Yet the form, function and meaning of the table are identical in each case! This is why I don't think the identifying / non-identifying concept is really very important.
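A minimal sketch of that example, assuming hypothetical tables Parent, Child_v1 and Child_v2 (none of these names come from the thread). The two child tables differ only in which candidate key is labeled PRIMARY KEY:

```sql
-- Hypothetical referenced table.
CREATE TABLE Parent (
  a INT NOT NULL PRIMARY KEY
);

-- Variant 1: A is designated the primary key, so ER notation
-- would call the relationship to Parent "identifying".
CREATE TABLE Child_v1 (
  a INT NOT NULL,
  b INT NOT NULL,
  PRIMARY KEY (a),
  UNIQUE (b),
  FOREIGN KEY (a) REFERENCES Parent(a)
);

-- Variant 2: B is designated the primary key instead, so the
-- same relationship would be called "non-identifying".
CREATE TABLE Child_v2 (
  a INT NOT NULL,
  b INT NOT NULL,
  PRIMARY KEY (b),
  UNIQUE (a),
  FOREIGN KEY (a) REFERENCES Parent(a)
);
```

In both variants a and b are each non-null and unique, and a must match a row in Parent, so the two tables accept exactly the same sets of rows.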

dportas
@David - +1 - Thanks for clearing this up! A co-worker (also not familiar with database design) and I were struggling with this, as we could not see why one or the other mattered when both achieved the same effect. This really helps.
JasCav
+2  A: 

"as I don't want to learn something wrong".

Well, if you really mean that, then you can stop worrying about ER lingo and terminology. It is imprecise, confused, confusing, not at all generally agreed-upon, and for the most part irrelevant.

ER is a bunch of rectangles and straight lines drawn on a piece of paper. ER is deliberately intended to be a means for informal modeling. As such, it is a valuable first step in database design, but it is also just that: a first step.

Never shall an ER diagram get anywhere near the preciseness, accuracy and completeness of a database design formally written out in D.

Erwin Smout
So, if I read your response right, ER modeling is just a tool to help conceptualize the database (similar to how UML modeling is a tool used to conceptualize software systems). While each tool is helpful, it comes with the caveat that each has its own syntax and problems, which can add further confusion. I hadn't thought of this aspect. Thanks.
JasCav