views:

184

answers:

5

Just as an update for Cliffs, Thanks ChaosPandion for the template.

Person
PersonID Int PK

Network
PersonID Int PK FK
OtherPersonID Int PK FK

OR

Person
PersonID Int PK

Network
PersonID Int PK FK
FriendID Int PK FK

Friend
FriendID Int PK
OtherPersonID Int FK

++++++ Original Post Below ++++++

Hi All,

I'm a web developer and have recently started a project with a company. Currently, I'm working with their DBA on getting the schema laid out for the site, and we've come to a disagreement regarding the design on a couple tables, and I'd like some opinions on the matter.

Basically, we are working on a site that will implement a "friends" network. All users of the site will be contained in a table tblUsers with (PersonID int identity PK, etc).

What I am wanting to do is to create a second table, tblNetwork, that will hold all of the relationships between users, with (NetworkID int identity PK, Owners_PersonID int FK, Friends_PersonID int FK, etc). Or conversely, remove the NetworkID, and have both the Owners_PersonID and Friends_PersonID shared as the Primary key.

This is where the DBA has his problem. Saying that "he would only implement this kind of architecture in a data warehousing schema, and not for a website, and this is just another example of web developers trying to take the easy way out."

Now obviously, his remark was a bit inflammatory, and that have helped motivate me to find an suitable answer, but more so, I'd just like to know how to do it right. I've been developing databases and programming for over 10 years, have worked with some top-notch minds, and have never heard this kind of argument.

What the DBA is wanting to do is instead of storing both the Owners_PersonId and Friends_PersonId in the same table, is to create a third table tblFriends to store the Friends_PersonId, and have the tblNetwork have (NetworkID int identity PK, Owner_PersonID int FK, FriendsID int FK(from TBLFriends)). All that tblFriends would house would be (FriendsID int identity PK, Friends_PersonID(related back to Persons)).

To me, creating the third table is just excessive in nature, and does nothing but create an alias for the Friends_PersonID, and cause me to have to add (what I view as unneeded) joins to all my queries, not to mention the extra cycles that will be necessary to perform the join on every query.

I understand that technically, what he is wanting is possible, but is it inline with best practice? What would be best practice?

Thanks for reading, appreciate comments.

Ryan

+3  A: 

The only schema that makes sense to me is this:

Person
    PersonID Int PK

Friend
    PersonID Int PK FK
    OtherPersonID Int PK FK

So you might have a procedure called FriendList that executes this nice clean query:

Select Person.*
From Friend
    Inner Join Person On Friend.OtherPersonID = Person.PersonID
Where Friend.PersonID = @PersonID;

I do not condone selecting all columns.

ChaosPandion
So how would you store the owner of the friend list?
Andomar
@Andomar - Not quite sure I follow.
ChaosPandion
Each network has a `Owner_PersonID`. If I understand the question right, the main difference between the DBA and the poster's solution is where to store that information
Andomar
Right, and that's how I'm looking at it; which is probably where the comment about doing it the "easy" way generated, lol.Basically, he hasn't provided anything concrete regarding his position, but since he's their fulltime dba, his word is final. I just want to know if I'm wrong in wanting a simple two table design. Or if there really is some logical reason for his want of three tables. I kind of feel like this is one of those secret tests employers give to see if employees really know what you're talking about.
devRyan
+1  A: 

What I am wanting to do is to create a second table, tblNetwork, that will hold all of the relationships between users, with (NetworkID int identity PK, Owners_PersonID int FK, Friends_PersonID int FK, etc). Or conversely, remove the NetworkID, and have both the Owners_PersonID and Friends_PersonID shared as the Primary key.

I don't see any problem with this. And I agree that the NetworkID is superfluous -- the two FKs are the natural key for the table, and so you should just use them as the primary key, unless you have some performance reason why you need to refer to specific relationships by a surrogate ID (which you don't seem to have in this case).

Daniel Pryden
A: 

I say do it your way. Having that third table makes the programming part more of a pain.

Gromer
+7  A: 

If I understand you right, You're proposing:

Person              PersonID PK
FriendList          FriendListID, OwnerID, PersonID 

The DBA proposes:

Person              PersonID PK
FriendList          FriendListID, OwnerID
FriendListEntry     FriendListID, PersonID

Your approach would require multiple rows for each friend in the list. This would repeat OwnerID multiple times, violating normal form. The DBA's solution is more normalized, having only values that depend on FriendListID in the FriendList table.

The best practice here is to be good friends with the DBA. I'd go with his solution because it doesn't matter much, and you're sure to need him later on.

Andomar
I fail to see how adding a third table makes the schema more normalized.
ChaosPandion
@ChaosPandion: The two-table solution violates second normal form (http://en.wikipedia.org/wiki/Second_normal_form) by repeating OwnerID for each friend in the list
Andomar
I've updated the OP with shorthand of the tables, for clarification. Edit-which is exactly as Andomar listed above.
devRyan
Please explain why this does not apply to `FriendListID`.
ChaosPandion
@ChaosPandion: It does not apply to FriendListID because FriendListID is a foreign key. The difference becomes clearer if there was more information tied to a FriendList, for example a name, create date, type (Work/Private), and privacy information. The two-table approach would repeat all of that
Andomar
Well I'll be damned you convinced me. My guess is the DBA is just an arrogant jerk. He should have explained his reasoning.
ChaosPandion
@Ryan: What you have in your post does not appear to be identical to what Andomar has above. In your post you have FriendID as the primary key of the "Friend" table, meaning you would have a new FriendID for each entry and thus a new row in the "Network" table for each friend. In Andomar's example FriendListID is primary in what would be your "Network" table, but foreign in what would be your "Friend" table.
Robert
@Robert - You are correct, I misread what he had there. What I've posted at the top of the OP is what the DBA proposes. Sorry for the confusion.
devRyan
@Ryan: Sounds like Andomar's design is what the DBA should be proposing anyway.
Robert
@Robert - Agreed
devRyan
+2  A: 

Your design violates Third Normal Form, if Network.Owners_PersonID is stored redundantly for a network.

But I don't understand how the DBA's design actually helps. I would have expected Friends to be the many-to-many table between Users and Networks:

CREATE TABLE tblUsers (
  PersonID INT IDENTITY PRIMARY KEY
);

CREATE TABLE tblNetworks (
  NetworkID INT IDENTITY PRIMARY KEY,
  Owner_PersonID INT NOT NULL REFERENCES tblUsers
);

CREATE TABLE tblFriends (
  NetworkID INT NOT NULL REFERENCES tblNetworks,
  FriendID INT NOT NULL REFERENCES tblUsers,
  PRIMARY KEY(NetworkID, FriendID)
);

In other words, you have a simple many-to-many relationship:

Users ----<- Friends ->---- Networks

And additionally, Networks references Users just to identify the owner of the given network. This way there's only one row for a given network, so you can't create an update anomaly by changing the owner of the network on some rows.

I don't think this is splitting the entities into separate tables excessively. You can still get a list of friends for a given network:

SELECT ... FROM Networks n JOIN Friends f ON (n.NetworkID=f.NetworkID)

You can get all of a user's friends from all networks this way (pass the given user's id for the ? parameter):

SELECT ... FROM Friends u 
JOIN Friends f ON (u.NetworkID=f.NetworkID)
WHERE u.UserID = ?

In your original design, it's pretty much the same:

SELECT ... FROM Networks u
JOIN Networks f ON (u.Owner_UserID=f.Owner_UserID)
WHERE u.FriendID = ?

But the advantage is that you've eliminated the possible update anomaly.

Bill Karwin
See, I can work with what you've got above. I actually prefer it.It makes me wonder if the DBA may have just "typo'd" the schema, and has been arguing with me not realizing what he's got on paper.
devRyan
Have you had a face-to-face discussion about this database design, and done some sketching on a whiteboard or something? If you're only communicating by email, that's pretty inefficient and prone to misunderstandings.
Bill Karwin
We've been working via WebEx.
devRyan
A bit of nitpicking: repeating groups violate **second** normal form. Third normal form requires that each element only depends on the key
Andomar
@Andomar: Hmm. I didn't think this was a repeating-groups issue, it's an update-anomaly issue.
Bill Karwin
Well as I understand it the update anomaly comes with violating second normal form?
Andomar
But second normal form has to do with compound keys. Go look at the example scenario on the wikipedia page for Third Normal Form. It's pretty similar to this scenario with the OP's original design for the Networks table.
Bill Karwin