views:

51

answers:

3

Many people say that I should create a friendships table(userid , friendid).

My question is:

Won't there be a double record for every friendship ? like:
12 is friend with 5
+
5 is friend with 12

Is there a way to prevent this ?

| OR |

Should I just ignore that ?

+4  A: 

There are at least two approaches you could use:

  1. Ensure that the userid is always less than the friendid.
  2. Store each relationship in both directions (i.e. store both a,b and b,a).

The second approach costs twice as much storage and is redundant but it does mean that some queries can be much simpler and will perform better. For example, consider fetching all the common friends of user 'a' and 'b' with each of the approaches:

Approach 1:

SELECT friendid
FROM
(
    SELECT friendid FROM friendships WHERE userid = 'a'
    UNION
    SELECT userid FROM friendships WHERE friendid = 'a'
) T1
JOIN
(
    SELECT friendid FROM friendships WHERE userid = 'b'
    UNION
    SELECT userid FROM friendships WHERE friendid = 'b'
) T2
ON T1.friendid = T2.friendid

Approach 2:

SELECT T1.friendid
FROM friendships T1 
JOIN friendships T2
ON T2.userid = 'b' AND T1.friendid = T2.friendid
WHERE T1.userid = 'a'
Mark Byers
+1 for ensuring a particular order of the IDs thus ensuring that every (two-way) relationship has a unique representation in the table.
Adrian Smith
Thank you SO much!
+1  A: 

I would ask: Is it possible for someone to be friends with themselves? What does 'friendship' mean - is it bidirectional or unidirectional. If bidirectional then one record will do, right? If unidirectional, then two records are needed. If someone cannot be friends with themselves, then you implement a business rule in the data entryu page (liek in a web page this would be good for javascript) where friend1 id <> friend2 id Not sure I agree with 'less than' - they just can't be equal

Joe
A: 

The context is missing, the question is too isolated, and the answers are very limited. The thread is incomplete.

First, for understanding, the presented Friend table does not exist on its own. It is a child of the User table which has an UserId PK. Both UserId and FriendId are FKs to UserId.

Since friendship is a two-way street or bi-directional (we are excluding unreciprocated lovers and stalkers here!), then a single row identifies the relation, in both directions.

There are strong implications in the naming of any column, so I would avoid (UserId, FriendId) which implies a hierarchy or a one-way relation, which is false. I suggest (Friend1, Friend2), and of course you need the PK to be ((Friend1, Friend2).

What has not been identified is, we have still not prevented duplicates ( [5,12] and [12, 5] ), the very redundancy you are seeking to prevent. We need a rule, to enforce an order, that Friend1 is always less than Friend2.

There are other relations, such as Bill of Materials or a family tree, which has Assemblies and COmponents, which require something more. Eg (AssemblyId, ComponentId) or (ParentID, ChildId) where that rule does not apply: a part can be a Component in an Assembly and also be an Assembly itself. That requires two rows. Still no duplicates.

PerformanceDBA