Let's say I have a Users table with a UserID column, and I need to model a scenario where a user can have multiple relationships with another user, e.g. phone calls. Any user can initiate 0...n phone calls with any other user.

Would it be a classic junction table like:

UserCalls
-----------------------
| CallerID | CalleeID |
-----------------------

?

A: 

Yes, that's correct.

Stephen Burns
+3  A: 

Sounds like you're on the right track...

CREATE TABLE CallHistory
(
    CallerID   int,
    RecipientID   int,
    DurationInMinutes int,
    /*  etc  etc  */
    CallStartedAt    smalldatetime
)

For your PK, consider this article on choosing a PK: http://www.agiledata.org/essays/keys.html

p.campbell
+1: I was struggling with the naming convention; I only had "recipientod". CallHistory is the perfect name for the table.
OMG Ponies
I don't understand why you are showing the CREATE TABLE statement; please elaborate.
VoodooChild
@VoodooChild - He's showing the create table because it's the best way to represent the schema. He could do it as I did but (a) my representation is lamer and (b) my example is harder to format. :)
Howiecamp
@p.campbell - Thanks. Let's say I need 2 queries. The first is to show all RecipientIDs for a given CallerID, and the second is the other way around. The first query is obvious - I'd simply query CallHistory WHERE CallerID = the_id_i_care_about. But in the reverse case, am I simply querying CallHistory WHERE RecipientID = the_id_i_care_about? This is probably a stupid question, but something seems odd to me about querying the second column. Not sure why.
Howiecamp
@Howiecamp The only thing that's odd is you would typically require two indexes to support that kind of operation
Cade Roux
@howiecamp: right, you got it. Also, if you had the requirement to run stats off this table, and the right indexes, you'd be set. Choosing a primary key would be the next step. Natural key vs. an incrementing int.
p.campbell
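
The two lookups and the two supporting indexes discussed in the comments above could look roughly like this. This is only a sketch against the CallHistory table from the answer, written in SQL Server syntax (suggested by the smalldatetime column); the index names and the @UserID variable are placeholders, not something from the original posts.

```sql
-- Hypothetical index names: one index per lookup direction.
CREATE INDEX IX_CallHistory_CallerID    ON CallHistory (CallerID, RecipientID);
CREATE INDEX IX_CallHistory_RecipientID ON CallHistory (RecipientID, CallerID);

-- @UserID is a placeholder for the user you care about.
DECLARE @UserID int = 42;

-- Everyone this user has called.
SELECT DISTINCT RecipientID
FROM CallHistory
WHERE CallerID = @UserID;

-- Everyone who has called this user (the "reverse" lookup).
SELECT DISTINCT CallerID
FROM CallHistory
WHERE RecipientID = @UserID;
```

Querying on the second column is no different from querying on the first; the only cost is the extra index needed to keep that lookup from scanning the whole table.
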
+3  A: 

The thing that's really important to get right on these tables is the primary key. If a person is allowed to call another person several times and each is represented by a distinct row, then (Caller, Callee) is not a candidate key. There needs to be something like a surrogate key or some kind of timestamp which is used to ensure you have a good primary key.
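
As an illustration of the surrogate-key option, here is a minimal sketch in SQL Server syntax. It reuses the CallHistory columns posted in the other answer; the CallID column name is an assumption, and the foreign keys assume that UserID is the primary key of Users.

```sql
CREATE TABLE CallHistory
(
    -- Surrogate key (assumed name): lets the same pair of users
    -- appear in any number of rows.
    CallID            int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CallerID          int NOT NULL REFERENCES Users (UserID),
    RecipientID       int NOT NULL REFERENCES Users (UserID),
    DurationInMinutes int NULL,
    CallStartedAt     smalldatetime NOT NULL
    -- Timestamp-based alternative to the surrogate key:
    --   PRIMARY KEY (CallerID, RecipientID, CallStartedAt)
    -- Note that smalldatetime only has one-minute precision, so two calls
    -- between the same pair within the same minute would collide.
);
```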

In addition, from a business rules perspective, if the relationship is reversible where any time you are looking for calls, you only care that the two parties were the same (not who called who), having the table distinguish them in the way you have can be problematic. The typical way around that is to have a Calls table and a CallParties table which links the call to the parties in the call (which may have flags which help identify the call originator). In this way the column order dependency goes away and MAY make certain queries easier (it may make others more difficult). This can also reduce the number of indexes required.
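
A rough sketch of that two-table layout, in SQL Server syntax. The column names (CallID, PartyID, IsOriginator) and the @UserA/@UserB variables are illustrative assumptions, and the foreign key to Users assumes UserID is its primary key.

```sql
CREATE TABLE Calls
(
    CallID            int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CallStartedAt     smalldatetime NOT NULL,
    DurationInMinutes int NULL
);

CREATE TABLE CallParties
(
    CallID       int NOT NULL REFERENCES Calls (CallID),
    PartyID      int NOT NULL REFERENCES Users (UserID),
    IsOriginator bit NOT NULL,  -- 1 for the party who placed the call
    PRIMARY KEY (CallID, PartyID)
);

-- Placeholders for the two users of interest.
DECLARE @UserA int = 1, @UserB int = 2;

-- All calls between the two users, regardless of who placed them.
SELECT c.CallID, c.CallStartedAt
FROM Calls AS c
JOIN CallParties AS a ON a.CallID = c.CallID AND a.PartyID = @UserA
JOIN CallParties AS b ON b.CallID = c.CallID AND b.PartyID = @UserB;
```

Because both parties live in the same column, the "either direction" query needs no OR or UNION; the price is the extra join whenever you do care who placed the call.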

So, I would consider first the table design as you have it, but also keep in mind the possible need for reversals.

Cade Roux
Good call on the CallParties table. This sort of thing is always a trade-off. Just tracing dependencies in either direction becomes MUCH easier with such a table, but then regular "look up the people someone called" queries end up requiring 4 joins between 5 tables rather than 2 joins between 3 tables: `caller join call join callparty join call join caller` vs `caller join callhistory join caller`.
Emtucifor
@Cade - I do indeed want to know who called who (not just that the two parties are the same). In this case are you recommending I implement the Calls and CallParties approach? It looks like you're saying these 2 tables should be implemented when I *don't* care about who called who.
Howiecamp
@Howiecamp You can still tell who called by adding a flag to the CallParties table (CallID, PartyID, IsOriginator). This is obviously an attribute of a Party's connection to the call, so it makes sense. But, as Emtucifor says, the joins can be more complex. That does NOT mean they WILL perform poorly, because, properly indexed, a normalized design can perform extremely well on a variety of unforeseen use-cases. But it could cause problems. I would first benchmark the queries which are awkward in the CallerID, CalleeID model and see how slow or unmaintainable they are.
Cade Roux
@Howiecamp Because you may simply be fine with a few extra UNIONs or whatever it takes to combine the callers and callees, and that can always be wrapped in a view or something which conforms all the data to some canonical form where CallerID1 < CallerID2 or whatever you need to support your use cases.
Cade Roux
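
One way to read the "canonical form" suggestion is a view over the two-column table that always puts the smaller user ID first. The sketch below uses SQL Server syntax and the CallHistory table from the other answer; the view name CallPairs and the UserID1/UserID2 column names are made up for illustration.

```sql
-- Hypothetical view: one row per call, with the pair in canonical order
-- (UserID1 <= UserID2), so the direction of the call no longer matters.
CREATE VIEW CallPairs AS
SELECT
    CASE WHEN CallerID < RecipientID THEN CallerID ELSE RecipientID END AS UserID1,
    CASE WHEN CallerID < RecipientID THEN RecipientID ELSE CallerID END AS UserID2,
    CallStartedAt,
    DurationInMinutes
FROM CallHistory;
GO  -- batch separator for SQL Server client tools; CREATE VIEW needs its own batch

-- All calls between users 1 and 2, whoever placed them
-- (pass the smaller ID as UserID1).
SELECT CallStartedAt, DurationInMinutes
FROM CallPairs
WHERE UserID1 = 1 AND UserID2 = 2;
```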