views:

1167

answers:

11

I'm wondering what a better design is for the intersection table for a many-to-many relationship.

The two approaches I am considering are:

CREATE TABLE SomeIntersection 
(
     IntersectionId UNIQUEIDENTIFIER PRIMARY KEY,
     TableAId UNIQUEIDENTIFIER REFERENCES TableA NOT NULL,
     TableBId UNIQUEIDENTIFIER REFERENCES TableB NOT NULL,
     CONSTRAINT IX_Intersection UNIQUE(TableAId, TableBId )
)

or

CREATE TABLE SomeIntersection 
(
     TableAId UNIQUEIDENTIFIER REFERENCES TableA NOT NULL,
     TableBId UNIQUEIDENTIFIER REFERENCES TableB NOT NULL,
     PRIMARY KEY(TableAId, TableBId )
)

Are there benefits to one over the other?
EDIT 2:**Please Note:** I plan to use Entity Framework to provide an API for the database. With that in mind, does one solution work better with EF than the other?

EDIT: On a related note, for a intersection table that the two columns reference the same table (example below), is there a way to make the two fields differ on a record?

CREATE TABLE SomeIntersection 
(
     ParentRecord INT REFERENCES TableA NOT NULL,
     ChildRecord INT REFERENCES TableA NOT NULL,
     PRIMARY KEY(TableAId, TableBId )
)

I want to prevent the following

ParentRecord          ChildRecord
=================================
      1                    1         --Cyclical reference!
+6  A: 

It's a topic of some debate. I prefer the first form because I consider it better to be able to look up the mapping row by a single value, and like to enforce a perhaps-foolish consistency of having a single-column primary key available in every table without fail. Others think that having that column is a silly waste of space.

We meet for a few rounds of bare-knuckle boxing at a bar in New Jersey every couple of months.

chaos
I agree that a row identity can come in quite handy, especially if there ever comes a time when you'd need a direct refrence to the correlation. +1 for the bare-knuckle boxing, first rule of DBA Fight Club...
CAbbott
@chaos@ Nicely put. I prefer the latter but still up-voted you. Mines a beer, thanks :)
onedaywhen
+7  A: 

The second version is the best for me, it avoids the creation of an extra index.

tekBlues
I've accepted this answer since it works best with Entity Framework. The junction/intersection table folds nicely into the Entities generated in the model.
phsr
You will need another index in the second version if you want to access the intersection table from Table B.
Gilbert Le Blanc
+4  A: 

What you are building is called an "Intersection".

I have a very clear memory of my database professor in school saying that an intersection relationship is nearly always an entity in its own right, and so it's normally worth allocating space for it as such. This would indicate that former is more "correct".

That said, I personally tend to prefer the latter. It really comes down to whether you will ever retrieve one of these records directly or if you'll only use the table when joining on one of the original tables.

Joel Coehoorn
you brought up a great point... I think what the professor said makes sense to me.
Sung Meister
+2  A: 

From a developer's perspective view, I prefer the former. It's easier to write and test when dealing with it.

I don't need to check for two keys to retrieve a unique record.

Sung Meister
+3  A: 

if you go with the first, just use an IDENTITY on the PK, you don't need to waste the space (disk and memory cache) with a UNIQUEIDENTIFIER .

KM
+1: good point, KM
van
+3  A: 

If you won't have any additional fields in the intersection table it really doesn't need an ID of its own, and adding one doesn't add any benefit. However, if you're going to be putting other fields into that table, and in many cases you will, it should have its own ID as the primary key.

Rule of thumb, of course, but there you go.

Tim Sullivan
Why should it have an ID because other columns are added?
Tom H.
Because it's suddenly an item in its own right, and you may end up needing to refer to it easily. No table *requires* an ID, it's just a handy thing to have. As I said, rule of thumb.
Tim Sullivan
+1  A: 

Benefits of the first one:

The ability to look up the join table by a single value. This makes some look up operations simpler on the client side.

Some ORMs (NHibernate) also require ids in each entity for persistence to work properly.

Benefits of the second one:

Simpler data model, with no need to create the additional index.

Less memory required.

Shane Fulmer
+1  A: 

The second is better. It constrains the junction box (intersection box) to not having more than one appearance of the same pair. If there are no other columns in the junction box, you won't be doing lookups of this table anyway, except by way of the two foreign keys.

Walter Mitty
The first one does as well "CONSTRAINT IX_Intersection UNIQUE(TableAId, TableBId "
Brian
A: 

Given that the TableAId-TableBId combination is unique and that the table is used solely to implemet a many-to-many relationship I would go with the second option. From a purely logical standpoint the fist implemtnation reduces to the second. From a structural perspective your databases needs to maintain both a primary key/index as well as a constraint on the first implementation wereas on the second it only needs to maitain the primary key/index.

James Conigliaro
+3  A: 

Use version-1 if your "intersection" actually IS an entity on its own, meaning:

  • it has additional properties
  • you may search for those objects (and not navigating a relation)

User version-2 if it is purely N-M relation table. In which case also make sure that:

  • you have your PK (CLUSTERED) with the first column relating to the table your search more often: for example if your tables are Person-Address, then I would assume that you would search for all-addresses-of-a-person more often then for all-people-at-this-address. So you should put your PK to include PersonID first
  • you can still have another one-column UNIQUE identifier, but just:

    1. either do not make it a PK
    2. or make it a PK, but specify NON CLUSTERED, so that you can use CLUSTERED for the UNIQUE index covering two referencing columns
    3. unless you use GUIDs by your design, you could then stick to INT IDENTITY column type

In both cases you may want to create another INDEX covering the 2 columns, but in another order, if you search from the other side of the relation often.

van
+2  A: 

Answering your second question...

You just need to add a check constraint, like this:

CREATE TABLE SomeIntersection 
(
     ParentRecord INT REFERENCES TableA NOT NULL,
     ChildRecord INT REFERENCES TableA NOT NULL,
     PRIMARY KEY(TableAId, TableBId),
     CHECK (ParentRecord <> ChildRecord)
)
emzero