views:

218

answers:

7

Assume that we have two tables: Roles and Reports. And there exists a many-to-many relationship between them. Of course, the only solution that comes to my mind is to create a cross-table, let's name it RoleReport. I can see two approaches to the structure of that table:

1. Columns: RoleReportId, RoleId, ReportId
   PK: RoleReportId
2. Columns: RoleId, ReportId
   PK: RoleId, ReportId

Is there any real difference between them (performance or whatever else)?

+5  A: 

You don't actually need the RoleReportId. It adds nothing to the relationship.

Many people try to avoid using a naturally-unique key in real tables, instead opting for an artificially unique one, but I don't always agree with that. For example, if you can be sure that your SSN will never change, you can use that as a key. If it somehow does change in the future, you can fix it then.

But I don't intend arguing that point, there's good arguments on both sides. However, you certainly don't need an artificially unique key in this case since both your other fields are, and will remain, unique.

paxdiablo
why not to create artificial data to use in data relationships? this way you don't need to think: RoleReportId will NEVER change
Rubens Farias
But what about performance? Using the first option cause that I've a PK on one filed, second option - on two fields. Is it significant?
rafek
@rafek: re perf: try it and find out. Database optimizers can be very smart, or not, depending on all kinds of stuff that isn't in scope of this discussion.
Greg D
@rafek, it shouldn't be. An index is simply an ordered list of the underlying data. Whether that data is from one field or two (contiguous) fields won't really matter, *especially* since, in a normal database, reads far outweigh writes.
paxdiablo
@rafek: You'll want to have an index anyway to ensure that each (RoleId, ReportId) combination is unique -- so this point is moot.
Martin B
@Rubens, I'm assuming that both roleid and reportid are already artificially unique keys, since the author is one of those that seems to like them :-) In that case, you're just wasting space since if neither of those will either change, you don't need another artificially unique key.
paxdiablo
+2  A: 

Unless you really need the RoleReportId as a foreign key in some other table (which is not usually going to be the case), go with option 2. It's going to require less storage, and that by itself will probably give a performance advantage -- plus why have a column you're never going to use?

Martin B
+2  A: 

Semantically, the difference is what you're using as the primary key.

Typically I let the remainder of my schema dictate what I do in this situation. If the cross-table is exclusively the implementation of the many-to-many relationship, I tend to use the concatenated primary key. If I'm hanging more information off the cross table, making it an entity in its own right, I'm more inclined to give it its own id independent of the two tables it's connecting.

This is, of course, subjective. I don't claim that this is the One True Way (tm).

Greg D
A: 

I would suggest du choose no PK for your second choice. You may use indices or an unique constraint over the combination of both columns.

FooBarWriter
+6  A: 

You will need a composite UNIQUE index on (RoleId, ReportId) anyway.

There is no point in not doing it a PRIMARY KEY.

If you do it a CLUSTERED PRIMARY KEY (which is default), this will be better performance-wise, since it will be less in size.

A clustered primary key will contain only two columns in each record: RoleID and ReportID, while a secondary index will contain three columns: RoleID, ReportID and RoleReportID (as a row pointer).

You may want to create an additional index on ReportID which may be used to search all Roles for a given Report.

There would be some point in making a surrogate key for this relationship if the two following conditions held:

  1. You have additional attributes in your relationship (i. e. this table contains additional columns, like Date or anything else)
  2. You have lots of tables that refer to this relationship with a FOREIGN KEY

In this case it would be nicer to have a single-column PRIMARY KEY to refer to in FOREIGN KEY relationships.

Since you don't seem to have this need, just make a composite PRIMARY KEY.

Quassnoi
Your answer exceeds my expectations. Thanks a lot! :)
rafek
+1  A: 

If you have many rows, then it might be beneficial to have appropriately ordered indexes on your RoleId and/or ReportId columns, since this will speed up look up operations - but inversely this will slow down insert/delete operations. This is a classic usage profile issue...

If not required otherwise, omit the RoleReportId PK. It adds nothing to the relationship, forces the Server to generate a useless number on each insert, and leaves the other two columns unordered, which slows down lookups.

But all in all, we are talking about milliseconds here. This only becomes relevant, if there is a huge amount of data (say more than 10.000 rows)...

Thomas Weller
A: 

The benefit of using RoleReportID as a single-column primary key comes when you (or the other guy, depending on the structure of your company) need to write a front end that addresses individual role<->report relationships (for instance, to delete one). At that point, you may prefer the fact that you need to address only one column, instead of two, to identify the linking record.

Other than that, you don't need the RoleReportID column.

Larry Lustig