views:

246

answers:

3

A relation table is the common solution to representing a many-to-many (m:n) relationship.

In the simplest form, it combines foreign keys referencing the two relating tables to a new composite primary key:

A        AtoB     B
----     ----     ----
*id      *Aid     *id
data     *Bid     data

How should it be indexed to provide optimal performance in every JOIN situation?

  1. clustered index over (Aid ASC, Bid ASC) (this is mandatory anyway, I guess)
  2. option #1 plus an additional index over (Bid ASC, Aid ASC)
  3. or option #1 plus an additional index over (Bid ASC)
  4. any other options? Vendor-specific stuff, maybe?
+1  A: 

I guess solution 2 is optimal. I'd choose the order of the clustered index by looking at the values and expecting which one has more distinct rows. That one goes first. Also it's important to have unique or primary key indexes on parent tables.

Depending on DBMS, number 3 might work as good as number 2. It might or might not be smart enough to consider the values (key of clustered index) in the nonclustered index for anything other than refering the the actual row. If it can use it, then number 3 would be better.

Mehrdad Afshari
But *which one* is optimal? 1., 2. or 3.? :) Also, the question is more a general how-to than a specific problem.
Tomalak
Tomalak: I thought you mean all of them :) Oh, I didn't see #1 + ... I'd choose number 2.
Mehrdad Afshari
Hm, maybe I have to reformat the question to make that more clear. :)
Tomalak
+3  A: 

I made some tests, and here is the update:

To cover all possible cases, you'll need to have:

CLUSTERED INDEX (a, b)
INDEX (b)

This will cover all JOIN sutiations AND ORDER BY

Note that an index on B is actually sorted on (B, A) since it references clustered rows.

As long as your a and b tables have PRIMARY KEY's on id's, you don't need to create additional indexes to handle ORDER BY ASC, DESC.

See the entry in my blog for more details:

Quassnoi
You still need an index on (B DESC) to use an index for ORDER BY b, a DESC
Quassnoi
An INDEX (b, a) does not have a benefit?
Tomalak
INDEX (b) actually is INDEX(b, a, b), since your table is clustered. An index on (b, a) would actually be INDEX (b, a, a, b) which is totally pointless.
Quassnoi
I see. Thanks for clarifying. I was just thinking along the lines of "if all required data for a given query can be pulled from a singe index, the DB server won't have to touch the table or other indexes". Does that make sense?
Tomalak
If AB does not contain any other field but A and B, then all indexes I described (including ones on B) contain ALL data the table itself contains. The indexes contain a key and a pointer to the row in the table. Since yor table is clustered, the pointer to the row will be an (A, B) pair and it will be containted in ALL indexes.
Quassnoi
@Quassnoi, would that be any different if the AtoB table had additionally identity type key AtoBid like table AtoB (AtoBid,Aid, Bid)?
kristof
@kristof: no, it would only hamper performance. In case of many-to-many tables, the linked fields make a perfect natural PRIMARY KEY which can be clustered.
Quassnoi
Thanks Quassnoi, what about the scenario when you want to reference the AtoB table from another one?
kristof
@kristof: just make a reference by A and B.
Quassnoi
thanks, that makes sense. I would be grateful if you covered that scenario in the blog post that you mentioned, as "why it would not make sense to add an extra key, and what would be the performance drawbacks if someone used that approach". Thanks again, I find all your answers a great learning resource especially for using indexes I wish I had more time to follow them more closely
kristof
@Quassnoi: Thank you. I really appreciate all the effort you put into this. Very nice blog entry. :-) I had left this open for a while hoping this would catch some more upvotes, better reflecting the time you put into the answer. But alas. The niche/non-obvious questions/answers simply do not attract enough attention. Thanks again.
Tomalak
+1  A: 

I have done some quick and dirty tests by examining the execution plans in SQL server 2005. The plans showed that SQL uses the clustered index on Aid,Bid for most queries. Adding an index on Bid (ASC) shows that it's used for queries of type

select * from A
    inner join AtoB on Aid = A.id
    inner join B on Bid = B.id
where Bid = 1

So I'm voting for solution #3.

edosoft
You anticipated what @Quassnoi took the time to track down and explain. +1 in any case. :)
Tomalak