ansaurus

Question

What is the optimal indexing strategy for a relation table?

Answer 1

+1 A:

I guess solution 2 is optimal. I'd choose the order of the clustered index by looking at the values and expecting which one has more distinct rows. That one goes first. Also it's important to have unique or primary key indexes on parent tables.

Depending on DBMS, number 3 might work as good as number 2. It might or might not be smart enough to consider the values (key of clustered index) in the nonclustered index for anything other than refering the the actual row. If it can use it, then number 3 would be better.

Mehrdad Afshari 2009-05-12 09:24:13

But *which one* is optimal? 1., 2. or 3.? :) Also, the question is more a general how-to than a specific problem.

Tomalak 2009-05-12 09:25:39

Tomalak: I thought you mean all of them :) Oh, I didn't see #1 + ... I'd choose number 2.

Mehrdad Afshari 2009-05-12 09:26:55

Hm, maybe I have to reformat the question to make that more clear. :)

Tomalak 2009-05-12 09:28:55

Answer 2

+3 A:

I made some tests, and here is the update:

To cover all possible cases, you'll need to have:

CLUSTERED INDEX (a, b)
INDEX (b)

This will cover all JOIN sutiations AND ORDER BY

Note that an index on B is actually sorted on (B, A) since it references clustered rows.

As long as your a and b tables have PRIMARY KEY's on id's, you don't need to create additional indexes to handle ORDER BY ASC, DESC.

See the entry in my blog for more details:

Indexing a link table

Quassnoi 2009-05-12 09:24:17

You still need an index on (B DESC) to use an index for ORDER BY b, a DESC

Quassnoi 2009-05-12 09:44:21

An INDEX (b, a) does not have a benefit?

Tomalak 2009-05-12 09:47:48

INDEX (b) actually is INDEX(b, a, b), since your table is clustered. An index on (b, a) would actually be INDEX (b, a, a, b) which is totally pointless.

Quassnoi 2009-05-12 09:51:37

I see. Thanks for clarifying. I was just thinking along the lines of "if all required data for a given query can be pulled from a singe index, the DB server won't have to touch the table or other indexes". Does that make sense?

Tomalak 2009-05-12 09:58:25

If AB does not contain any other field but A and B, then all indexes I described (including ones on B) contain ALL data the table itself contains. The indexes contain a key and a pointer to the row in the table. Since yor table is clustered, the pointer to the row will be an (A, B) pair and it will be containted in ALL indexes.

Quassnoi 2009-05-12 10:03:39

@Quassnoi, would that be any different if the AtoB table had additionally identity type key AtoBid like table AtoB (AtoBid,Aid, Bid)?

kristof 2009-05-12 10:26:24

@kristof: no, it would only hamper performance. In case of many-to-many tables, the linked fields make a perfect natural PRIMARY KEY which can be clustered.

Quassnoi 2009-05-12 10:28:59

Thanks Quassnoi, what about the scenario when you want to reference the AtoB table from another one?

kristof 2009-05-12 10:36:20

@kristof: just make a reference by A and B.

Quassnoi 2009-05-12 10:46:42

thanks, that makes sense. I would be grateful if you covered that scenario in the blog post that you mentioned, as "why it would not make sense to add an extra key, and what would be the performance drawbacks if someone used that approach". Thanks again, I find all your answers a great learning resource especially for using indexes I wish I had more time to follow them more closely

kristof 2009-05-12 11:00:08

@Quassnoi: Thank you. I really appreciate all the effort you put into this. Very nice blog entry. :-) I had left this open for a while hoping this would catch some more upvotes, better reflecting the time you put into the answer. But alas. The niche/non-obvious questions/answers simply do not attract enough attention. Thanks again.

Tomalak 2009-05-13 17:26:55

Answer 3

+1 A:

I have done some quick and dirty tests by examining the execution plans in SQL server 2005. The plans showed that SQL uses the clustered index on Aid,Bid for most queries. Adding an index on Bid (ASC) shows that it's used for queries of type

select * from A
    inner join AtoB on Aid = A.id
    inner join B on Bid = B.id
where Bid = 1

So I'm voting for solution #3.

edosoft 2009-05-12 10:01:46

You anticipated what @Quassnoi took the time to track down and explain. +1 in any case. :)

Tomalak 2009-05-13 17:28:49

ansaurus

tags:

views:

answers:

What is the optimal indexing strategy for a relation table?

related questions