views:

70

answers:

3

I'm upsizing a Jet database to SQL Server Express 2008 R2 and before doing so, I'm re-evaluating the schema (it was designed in 1997-98, and the guy who designed it (i.e., me) was something of a moron!).

My question is about N:N join tables with a two-column composite key. In Jet, joins on the first column of a two-column composite key will use the composite index, but joins on the second column will not, so in general, in Jet databases with large N:N join tables with reasonably large numbers of records, in addition to the composite index I add a second, non-unique index on the second column.

Is this a good idea in SQL Server?

(Maybe it's not a good idea in Jet?)

+5  A: 

The same rules apply in SQL Server. If you have an index on (ColumnA, ColumnB) a query on only ColumnA or ColumnA and ColumnB together can use the index, but a query on only ColumnB cannot. If there is a need to join on just ColumnB, then you should definitely create the index.

Joe Stefanelli
Any downside, other than the obvious index maintenance issues?
David-W-Fenton
Space considerations for storing the index. Additional overhead on Insert/Update/Delete operations. Also consider the cardinality of ColumnB. If it's low cardinality (few unique values), the index may not help as much.
Joe Stefanelli
In general, my ColumnB has lower cardinality that ColumnA. Would reversing them and adding the non-duplicate index on the column with the higher cardinality be more efficient?
David-W-Fenton
If B has lower cardinality then your original (ColumnA, ColumnB) index is in the correct order. You just need to decide if an independent index on (ColumnB) would be beneficial in your specific situation.
Joe Stefanelli
I'm accepting this as the answer, even though the other answers were quite helpful, too. They were just less direct, and prompted lots of other thoughts.
David-W-Fenton
+2  A: 

If you have a composite index on columns (A,B) no seek, range scan, sort or aggregate operation can use it for expressions that contain only B. This is true for SQL Server, just as it was true for Jet (Red) drivers (and I think for Jet Blue also). Some other engines might use it in a so called skip scan operation.

So the answer is that you need separate indexes on (B) alone.

Remus Rusanu
If the table is mostly used with joins on both columns, will it help that SELECT as much as it would in a SELECT using only the 2nd column in a join? My thinking is that the optimizer is going to select first on the side that has an index and then has less to do on the other side, so with both joins it's going to be more efficient than with the join only on the unindexed side. Since I hardly ever use join tables without both joins, maybe it's not going to help much?
David-W-Fenton
If the join is on both columns then the index on `(B)` alone will likely be ignored. But if you search/join mostly on `A` and `B`, *sometimes* on `B` alone yet *never* on `A` alone then you could consider changing the index to be on `(B,A)` instead.
Remus Rusanu
As a general rule, the column order in an index should be in increasing selectivity order, unless specific `ORDER BY` needs to be addressed. Putting low selectivity columns on the leftmost side allows for aggregate and large range queries to still use the index, but be warned that a low selectivity column will also quickly fall into the tipping point trap: http://www.sqlskills.com/BLOGS/KIMBERLY/category/The-Tipping-Point.aspx. Having low selectivity column (like `Type`) on leftmost makes most sense on clustered indexes because clustered indexes are always covering, don't have a tipping point.
Remus Rusanu
When I say "join on both columns" I mean functioning as N:N join table, with ColumnA joined to TableA and ColumnB joined to TableB. I don't ever use multi-column join table composite keys as FK in another table, so I never have join on ColumnA+ColumnB to TableC.
David-W-Fenton
There's never an ORDER by on these composite keys -- they are N:N join tables, and the ordering has no meaning (the columns are FK values from the Autonumber surrogate keys of the joined tables).
David-W-Fenton
If you have a narrow table that acts as a N:N join (has only two columns, one is FK on A and one is FK on B) then I'd recommend having two indexes, one on `(A,B)` and one on `(B,A)` (one of the indexes will be the clustered index). Having just an index on `B` alone will not be a covering index. Having an index on `(B) INCLUDE (A)` probably doesn't justify the savings vs. the loss of secondary order on `A` (might be leveraged by aggregate queries and many more even w/o an `ORDER BY`).
Remus Rusanu
Sorry, my familiarity with terms if spotty -- what do you mean "index on (B) INCLUDE (A)"? Two composite indexes seem to me to make no sense, since I never use even the original composite index for a join, except insofar as it gets used for joins using ColumnA. As I said, the composite key is not stored as FK in any tables, so I'm not seeing what advantage there could be to having two composite keys. I guess once I do the upsizing (I'm running SSMA to see what warnings pop up), I should test as @Luka suggests.
David-W-Fenton
`INCLUDE`: see http://msdn.microsoft.com/en-us/library/ms190806.aspx. When the table N:N is looked up the query is always going to need both `A` and `B` values. A secondary index on `B` alone will cause the query not to have the needed `A` value and will need to do a bookmark lookup in the clustered index. This is why an `INCLUDE (A)` is needed. But the same effect can be obtained by having the index on `(B,A)`, at the cost of adding the `A` storage to all non-leaf pages, but this is likely outweigh by the benefit of having `A` values ordered in the secondary index.
Remus Rusanu
@remus, if the clustered index is on `(A,B)`, wouldn't a non-clustered index on `(B)` alone necessarily be a covering index, by virtue of the clustering key? second question: are clustering keys at the leaf level of a non-clustered index equivalent to `(B) INCLUDE (A)`, or `(B,A)`? I've always assumed the former.
Peter
@Peter: you are correct, all missing keys (and uniquifier, if any) from the clustered index are always added 'hidden' to the leaf nodes in a non-clustered index, so that the entry in the Nc can be matched exactly to the clustered entry. But I always prefer tot add the *keys required by queries to become covering indexes* explicitly as INCLUDE or as secondary keys in the non-clustered definition: it makes things more clear and visible, and it won't turn the index into a non-covering one if the clustered key definition changes over time.
Remus Rusanu
@Remus, thanks, just wanted to confirm my understanding.
Peter
I'm upvoting this because I'd like to accept two answers, but can't.
David-W-Fenton
+2  A: 

To help you more, just a tip, in SQL server using the Managment studio you can evaluate the performance by "Display Estimated execution plan". It shown how the indexs and join works.

Also you can use the DTA (Database Engine Tuning Advisor) for more info and optimization.

Luka
Thanks for that!
David-W-Fenton