views:

41

answers:

2

Does cardinality play a role in composite indexes? If so, what?

I was running a query that was joining on two columns and it used what I thought to be a sup-optimal index, so it's making me rethink how I design indexes...

Let's say we had a table that listed all of the Cities in the United States. My first instinct here is to make a clustered index on (State -> City) so that if we ever need to query all of the Cities for one State, it would probably target that index. In addition, it would be a great index for queries that specify both City and State (here we can assume that City,State is a unique pair).

I ran into a query that is essentially joining off of a table that gave a list of Special Cities. So this is a subset of the Cities table. I'm specifying the join on Special.City and Special.State, but what surprised me here is that it used the primary key index (automatically created by SQL server) of the Cities table instead of the clustered index I made. How come?

I've also heard that good indexes have high cardinality...

So I'm wondering if the clustered index (or another, separate index) should have been created (City -> State) (notice the difference in order) because (we assume) just City has high cardinality and is way more discriminating than State is in the first series of buckets.

It's been my rule of thumb to always create clustered indexes on parent->child in parent-child relationships, like in Cities and States, to benefit queries that target specific children and queries that fetch all children for a given parent. Do I need to rethink something here?

Informal testing shows that an index for (City -> State) was marginally cheaper than the PK index.

A: 

The kind of indexing you are dealing with (as opposed to munchkin PKs on identity surrogate keys) can be a can of worms. One could write on the subject for hours, and not necessarily say anything that would help you in your situation. Reading articles on indexing and doing lots of experimentation might be your best bet.

Not much help, alas. I may update this later, if I can think of any concise universal truisms.

Philip Kelley
+1  A: 

Some thoughts:

  • When you joined on "Special Cities", did you ask for all columns or just City and State from "City"? That is, what it covered

  • What is the index or PK order for "Special Cities"?

  • Did you filter anywhere?

The cardinality of the column can play a part though: see Craig Freedman's blog entry for residual lookups. And another one.

And it's mentioned in BOL (can't find it though) that it should be most selective first

However, this falls apart where you use several layers of tables and composite keys. Example:

  • table Grandad (GrandDadID, ...)
  • table Dad (GrandDadID, DadID, ...)
  • table Son (GrandDadID, DadID, SonID, ...)

The PK for Son covers the FK needs for both parent tables.

If you reverse that order for "Dad" and "Son" because DadID and SonID should be selective then GrandDadID, then you suddenly need a lot more indexes to cover queries and FK DRI.

So: column cardinality plays a part but it's just one factor and, er, "it depends"...

gbn
This is hilarious! That guy has an entire blog dedicated to just indexing? That's crazy. Many thanks!
Mark Canlas
@Mark: not just indexes. Read what he did: http://blogs.msdn.com/craigfr/about.aspx
gbn
Also in your example, isn't Son.GrandDadID implicit to the relationship given DadID?
Mark Canlas
@Mark: yes, I didn't want to labour the point too much
gbn