Does cardinality play a role in composite indexes? If so, what role?
I was running a query that joins on two columns, and it used what I thought was a sub-optimal index, which is making me rethink how I design indexes...
Let's say we have a table that lists all of the Cities in the United States. My first instinct here is to make the clustered index (State -> City), so that if we ever need to query all of the Cities for one State, it would probably target that index. It would also be a great index for queries that specify both City and State (assume here that (City, State) is a unique pair).
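For concreteness, something along these lines (a sketch only; the table and index names are made up for this question, and the real schema differs):

    -- Illustrative schema, not my real one.
    CREATE TABLE dbo.Cities
    (
        CityId INT IDENTITY(1,1) NOT NULL,
        State  CHAR(2)       NOT NULL,
        City   NVARCHAR(100) NOT NULL,
        -- The PK index SQL Server creates automatically; nonclustered here,
        -- because the clustered index below is the one I defined myself.
        CONSTRAINT PK_Cities PRIMARY KEY NONCLUSTERED (CityId)
    );

    -- My clustered index: State first, then City.
    -- (City, State) is assumed unique, so the index can be unique too.
    CREATE UNIQUE CLUSTERED INDEX CIX_Cities_State_City
        ON dbo.Cities (State, City);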
I ran into a query that essentially joins against a table containing a list of Special Cities, i.e. a subset of the Cities table. The join is specified on Special.City and Special.State, but what surprised me is that the optimizer used the Cities table's primary key index (the one automatically created by SQL Server) instead of the clustered index I made. How come?
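Roughly, the join looked like this (placeholder names again; the real query joins a "special cities" subset back to the full Cities table on both columns):

    SELECT c.State, c.City
    FROM dbo.SpecialCities AS s
    JOIN dbo.Cities        AS c
        ON c.City  = s.City
       AND c.State = s.State;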
I've also heard that good indexes have high cardinality...
So I'm wondering whether the clustered index (or another, separate index) should have been created as (City -> State) (note the reversed order), because City alone has (we assume) high cardinality and is far more selective than State as the leading column.
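In other words, something like this instead of (or in addition to) the index above, with the more selective column leading (hypothetical name, continuing the sketch):

    -- Hypothetical alternative: City leads because it is far more
    -- selective than State. Could be the clustered index instead,
    -- or a separate nonclustered index as shown here.
    CREATE UNIQUE NONCLUSTERED INDEX IX_Cities_City_State
        ON dbo.Cities (City, State);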
It's been my rule of thumb to always create clustered indexes as (parent -> child) in parent-child relationships, such as State and City here, to benefit both queries that target specific children and queries that fetch all children for a given parent. Do I need to rethink something here?
Informal testing showed that an index on (City -> State) was marginally cheaper than the PK index.
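The comparison was along these lines (a rough sketch, reusing the hypothetical index names from above; I just forced each index with a hint and eyeballed the estimated cost and logical reads):

    SET STATISTICS IO ON;

    -- Force the hypothetical (City, State) index.
    SELECT c.State, c.City
    FROM dbo.SpecialCities AS s
    JOIN dbo.Cities AS c WITH (INDEX(IX_Cities_City_State))
        ON c.City = s.City AND c.State = s.State;

    -- Force the PK index for comparison.
    SELECT c.State, c.City
    FROM dbo.SpecialCities AS s
    JOIN dbo.Cities AS c WITH (INDEX(PK_Cities))
        ON c.City = s.City AND c.State = s.State;

    SET STATISTICS IO OFF;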