views:

50

answers:

2

We have two tables, ActivityForm and Field which are given a many-to-many relationship via the ActivityFormField table. The ActivityFormFieldValidator table will have a many-to-one relationship with the ActivityFormField table, so we are giving the ActivityFormField table an identity column ("ActivityFormFieldId"). The ActivityFormField table also contains information like the order in which the given field should be displayed on the given activity form.

Expected usage is as follows:

  • There will eventually be a lot of activity forms and fields, but each activity form will only have a handful of fields attached to it (maybe a dozen, on average).
  • The most common query on this table will be to retrieve all of the entries in ActivityFormField table with a given ActivityFormId, and join that data with the Field and ActivityFormFieldValidator tables.
  • New ActivityFormField entries will be created every so often, but not nearly as often as the query mentioned above.

My initial reaction is to configure the ActivityFormField table with the following keys and indexes:

  • Primary Key on the identity column, with a Clustered Index on it.
  • A Unique, nonclustered index that includes the FieldId and ActivityFormId columns.

In addition to feeling the most "right" or "normal" to me, this satisfies the basic requirements of:

  • making sure there is only one ActivityFormField entry for any combination of ActivityForm and Field,
  • providing an indexed key for ActivityFormFieldValidator entries to attach to, and
  • providing an indexed key on ActivityFormId (for the query mentioned earlier)

It also "does no harm" with its clustered index, in the sense that insertion of new ActivityFormField entries shouldn't cause any reordering of the actual data.

However, given the way that we are planning to use the table, I wonder if there's another combination of keys and indexes and clustering that is likely to be much more efficient?

Update

Having done some additional research and thinking, I'm leaning toward making my table declaration look something like this:

CREATE TABLE [ActivityFormField](
    [ActivityFormFieldId] [int] IDENTITY(1,1) NOT NULL,
    [ActivityFormId] [int] NOT NULL,
    [FieldId] [int] NOT NULL,
    [SortOrder] [tinyint] NOT NULL,
    CONSTRAINT [PK_ActivityFormField] PRIMARY KEY NONCLUSTERED 
    (
     [ActivityFormFieldId] ASC
    ),
    CONSTRAINT [UK_ActivityFormField_ActivityForm_Field] UNIQUE NONCLUSTERED
    (
     [ActivityFormId] ASC,
     [FieldId] ASC
    )
)

ALTER TABLE [ActivityFormField] WITH CHECK ADD CONSTRAINT [FK_ActivityFormField_ActivityForm] FOREIGN KEY([ActivityFormId])
REFERENCES [ActivityForm] ([ActivityFormId])

ALTER TABLE [ActivityFormField] WITH CHECK ADD CONSTRAINT [FK_ActivityFormField_Field] FOREIGN KEY([FieldId])
REFERENCES [Field] ([FieldId])

CREATE UNIQUE CLUSTERED INDEX IX_ActivityFormField_ActivityForm_SortOrder ON [ActivityFormField] 
(
 [ActivityFormId] ASC,
 [SortOrder] ASC
)

CREATE NONCLUSTERED INDEX IX_ActivityFormField_ActivityForm ON [ActivityFormField] 
    (
     [ActivityFormId]
    )

CREATE NONCLUSTERED INDEX IX_ActivityFormField_Field ON [ActivityFormField] 
    (
     [FieldId]
    )

This approach uses a non-clustered index on the identity field, a clustered index for the criteria and order in which the most commonly-used query will retrieve the data, and a constraint to ensure that only one ActivityFormField entry exists for any combination of ActivityForm and Field. Unique Constraints and Indexes are apparently handled the same way internally, so I chose a constraint to show that its primary purpose is to keep things unique. The other indexes are there "just in case," following the logic HLGEM proposed.

A: 

Each field that is a separate fk to another table should be in an index all by itself. Unlike PKs these indexes are not created automatically and they will almost always be needed.

If you have a unique combination other than the PK (and you should if using a surrogate Pk in most tables), by all means put a unique index on the combination of the fields, to fail to do so would be to create data integrity problems. Other indexes may be needed depending on what where clauses you will use in querying the table. Index needs may change as the table size grows, so there is no need to feel you have to get all the indexes right the first time.

HLGEM
So are you recommending that I create an index on the ActivityFormId column, another one for the FieldId column, and then a third one for the combination of the two? Can you elaborate on the advantages of this approach? If I have one index for each of those two columns, is there any difference between using a unique constraint and using a unique index for the combination of the two? Thanks.
StriplingWarrior
A: 

Unfortuantly there isn't one simple answer. I would start with what you've suggested yourself, but then I'd load up the tables with appropriate quantities of data and kick them through profiler and look at the generated execution plan. Benchmarking really is the only way to work it out.

If you get index seeks, then all is good: no more work really needed. If you have index scans, then see if the indexes can be tweaked to turn them into seeks. If you see any bookmark lookups in the execution plan, consider having other columns into the index so the index itself covers the query.

Chris J