Background
I'm having trouble understanding the primary key choices in the database for a major vendor supplied product that we support and write SQL reports on. Some irrelevant details have been changed.
Each major table has a unique autonumber "internal ID" field, but this field is not always used as the primary key.
My question
In particular, I'm confused about the primary key selection for the order and order line tables:
- Organization PK (
organization_internal_id
) - Order table PK
(organization_internal_id, order_internal_id)
- Order line PK
(order_internal_id, organization_internal_id)
The order and order line tables each have an internal_id
that also uniquely identifies their rows. Every table uses the primary key for the clustered index. There are nonclustered indexes for the internal IDs alone.
Why not make the internal IDs the primary keys but set up a separate unique clustered index for each (as it is now)?
One possibility
One thing I can think of is that the designers don't know the difference between the primary key and a clustered index, and so are using a weird PK to get the clustered indexes they want. To be fair, order lines are often accessed in reference to an order, and there aren't details below the line level to reference the order_line_internal_id
.
Query efficiency
One issue with having the order's clustered index like it is that it encourages the QO to use a scan and hash join when joining many orders and order lines together (an unfortunately all too common occurrence, even if I only use 5% of the data in the end). There are too many rows selected to loop-seek into the order line table (although forcing this does sometimes help), and making the organization the first part of the order key prevents a merge join from working without a sort.
More details
- These internal IDs are used only for joining to other tables; there are separate external IDs for references that go outside the system or for display to users. The designers love these autonumber artificial keys.
- The DB is on MS SQL Server 2000
- I think the vendor used to support Oracle as the DB
- The order table has 1M rows ~5MB
- The order line table has 30M rows ~1GB
- The entire DB is ~100GB