views:

83

answers:

3

Background

I'm having trouble understanding the primary key choices in the database for a major vendor supplied product that we support and write SQL reports on. Some irrelevant details have been changed.

Each major table has a unique autonumber "internal ID" field, but this field is not always used as the primary key.

My question

In particular, I'm confused about the primary key selection for the order and order line tables:

  • Organization PK (organization_internal_id)
  • Order table PK (organization_internal_id, order_internal_id)
  • Order line PK (order_internal_id, organization_internal_id)

The order and order line tables each have an internal_id that also uniquely identifies their rows. Every table uses the primary key for the clustered index. There are nonclustered indexes for the internal IDs alone.

Why not make the internal IDs the primary keys but set up a separate unique clustered index for each (as it is now)?

One possibility

One thing I can think of is that the designers don't know the difference between the primary key and a clustered index, and so are using a weird PK to get the clustered indexes they want. To be fair, order lines are often accessed in reference to an order, and there aren't details below the line level to reference the order_line_internal_id.

Query efficiency

One issue with having the order's clustered index like it is that it encourages the QO to use a scan and hash join when joining many orders and order lines together (an unfortunately all too common occurrence, even if I only use 5% of the data in the end). There are too many rows selected to loop-seek into the order line table (although forcing this does sometimes help), and making the organization the first part of the order key prevents a merge join from working without a sort.

More details

  • These internal IDs are used only for joining to other tables; there are separate external IDs for references that go outside the system or for display to users. The designers love these autonumber artificial keys.
  • The DB is on MS SQL Server 2000
  • I think the vendor used to support Oracle as the DB
  • The order table has 1M rows ~5MB
  • The order line table has 30M rows ~1GB
  • The entire DB is ~100GB
+1  A: 

I think your "one possibility" is pretty likely. Apparently in some bygone age of database design, primary keys were the only unique indexes available; the material cited in this question about one-to-many relationship implementation seems to hail from such a time, for example. Seems reasonable to think that some people trained in that era would tend to retain its habits.

chaos
+1  A: 

Assuming that there is an apparent rational reason behind it - and it could quite easily be some legacy issue dating back to an archaic version of the software that has never been changed but is now unfathomable - I'd suggest that the most likely explanation is that the designers considered that access to the tables would usually be via the fields in the primary keys they used and hence they thought this would be drive the most efficient queries. This might have been particularly appealing to them if, for example on the Order Table they thought that most of the time they would only require the (organization-internal-id, order-internal-id) fields to be retrieved, hence by putting these into the primary key that would avoid a table read.

The logic would have run something like "we know we usually only need need organization-internal-id and order-internal-id from this table, so if we use this as the PK the data will be retrieved directly rather than forcing access to the actual row data is we were to access via the arbitrary internal-id"

Which may or may not be true, but once these decisions have been made in this kind of software they tend to stick around unless the reason to change them is compelling.

I have vague memories of seeing a similar construction in Oracle Accounting IV a very long time ago, so possibly it comes over from some older Oracle methodology (Oracle 6 was rather sensitive about these things)

Cruachan
+2  A: 

Also, consider that a the internal ID may be a requirement for an ORM employed in this company's product accompanying this database.

CMB