Yesterday I noticed a foreign key column in a detail table that links directly to a customer table. The detail table is only one join away from the customer: it already has the proper foreign key to a header table, and the header already has the proper foreign key to the customer. Bear with me.

[Cust] ---< [Header] ---< [Detail]
  |                          V
  |________ wtf? ____________|

ASCII db modelling Key:

                          V
 ---< = 1 to many,  and  _| also = 1 to many

When I pressed the table's designer on the issue, he defended it by explaining that the column would save him a join...

IMO this saves a slow-typing, lazy SQL writer from having to join one extra table, at the price of denormalizing the schema. (Which of the normal forms does this example directly violate?)
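A sketch of the usual normal-form argument. It assumes `detail_id` is the only key of Detail and that the intended business rule is "a detail's customer is always its header's customer". The column names are illustrative, not taken from the actual schema.

```latex
\begin{aligned}
&\text{Detail}(\underline{\mathit{detail\_id}},\ \mathit{header\_id},\ \mathit{cust\_id}) \\
&\mathit{detail\_id} \to \mathit{header\_id}, \qquad \mathit{header\_id} \to \mathit{cust\_id} \\
&\Rightarrow\ \mathit{detail\_id} \to \mathit{cust\_id} \text{ holds only transitively, through the non-key } \mathit{header\_id}
\end{aligned}
```

Under those assumptions, `header_id` is not a superkey of Detail (one header has many details) and `cust_id` is not part of any key. So `header_id -> cust_id` is the transitive dependency that 3NF rules out. 2NF is not the issue, because the key is a single column.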

Even if using such a concept saved a dozen joins, is it ever worth it?
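To make the question concrete, here is the diagram sketched as SQLite tables through Python's standard `sqlite3` module. The table names, column names and sample rows are hypothetical. The sketch shows the one join that the designer's `detail.cust_id` column saves. Both queries return the same rows, but only because the sample data happens to be consistent.

```python
import sqlite3

# Hypothetical schema mirroring the diagram: cust ---< header ---< detail,
# plus the questioned shortcut column detail.cust_id.
SCHEMA = """
CREATE TABLE cust   (cust_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE header (header_id INTEGER PRIMARY KEY,
                     cust_id   INTEGER NOT NULL REFERENCES cust(cust_id));
CREATE TABLE detail (detail_id INTEGER PRIMARY KEY,
                     header_id INTEGER NOT NULL REFERENCES header(header_id),
                     cust_id   INTEGER REFERENCES cust(cust_id),  -- redundant copy
                     item      TEXT NOT NULL);
"""

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cust VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO header VALUES (?, ?)", [(10, 1), (11, 2)])
    conn.executemany(
        "INSERT INTO detail VALUES (?, ?, ?, ?)",
        [(100, 10, 1, "bolts"), (101, 10, 1, "nuts"), (102, 11, 2, "gears")],
    )
    return conn

def details_via_header(conn, cust_id):
    # Normalized path: detail -> header -> customer, one extra join.
    return conn.execute(
        "SELECT d.detail_id FROM detail d "
        "JOIN header h ON h.header_id = d.header_id "
        "WHERE h.cust_id = ? ORDER BY d.detail_id",
        (cust_id,),
    ).fetchall()

def details_via_shortcut(conn, cust_id):
    # Denormalized path: read the copied cust_id directly, no join.
    return conn.execute(
        "SELECT detail_id FROM detail WHERE cust_id = ? ORDER BY detail_id",
        (cust_id,),
    ).fetchall()

conn = build_db()
print(details_via_header(conn, 1))    # [(100,), (101,)]
print(details_via_shortcut(conn, 1))  # [(100,), (101,)]
```

Nothing in this schema forces `detail.cust_id` to agree with the header's `cust_id`. The two queries only agree as long as every writer keeps the copy in sync.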

+1  A: 

Yes. Eliminating joins can have performance implications, and making the queries that directly back a user interface run fast enough for that interface to be usable is a consideration that trumps design purity.

Though there's something to be said for maintaining a sort of partition between a well-normalized core schema and a set of summary tables, fed from the core tables, that back the UI.
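The "core schema plus summary tables" idea can be sketched roughly like this with Python's `sqlite3`. The table `cust_detail_summary`, its columns and the refresh-by-rebuild strategy are all assumptions for illustration. In practice, how and when the summary is refreshed (on a schedule, by triggers, and so on) is the real design decision.

```python
import sqlite3

def build_core() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE cust   (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE header (header_id INTEGER PRIMARY KEY,
                         cust_id INTEGER NOT NULL REFERENCES cust(cust_id));
    CREATE TABLE detail (detail_id INTEGER PRIMARY KEY,
                         header_id INTEGER NOT NULL REFERENCES header(header_id),
                         qty INTEGER NOT NULL);
    -- Hypothetical UI-facing summary table, derived entirely from the core tables.
    CREATE TABLE cust_detail_summary (cust_id INTEGER PRIMARY KEY,
                                      detail_count INTEGER NOT NULL,
                                      total_qty INTEGER NOT NULL);
    INSERT INTO cust VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO header VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO detail VALUES (100, 10, 5), (101, 11, 3), (102, 12, 7);
    """)
    return conn

def refresh_summary(conn: sqlite3.Connection) -> None:
    # Rebuild the summary from the normalized tables in one transaction:
    # the join runs once per refresh instead of on every UI query.
    with conn:
        conn.execute("DELETE FROM cust_detail_summary")
        conn.execute("""
            INSERT INTO cust_detail_summary (cust_id, detail_count, total_qty)
            SELECT h.cust_id, COUNT(*), SUM(d.qty)
            FROM detail d JOIN header h ON h.header_id = d.header_id
            GROUP BY h.cust_id
        """)

conn = build_core()
refresh_summary(conn)
rows = conn.execute("SELECT * FROM cust_detail_summary ORDER BY cust_id").fetchall()
print(rows)  # [(1, 2, 8), (2, 1, 7)]
```

The core tables stay normalized. Only the summary is denormalized, and it can always be rebuilt from the core.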

chaos
But then you have to add extra code and regular checks to make sure that the customer in the detail matches the one in the header, so the designer has to choose which is the bigger issue. I would only use the extra field if the performance was measurably poor.
Mark
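A "regular check" of the kind described here could be a single query. The sketch below uses Python's `sqlite3` with hypothetical table and column names. It lists detail rows whose copied `cust_id` disagrees with their header's `cust_id`, or is still NULL. SQLite's `IS NOT` is a NULL-safe inequality, so a missing copy counts as a mismatch.

```python
import sqlite3

def find_mismatches(conn):
    # Detail rows whose copied cust_id differs from (or is missing versus)
    # the cust_id on their header row.
    return conn.execute("""
        SELECT d.detail_id, d.cust_id AS detail_cust, h.cust_id AS header_cust
        FROM detail d JOIN header h ON h.header_id = d.header_id
        WHERE d.cust_id IS NOT h.cust_id
        ORDER BY d.detail_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (header_id INTEGER PRIMARY KEY, cust_id INTEGER NOT NULL);
CREATE TABLE detail (detail_id INTEGER PRIMARY KEY,
                     header_id INTEGER NOT NULL, cust_id INTEGER);
INSERT INTO header VALUES (10, 1), (11, 2);
INSERT INTO detail VALUES (100, 10, 1), (101, 10, 2), (102, 11, NULL);
""")
print(find_mismatches(conn))  # [(101, 2, 1), (102, None, 2)]
```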
... however... one should only employ such an **optimization** after it is proven to be necessary.
D.Shawley
That's stating it a bit more strongly than I would, because I believe that an experienced database analyst can identify some situations like this without having to rigorously test them, but in general I absolutely agree that you shouldn't do it unless you *need* to.
chaos
Agreed: one has to be careful, but there are times when this sort of action is necessary. So the stated reason (in the original question) is bad.
Murph
No premature optimisation or denormalisation should be used in place of the appropriate indexes.
Mitch Wheat
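"Appropriate indexes" for this join would typically sit on the two foreign key columns it walks. Here is a minimal sketch with Python's `sqlite3`, assuming hypothetical index and table names. SQLite does not create indexes on child (referencing) columns automatically, so they are declared explicitly. The `EXPLAIN QUERY PLAN` text differs between SQLite versions, so the helper just prints it for inspection instead of parsing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (header_id INTEGER PRIMARY KEY, cust_id INTEGER NOT NULL);
CREATE TABLE detail (detail_id INTEGER PRIMARY KEY,
                     header_id INTEGER NOT NULL REFERENCES header(header_id),
                     item TEXT NOT NULL);
-- Hypothetical indexes: "a customer's headers" and "a header's details".
CREATE INDEX idx_header_cust   ON header(cust_id);
CREATE INDEX idx_detail_header ON detail(header_id);
INSERT INTO header VALUES (10, 1), (11, 2);
INSERT INTO detail VALUES (100, 10, 'bolts'), (101, 10, 'nuts'), (102, 11, 'gears');
""")

QUERY = ("SELECT d.detail_id, d.item FROM header h "
         "JOIN detail d ON d.header_id = h.header_id "
         "WHERE h.cust_id = ? ORDER BY d.detail_id")

def details_for_customer(conn, cust_id):
    # The full normalized join; no redundant column needed.
    return conn.execute(QUERY, (cust_id,)).fetchall()

def show_plan(conn, cust_id):
    # Print the planner's description; look for the index names in it.
    for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (cust_id,)):
        print(row[-1])

print(details_for_customer(conn, 1))  # [(100, 'bolts'), (101, 'nuts')]
show_plan(conn, 1)
```

Measuring this indexed join against the real workload is what would show whether the extra column is needed at all.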
Yeah, if he's just "saving a join" on some kind of general principle, without any reference to an identifiable performance concern, that's not good.
chaos
Yes, the value of a performance enhancement is somewhat undermined if the results it produces are, y'know, wrong.
chaos
+2  A: 

Was there an actual performance problem encountered that couldn't be solved by the addition of the appropriate index(es)?

If not, then introducing 'cycles' like that can lead to conflicting data in some situations, and I would avoid it.

Mitch Wheat
No. There was no performance consideration. This was literally to keep from typing out an extra join.
Paul Sasik
Yeah, that's nonsense.
chaos
It is simply beyond me why people using a technology based on the idea of JOINing normalized tables seem to consider the JOINs a bad thing. They are a good thing, indicating properly structured data. If the issue is saving typing, simply encapsulate the JOIN in a VIEW and use the VIEW.
Larry Lustig
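The VIEW approach might look like this in SQLite via Python's `sqlite3`. The view name `detail_with_cust` and the schema are hypothetical. Callers filter on `cust_id` as if it lived on the detail table. The join is still executed, but it is typed only once, and no copy of the data exists that could drift out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust   (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE header (header_id INTEGER PRIMARY KEY,
                     cust_id INTEGER NOT NULL REFERENCES cust(cust_id));
CREATE TABLE detail (detail_id INTEGER PRIMARY KEY,
                     header_id INTEGER NOT NULL REFERENCES header(header_id),
                     item TEXT NOT NULL);
-- The join is written once, here; queries use the view instead.
CREATE VIEW detail_with_cust AS
    SELECT d.detail_id, d.header_id, h.cust_id, d.item
    FROM detail d JOIN header h ON h.header_id = d.header_id;
INSERT INTO cust VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO header VALUES (10, 1), (11, 2);
INSERT INTO detail VALUES (100, 10, 'bolts'), (101, 11, 'gears');
""")

rows = conn.execute(
    "SELECT detail_id, item FROM detail_with_cust WHERE cust_id = ?", (2,)
).fetchall()
print(rows)  # [(101, 'gears')]
```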
+1  A: 

It is a classic case of 'it depends': one size does not fit all, and what you are looking at is the eternal balance between a purely academic approach and a pragmatic one. Going too far in either direction can produce a bad result, so sometimes you will sacrifice academic correctness to get something to work well.

It is not possible to determine whether this case is a premature optimisation or a valid one without knowing workloads, number of queries, how often that join would / would not be used as a result of the optimisation etc.

Andrew
It's a premature optimisation if there wasn't a measured performance problem to begin with....
Mitch Wheat
Broadly yes, although when designing a table schema you are always going to be hard-pressed to find performance issues in advance, since there is no code or test/instrumentation yet. If the designer knew the expected or modelled workloads, it might not be so premature, but as I wrote, without knowing them or what the designer knows that we do not, I am loath to be prescriptive and 100% certain that it is premature.
Andrew
+1  A: 

Even if using such a concept saved a dozen joins, is it ever worth it?

The correct answer from the database designer/data modeller would be that there are situations where a CUSTOMER record can be related to a DETAIL record without a supporting HEADER record, per business rules.

Adding foreign keys for the sake of it sabotages a data model by allowing bad data. If there's only one DETAIL record associated with a CUSTOMER, then I'd expect a single record in the HEADER table; that's the point of a corollary/xref/lookup table, to allow for 0+ supporting records. It also keeps queries consistent, with none of this "what house is the moon in tonight?" fiasco leading to numerous queries...

OMG Ponies
A: 

"The correct answer from the database designer/data modeller would be that there are situations where a CUSTOMER record can be related to a DETAIL record without a supporting HEADER record, per business rules."

It was very clearly indicated that both relationships are one to many, so a DETAIL record without a supporting HEADER record is impossible.

But what IS possible is that something in the DETAIL pertains to ANOTHER customer than the one found by following the HEADER/CUSTOMER "path". Although perhaps unlikely, only the people who defined the schema can answer that.

Erwin Smout
A: 

Not knowing your table structure, I will point out that adding a customer id to multiple child tables is a common denormalization. From what you said, he made it a foreign key, so there is no risk in doing so and a lot of potential performance benefit.

As long as there is also a foreign key to the header table, I see little issue with the design.

An experienced database person who knows the kind of queries that will be written against a table can see some of these denormalizations at design time. Likely if I have multiple customers, I will want to be able to query the detail table by customer without having to go through the header table on many occasions. This of course will depend on whether there is any information in the header table that I would want in the query. Here, we have the option to do either depending on which is the better choice for the particular query and all the constraints are in place to prevent data integrity issues. And all it cost us was a little extra disk space.
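If the goal is for the constraints themselves to guarantee that the copied customer matches the header's customer, one option is a composite foreign key. This technique is swapped in here for illustration; the answer above does not describe it. A plain foreign key from `detail.cust_id` to the customer table only proves that the customer exists. Referencing the pair `(header_id, cust_id)` instead makes the database reject a detail whose customer differs from its header's. The sketch uses Python's `sqlite3` and hypothetical names. SQLite requires the referenced pair to be covered by a UNIQUE constraint, and it only enforces foreign keys when `PRAGMA foreign_keys` is on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE cust (cust_id INTEGER PRIMARY KEY);
CREATE TABLE header (
    header_id INTEGER PRIMARY KEY,
    cust_id   INTEGER NOT NULL REFERENCES cust(cust_id),
    UNIQUE (header_id, cust_id)      -- parent key for the composite FK below
);
CREATE TABLE detail (
    detail_id INTEGER PRIMARY KEY,
    header_id INTEGER NOT NULL,
    cust_id   INTEGER NOT NULL,
    -- The pair must match an existing header row, so the copied
    -- cust_id cannot disagree with the header's customer.
    FOREIGN KEY (header_id, cust_id) REFERENCES header(header_id, cust_id)
);
INSERT INTO cust VALUES (1), (2);
INSERT INTO header VALUES (10, 1);
""")

def try_insert(detail_id, header_id, cust_id):
    # Returns True if the row was accepted, False if a constraint rejected it.
    try:
        with conn:
            conn.execute("INSERT INTO detail VALUES (?, ?, ?)",
                         (detail_id, header_id, cust_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(100, 10, 1))  # True: customer 1 owns header 10
print(try_insert(101, 10, 2))  # False: customer 2 does not own header 10
```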

HLGEM
It's a foreign key only logically. All of the column values are NULL right now and there's no foreign key constraint check; it's for some future use. Also, it might be a common denormalization... I've seen it many a time, though not recently. But just because it's common, is it OK to do?
Paul Sasik
No, but it is common because reducing the number of joins when you don't need the intervening tables is generally a performance enhancer. If the column is not yet populated, he can't put the foreign key on it, but if he is going to populate it, then he can, and then it will self-maintain like any other FK relationship. The real problem is when you do this without maintaining the relationships properly. As long as you do, this kind of denormalization really isn't an issue.
HLGEM
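Populating the currently-NULL column before adding any constraint could look like the following. This is a sketch with Python's `sqlite3` and hypothetical names, and it covers only the backfill step. A correlated subquery is used so that it also works on SQLite versions without `UPDATE ... FROM`. SQLite's `ALTER TABLE` cannot attach a new constraint to an existing column, so there the table would have to be rebuilt to add the foreign key afterwards. Other engines generally offer `ALTER TABLE ... ADD CONSTRAINT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (header_id INTEGER PRIMARY KEY, cust_id INTEGER NOT NULL);
CREATE TABLE detail (detail_id INTEGER PRIMARY KEY,
                     header_id INTEGER NOT NULL,
                     cust_id INTEGER);          -- currently all NULL, no constraint
INSERT INTO header VALUES (10, 1), (11, 2);
INSERT INTO detail (detail_id, header_id) VALUES (100, 10), (101, 10), (102, 11);
""")

def backfill_cust_id(conn):
    # Copy each header's customer onto its not-yet-filled details.
    # Returns the number of detail rows updated.
    with conn:
        cur = conn.execute("""
            UPDATE detail
            SET cust_id = (SELECT h.cust_id FROM header h
                           WHERE h.header_id = detail.header_id)
            WHERE cust_id IS NULL
        """)
    return cur.rowcount

updated = backfill_cust_id(conn)
print(updated)  # 3
print(conn.execute("SELECT detail_id, cust_id FROM detail ORDER BY detail_id").fetchall())
# [(100, 1), (101, 1), (102, 2)]
```

Because of the `WHERE cust_id IS NULL` filter, running it again touches no rows.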