This is more a matter of preference, but I was wondering what people think would be the better option. I have a Question, an Answer, and a Point (the Point table exists because I need to track which user awarded each point).

Table Dump

Question:
  id
  title

Answer:
  id
  question_id
  user_id
  response

Point_Answer:
  id
  answer_id
  user_id
  points

With this layout, getting the Top Answer requires a fairly involved join:

SELECT t2.id, t2.user_id, t2.response, t2.points
FROM Question AS t1,
  (SELECT qa.*, SUM(pa.points) AS points
   FROM Answer AS qa, Point_Answer AS pa
   WHERE qa.id = pa.answer_id
   GROUP BY qa.id) AS t2
WHERE t1.id = %s AND t1.id = t2.question_id
ORDER BY t2.points DESC
LIMIT 1

Whereas if I changed it like this:

Question:
  id
  title

Answer:
  id
  question_id
  user_id
  response
  points

Point_Answer:
  id
  answer_id
  user_id
  points

The query would be much lighter:

SELECT A.id, A.user_id, A.response, A.points
FROM Question AS Q, Answer AS A
WHERE Q.id = %s AND Q.id = A.question_id
ORDER BY A.points DESC
LIMIT 1

It would also mean that whenever a Point_Answer row is added, Answer.points has to be updated as well, so there is an extra UPDATE on every vote. It boils down to "integrity vs. redundancy" plus a bit of optimization. Which would be the better way to go?
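For concreteness, a sketch of that extra statement under the second schema (the answer id and point value are made up):

-- Hypothetical: every new point now also has to bump the running total.
UPDATE Answer SET points = points + 1 WHERE id = 42;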

+5  A: 

It depends on how slow the first query actually is, not on the complexity of the join. It would be an extremely poor idea to do this solely because you don't want to write (one time) a more complex query. Performance is the only real reason to do something of this nature.

If the first query is unacceptably slow, then a table or field summing the points can be an acceptable denormalization, if and ONLY if you keep the field updated through a trigger rather than from the application (the only way to ensure the accuracy of the denormalized number). You would then need to test the solution, including the extra update time, to determine whether you have actually saved any processing time. That will depend on how often the numbers change. For instance, if you add a second to the update time and save ten seconds on the select, but you run 10,000 updates for every select, it is not a good optimization. However, if you make a report go from an hour to milliseconds and only add a millisecond to each insert or update, it might be acceptable.
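As a rough sketch of the trigger approach (MySQL syntax assumed; the trigger name is made up):

-- Hypothetical trigger: keeps the denormalized Answer.points column in
-- sync whenever a new Point_Answer row is inserted.
CREATE TRIGGER trg_point_answer_after_insert
AFTER INSERT ON Point_Answer
FOR EACH ROW
  UPDATE Answer
  SET points = points + NEW.points
  WHERE id = NEW.answer_id;

A complete version would also need matching AFTER UPDATE and AFTER DELETE triggers, so that corrected or removed votes stay reflected in the total.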

There is no way to answer this without actually coding and testing both solutions with a production-level workload and data.

HLGEM
OK, that makes sense. It isn't in deployment yet; I was wondering what the best design would be to start with. I will go with the first option and keep the integrity. If I see a problem later, I can always switch to the second option.
Scottix
+1  A: 

If the query performs reasonably well, I would leave it as is. An ugly, well-performing query beats redundancy in my book.

With the redundancy option, you need to encapsulate your update statements in a transaction so that everything gets updated together; otherwise, you run the risk of your data getting out of sync.
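A minimal sketch of that, assuming the denormalized schema (the ids and point value are made up):

START TRANSACTION;
-- Record the vote and the running total together; if either statement
-- fails, the rollback keeps the two tables consistent.
INSERT INTO Point_Answer (answer_id, user_id, points) VALUES (42, 7, 1);
UPDATE Answer SET points = points + 1 WHERE id = 42;
COMMIT;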

I've worked with some legacy apps that went the Redundancy route without transactions, and when one table doesn't get updated for whatever reason, it gets messy.

LittleBobbyTables
+2  A: 

It depends on many factors, most of which depend on your setup.

The two most important factors are:

  • How often you are running the query. Keep in mind that the second solution not only uses more disk space (which could theoretically decrease performance) but also requires you to take care of the denormalized structure when adding records. Although that could be automated using a trigger (depending on the RDBMS), it is still a performance overhead.
  • The RDBMS you are using. Your first query may be ugly (I've seen much worse), but are you sure it is slow? The only way to get a definitive answer to that question is to run the query and check with EXPLAIN what query plan your RDBMS uses (see the sketch after this list).
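A sketch of that check, assuming MySQL and the normalized schema (a literal question id stands in for the %s placeholder):

-- Hypothetical: prefix the query with EXPLAIN to see the chosen plan,
-- e.g. which indexes are used and how many rows each step examines.
EXPLAIN
SELECT t2.id, t2.user_id, t2.response, t2.points
FROM Question AS t1,
  (SELECT qa.*, SUM(pa.points) AS points
   FROM Answer AS qa, Point_Answer AS pa
   WHERE qa.id = pa.answer_id
   GROUP BY qa.id) AS t2
WHERE t1.id = 1 AND t1.id = t2.question_id
ORDER BY t2.points DESC
LIMIT 1;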

So basically, I would stick with the first solution. Moving away from a normalized relation scheme is sometimes a good thing, but you should only denormalize your structure if you are sure it will give you a performance boost, and if you have identified the bottleneck in your application in a production-like environment.

ChrisM
Thanks Chris, that helps a lot.
Scottix