I am trying to stick to the practice of keeping the database normalized, but that leads to the need to run multiple join queries. Is there a performance degradation when many queries use joins, versus querying a single table that might contain redundant data?

+10  A: 

Keep the database normalised UNTIL you have discovered a bottleneck. Then, only after careful profiling, denormalise.

In most instances, having a good covering set of indexes and up-to-date statistics will solve most performance and blocking issues without any denormalisation.

Using a single table could lead to worse performance if there are writes as well as reads against it.
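
For example, a covering index for a frequent join might look something like this (table and column names are hypothetical, and the INCLUDE clause is SQL Server syntax; other engines spell it differently):

    -- Hypothetical Orders table frequently joined to Customers on CustomerID,
    -- with queries reading only OrderDate and Total:
    CREATE INDEX IX_Orders_CustomerID
        ON Orders (CustomerID)
        INCLUDE (OrderDate, Total);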

Mitch Wheat
A: 

With the proper indexes set up, your joins can perform very quickly. Use SQL Profiler to determine what indexes need to be created or altered to optimize the performance of your common queries. Be sure to set up a maintenance plan for your database that runs once a week (or every day for tables that change a lot) to update your statistics and indexes.

Normalization is normally preferred over keeping data in multiple locations. There are scenarios where inserts and updates do not need to occur quickly but selects need to occur very quickly, in which case you could be better off without normalization. Even so, premature optimization is not recommended, so go with a normalized structure first.
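
The weekly job itself can be very simple; a sketch of the kind of statements it would run (SQL Server syntax, placeholder table name):

    ALTER INDEX ALL ON dbo.Orders REBUILD;  -- or REORGANIZE for a lighter-weight pass
    UPDATE STATISTICS dbo.Orders;           -- refresh the optimizer's statistics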

DavGarcia
A: 

One of the ultimate hyperoptimizations available through some of the cloud sites is, in fact, using a smaller number of wider, limited-capability tables for efficiency. If, far in the future, you need to scale wildly, this is one way. But it's not considered desirable practice for any relational DBMS (which those aren't).

If you're having performance problems, there are a lot of things to work on first, before any kind of denormalizing.

le dorfier
+1  A: 

We leave query optimisation up to the database for the same reasons we leave code optimisation up to the compiler.

Most modern RDBMSes are pretty good in this respect these days.

Before you conclude that denormalisation is 'ok' in some cases, consider this: normally you are not interested in every attribute, so loading unneeded data off the disk (typically the slowest component of the database) is inefficient. This can be much worse with a denormalised design that carries lots of redundant data in each row, and worse again if you then have to update all that redundant data. It can be much more efficient to load a few narrow tables containing only the columns of interest and join them. Again, this depends on the database, so without profiling you have no clue.
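
As a sketch of the two shapes being compared (all table and column names here are made up for illustration):

    -- Normalised: two narrow tables, joined on a key; only the needed columns
    -- come off disk
    SELECT c.Name, o.Total
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID;

    -- Denormalised: one wide table; every read drags the whole wide row through
    -- the buffer pool, even though only two columns are wanted
    SELECT CustomerName, Total
    FROM OrdersWide;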

If you are really worried about performance, you're probably talking scalability issues. In this case you might want to look at sharding, for which proper (normalised) schema design is important.

+6  A: 

Michael Jackson (not that one) is famously believed to have said,

  • The First Rule of Program Optimization: Don't do it.
  • The Second Rule of Program Optimization (for experts only): Don't do it yet.

That was probably before RDBMSs were around, but I think he'd have extended the Rules to include them.

Multi-table SELECTs are almost always needed with a normalised data model; as is often the case with this kind of question, the "correct" answer to the "denormalise?" question depends on several factors.

DBMS platform

The relative performance of multi- vs single-table queries is influenced by the platform on which your application lives: the level of sophistication of the query optimisers can vary. MySQL, for example, in my experience, is screamingly fast on single-table queries but doesn't optimise queries with multiple joins so well. This isn't a real issue with smaller tables (less than 10K rows, say) but really hurts with large (10M+) ones.
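
Whichever platform you're on, it's worth asking the optimiser what it plans to do before assuming the join is the problem; for example (MySQL syntax, hypothetical names):

    -- Hypothetical tables; ask MySQL for its query plan
    EXPLAIN
    SELECT o.Total, c.Name
    FROM Orders o
    JOIN Customers c ON c.CustomerID = o.CustomerID
    WHERE o.OrderDate >= '2008-01-01';
    -- In the output, type = ALL means a full table scan;
    -- check the possible_keys and key columns for missing indexes.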

Data volume

Unless you're looking at tables in the 100K+ row region, there pretty much shouldn't be a problem. If you're looking at table sizes in the hundreds of rows, I wouldn't even bother thinking about indexing.

(De-)normalisation

The whole point of normalisation is to minimise duplication, to try to ensure that any field value that must be updated need only be changed in one place. Denormalisation breaks that, which isn't much of a problem if updates to the duplicated data are rare (ideally they should never occur). So think very carefully before duplicating anything but the most static data. Note, too, that your database may grow significantly.
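
To make the "one place" point concrete (hypothetical tables):

    -- Normalised: the customer's name lives in exactly one row
    UPDATE Customers
    SET Name = 'New Name'
    WHERE CustomerID = 42;

    -- Denormalised: the same change must touch every row that duplicates the name
    UPDATE OrdersWide
    SET CustomerName = 'New Name'
    WHERE CustomerID = 42;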

Requirements/Constraints

What performance requirements are you trying to meet? Do you have fixed hardware or a budget? Sometimes a performance boost can be most easily - and even most cheaply - achieved by a hardware upgrade. What transaction volumes are you expecting? A small-business accounting system has a very different profile to, say, Twitter.

One last thought strikes me: if you denormalise enough, how is your database different from a flat file? SQL is superb for flexible data and multi-dimensional retrieval, but it can be an order of magnitude (at least) slower than a straight sequential or fairly simply indexed file.

Mike Woodhouse
thanks for the response
zsharp
+1  A: 

There is a cost to decomposing tables for the sake of normalization. There is a performance component to that cost. The performance cost of decomposing tables and joining data in queries can be kept low by: using a good DBMS; designing tables right; designing indexes right; letting the optimizer do its job; and tuning the DBMS-specific features of physical design.

There is also a cost to composing large tables that materialize joins. The cost in terms of update anomalies and programming difficulties is outlined in good tutorials on normalization. There is also a performance cost to composing tables. In many DBMS products, loading a very big row into memory costs more than loading a smaller row. When you compose very wide tables, you end up forcing the DBMS to read very big rows, only to discard most of the data read into memory. This can slow you down even more than normalization does.
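
A sketch of what such a composed table tends to look like (hypothetical names); the duplicated customer columns are read off disk on every scan of the table, whether the query wants them or not:

    -- Hypothetical "wide" table that materializes the customer/order join
    CREATE TABLE OrdersWide (
        OrderID       INT PRIMARY KEY,
        OrderDate     DATE,
        Total         DECIMAL(10, 2),
        CustomerID    INT,
        CustomerName  VARCHAR(100),   -- duplicated on every order row
        CustomerAddr  VARCHAR(200),   -- duplicated
        CustomerPhone VARCHAR(30)     -- duplicated
    );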

In general, don't denormalize at random. When necessary, use a design discipline that has been tested by people who went before you, even if that discipline results in some denormalization. I recommend star schema as such a discipline. It has a lot going for it. And there are still plenty of situations where a normalized design works better than a star schema design.
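
A minimal star schema sketch (names are illustrative only): one central fact table keyed to small dimension tables, with the denormalization confined to the dimensions:

    -- Illustrative star schema: dimensions plus one fact table
    CREATE TABLE DimCustomer (
        CustomerKey INT PRIMARY KEY,
        Name        VARCHAR(100),
        Region      VARCHAR(50)
    );

    CREATE TABLE DimDate (
        DateKey       INT PRIMARY KEY,
        CalendarDate  DATE,
        CalendarMonth INT,
        CalendarYear  INT
    );

    CREATE TABLE FactSales (
        CustomerKey INT REFERENCES DimCustomer (CustomerKey),
        DateKey     INT REFERENCES DimDate (DateKey),
        Quantity    INT,
        Amount      DECIMAL(10, 2)
    );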

Learning more than one set of design principles and learning when to use which set is the second stage of learning to be an expert.

Walter Mitty
A: 

Performance difference?

Sanity difference.

Justice