views: 294

answers: 5

I am designing a database and I would like to normalize it. In one query I will be joining about 30-40 tables. Will this hurt the website's performance if it ever becomes extremely popular? This will be the main query and it will be called 50% of the time. For the other queries I will be joining only about two tables.

I have a choice right now to normalize or not, but if normalization becomes a problem in the future I may have to rewrite 40% of the software, and that could take me a long time. Does normalization really hurt in this case? Should I denormalize now while I have the time?

+4  A: 

I quote: "normalize for correctness, denormalize for speed - and only when necessary"

I refer you to: http://stackoverflow.com/questions/293425/in-terms-of-databases-is-normalize-for-correctness-denormalize-for-performance

HTH.

Sunny
+1. Whether to normalize shouldn't even be a question - _always_ start with 3NF. Revert to lower normal forms for speed if, _and only if_, it becomes necessary. And make sure you understand the consequences and solutions. There are ways to mitigate problems caused by denormalisation (triggers, computed columns and so forth). Also look up YAGNI :-)
paxdiablo
So do you think joining 30-40 tables will not be a problem? Also, if normalization does become a problem, is it possible to add better hardware to offset the normalization costs?
Luke101
@Luke: no, it may well be a problem joining 40 tables at which point you should consider denormalising (but only after the problem appears, not in anticipation of a problem which may not exist - measure, don't guess). But I'd be _very_ interested in a 3NF schema that required a join of that many tables. In my experience, I've never come across a situation that extreme. Perhaps if you added more detail on that aspect, we could both understand better and offer more targeted advice.
paxdiablo
A: 

Don't make early optimizations. Denormalization isn't the only way to speed up a website. Your caching strategy is also quite important, and if that 30-40 table query is over fairly static data, caching the results may prove to be a better optimization.

Also, take into account the ratio of writes to reads. If you are doing approximately 10 reads for every insert or update, you could say the data is fairly static, in which case you should cache it for some period of time.
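
For example, here's a minimal sketch of time-based caching at the application layer (Python purely for illustration; the 5-minute TTL and the `run_query` callable are my own assumptions, not something from the question):

    import time

    # Minimal in-process, time-based cache for a read-heavy query.
    # run_query stands in for whatever executes the expensive multi-table join.
    _cache = {"result": None, "fetched_at": 0.0}
    CACHE_TTL_SECONDS = 300  # re-run the big join at most every 5 minutes

    def get_main_page_data(run_query):
        now = time.time()
        if _cache["result"] is None or now - _cache["fetched_at"] > CACHE_TTL_SECONDS:
            _cache["result"] = run_query()   # expensive query hits the database
            _cache["fetched_at"] = now
        return _cache["result"]              # most requests are served from memory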

If you end up denormalizing your schema, your writes will become more expensive and could slow things down as well.

Really analyze your problem before making too many optimizations, and wait to see where the bottlenecks in the system really are; you might be surprised at what you should optimize in the first place.

jamesaharvey
The 30-40 tables will not be static at all. On a normal day we expect about 1000 updates and inserts.
Luke101
Doing 1000 updates in a day is less than 1 per minute. I'd call that fairly static.
Gabe
Agreed. And assuming you're doing more reads than writes, your caching strategy is going to prove to be very important.
jamesaharvey
+3  A: 

When performance is a concern, there are usually better alternatives than denormalization:

  • Creating appropriate indexes and statistics on the involved tables
  • Caching
  • Materialized views (Indexed views in MS SQL Server)
  • Keeping a denormalized copy of your tables (used exclusively for the queries that need it), in addition to the normalized tables used in most cases; this requires writing synchronization code that could run either as a trigger or as a scheduled job, depending on the data accuracy you need (a sketch of the scheduled-job variant follows this list)
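
To illustrate that last point, here is a rough sketch of the scheduled-job variant (Python with a generic DB-API connection; the table and column names are invented for the example, not taken from the question):

    # Hypothetical scheduled job: rebuild a flat, read-only copy from the
    # normalized source tables so readers can query it with no joins at all.
    # A trigger-based variant would apply the same INSERT ... SELECT logic per row.
    def refresh_denormalized_copy(conn):
        cur = conn.cursor()
        cur.execute("DELETE FROM report_products_flat")
        cur.execute(
            """
            INSERT INTO report_products_flat
                (product_id, product_name, category_name, supplier_name)
            SELECT p.id, p.name, c.name, s.name
            FROM products p
            JOIN categories c ON c.id = p.category_id
            JOIN suppliers  s ON s.id = p.supplier_id
            """
        )
        conn.commit()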
ckarras
+1  A: 

Normalization can hurt performance. However, that is no reason to denormalize prematurely.

Start with full normalization and then you'll see if you have any performance problems. At the rate you are describing (1000 updates/inserts per day) I don't think you'll run into problems unless the tables are huge.

And even if you do, there are tons of database optimization options (indexes, prepared statements and stored procedures, materialized views, ...) that you can use.
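
For instance, the index option alone often goes a long way. A minimal sketch (the table and column names are hypothetical, chosen only to show the idea):

    # Hypothetical "indexes" option: index the columns the big query joins and
    # filters on, so lookups stop scanning entire tables.
    INDEX_STATEMENTS = [
        "CREATE INDEX ix_orders_customer_id ON orders (customer_id)",
        "CREATE INDEX ix_order_items_order ON order_items (order_id, product_id)",
    ]

    def create_indexes(conn):
        cur = conn.cursor()
        for stmt in INDEX_STATEMENTS:
            cur.execute(stmt)
        conn.commit()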

Foxfire
+1  A: 

Maybe I'm missing something here, but if your architecture requires you to join 30 to 40 tables in a single query, and that query is the main use of your site, then you have larger problems.

I agree with the others: don't prematurely optimize your site. However, you should design your architecture to account for your main use case. A 40-table join in a query that runs over 50% of the time is not optimized, IMO.

Foovanadil