Suppose I have two tables, source and article, and I want to read an article together with specific details of its source. I can either (1) join the two tables; or (2) duplicate the source details into each article record (which makes each data unit larger, but keeps the query very simple). Which would be more efficient?
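For concreteness, here is a minimal sketch of the two options (table and column names are made up):

```sql
-- Option 1: normalized -- the article references its source,
-- and the details come via a join.
CREATE TABLE sources (
    id   INT PRIMARY KEY,
    name VARCHAR(100),
    url  VARCHAR(255)
);

CREATE TABLE articles (
    id        INT PRIMARY KEY,
    source_id INT,
    title     VARCHAR(255),
    FOREIGN KEY (source_id) REFERENCES sources (id)
);

SELECT a.title, s.name, s.url
FROM articles a
JOIN sources s ON s.id = a.source_id;

-- Option 2: denormalized -- source details copied into each article row.
CREATE TABLE articles_denorm (
    id          INT PRIMARY KEY,
    source_id   INT,              -- kept for reference
    title       VARCHAR(255),
    source_name VARCHAR(100),     -- duplicated from sources.name
    source_url  VARCHAR(255)      -- duplicated from sources.url
);

SELECT title, source_name, source_url FROM articles_denorm;
```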
It depends: do you want duplicate data in your database? If so, when you need to update something you have to update it in multiple locations. Sometimes it's OK to have a little duplicate data, but avoiding joins altogether would probably hurt you.
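For example, renaming a source under the duplicated layout means rewriting every affected article row (a sketch against the hypothetical tables in the question's example):

```sql
-- Normalized: one row to change.
UPDATE sources SET name = 'Reuters' WHERE id = 42;

-- Denormalized: every article from that source must be rewritten too.
UPDATE articles_denorm
SET source_name = 'Reuters'
WHERE source_id = 42;
```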
Depends on the data. Let's say you have a huge table of articles and a small table of authors. If you want to run a lot of queries that fetch some article data plus the author's name (the article table holds the author's id by default), then each author row is a simple primary-key lookup, and the small table will probably fit in memory, so including the author's name in the article table will not give you a huge performance boost. This denormalization will also make the "articles" table a bit larger (every author's name duplicated many times), so it will use up more of your cache.
On the other hand, if you wanted to query the number of articles for every author, getting this data from the two tables would mean aggregating a lot of rows every time. But if you included this number in the "authors" table, getting it would be just a single lookup, plus an increment for every added article. So if you are interested in this kind of result, denormalization might make sense.
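A sketch of that trade-off, assuming hypothetical authors/articles tables with an author_id column:

```sql
-- Normalized: counting means aggregating many article rows every time.
SELECT author_id, COUNT(*) AS article_count
FROM articles
GROUP BY author_id;

-- Denormalized: keep a running counter on the author row instead.
ALTER TABLE authors ADD COLUMN article_count INT NOT NULL DEFAULT 0;

-- Bump the counter whenever an article is added (e.g. in the same
-- transaction as the INSERT).
UPDATE authors SET article_count = article_count + 1 WHERE id = 7;

-- Reading it back is now a single primary-key lookup.
SELECT article_count FROM authors WHERE id = 7;
```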
"Which would be more efficient?"
Simply put (perhaps too simply): you are trading memory for CPU cycles, which may lead to worse cacheability and drag performance down.
The only way to answer your question correctly is to take your own environment and measure performance there. Make sure the tables are properly indexed. Create a realistic load on the database; for example, make sure you don't hit the cache for the same rows over and over again.
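In MySQL, EXPLAIN is a cheap first check that the join actually uses your indexes before you build a realistic benchmark (the query is illustrative):

```sql
-- Reports the chosen access plan: which indexes are used and roughly
-- how many rows each step examines.
EXPLAIN
SELECT a.title, s.name
FROM articles a
JOIN sources s ON s.id = a.source_id
WHERE a.id = 123;
```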
Decide upfront what performance gain (1%, 10%, 100%) would make it worth starting to denormalize.
This is a design decision, so without all the details of your analysis (goals, constraints, user requirements, etc.) there is no definitive answer, but here are a couple of rules of thumb I use:
1/ A join between two tables is generally not very expensive and is an easy case to tune (for example, you say there will be little updating, and I presume not extensive insert/delete and mostly selects, so this is likely a situation that indexing will speed up; see the sketch after this list)
2/ When designing a schema, first normalise it to the highest degree possible/sensible, and only later, when real-world scenarios prove it worthwhile, denormalise. (Deciding to normalise and then denormalise specific items generally works fairly well; failing to normalise in the first place generally does not deliver a good result.)
3/ Over a period of time, normalisation pays for itself (in later years, when you try to make some change to the system, a well-designed foundation is truly welcome and praised)
4/ Denormalising seems to me to best suit reporting situations where ad hoc queries will be used. In other words, the main reason I see to denormalise is to make life easier for report writers who have a high query-write/use ratio
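A minimal sketch of the tuning mentioned in point 1/, assuming hypothetical articles/sources tables with a source_id foreign key:

```sql
-- The primary key on sources.id is already indexed; adding an index
-- on the foreign-key column covers the other side of the join.
CREATE INDEX idx_articles_source_id ON articles (source_id);

-- With the index, fetching all articles for one source is an index
-- lookup rather than a full scan of the articles table.
SELECT a.title, s.name
FROM sources s
JOIN articles a ON a.source_id = s.id
WHERE s.id = 42;
```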
Duplicating data might bring you more performance. Notice I wrote might, because you are going to have caching problems. On the other hand, duplicating data makes your system more difficult to maintain (and, by the way, violates a database normal form). If the price you have to pay is only one table join, then just pay it. Make sure you have indexes on the columns you are joining, and the join will not be that expensive at all.
Bottom line: never duplicate data unless it is critical.
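If you do end up duplicating, a periodic consistency check helps catch drift between the copies; a sketch, reusing the denormalized layout from the question's example:

```sql
-- Rows whose duplicated name no longer matches the source of truth.
SELECT d.id, d.source_name, s.name AS current_name
FROM articles_denorm d
JOIN sources s ON s.id = d.source_id
WHERE d.source_name <> s.name;
```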
If read performance is a priority, you could use materialized views. MySQL doesn't support them natively (I think), but you can simulate them.
This solution lets you keep the original database normalized, while gaining the performance of simple queries against the MVs.
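One common way to simulate a materialized view in MySQL is a plain summary table that is refreshed on a schedule (or kept current with triggers); the names and refresh strategy below are assumptions:

```sql
-- The "materialized view": a precomputed join, stored as a real table.
CREATE TABLE mv_article_source (
    article_id  INT PRIMARY KEY,
    title       VARCHAR(255),
    source_name VARCHAR(100)
);

-- Periodic full refresh (could instead be maintained incrementally by
-- triggers on articles/sources for near-real-time freshness).
TRUNCATE TABLE mv_article_source;
INSERT INTO mv_article_source (article_id, title, source_name)
SELECT a.id, a.title, s.name
FROM articles a
JOIN sources s ON s.id = a.source_id;

-- Reads are now simple single-table queries.
SELECT title, source_name FROM mv_article_source WHERE article_id = 123;
```

The trade-off is staleness: the view is only as fresh as its last refresh.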