views:

120

answers:

6

Hi!

We're drawing up the database structure for a new app with the help of MySQL Workbench, and the number of joins required to produce a listing of the data is increasing drastically as the many-to-many relationships multiply.

The application will be quite read-heavy and have a couple of hundred thousand rows per table.

The questions:

  • Is it really that bad to merge tables where needed, thereby reducing the number of joins?

  • Should we start looking at horizontal partitioning? (in conjunction with merging tables)

  • Is there a better way than pivot tables to take care of many-to-many relationships? (See the sketch after this list for the pattern we're using now.)

  • We discussed storing all data in serialized text columns and having the application do the sorting instead of the database, but this seems like a very bad idea, even though the database will be heavily cached. What do you think?
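For reference, this is roughly the pivot (junction) table pattern we're using today for one of the many-to-many relationships (table and column names are made up for the example):

    -- Each many-to-many relationship gets its own linking table.
    CREATE TABLE product (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;

    CREATE TABLE category (
        id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB;

    -- The pivot/junction table: one row per product-category pairing.
    CREATE TABLE product_category (
        product_id  INT UNSIGNED NOT NULL,
        category_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (product_id, category_id),
        FOREIGN KEY (product_id)  REFERENCES product (id),
        FOREIGN KEY (category_id) REFERENCES category (id)
    ) ENGINE=InnoDB;

Every listing query ends up joining through several of these linking tables, which is where the growing join count comes from.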

Thanks!

+1  A: 

As always, it depends on your application, but in general, too much denormalisation can come back and bite you later on. A well-normalised database means that you should be able to query your data in most of the ways you may need, particularly for reporting (which is often an afterthought).

If you stick all your data in serialized text columns and your client asks for a report showing all rows that have a particular attribute, then you're going to have to do a bunch of string manipulation to get this data out.

If you're worried about too many joins for your queries, you could consider exposing certain sets of the data as a view...
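A minimal sketch of that idea, assuming a hypothetical customers/orders/products schema (none of these names come from the question):

    -- A view that hides the joins behind a single queryable name.
    -- The underlying tables stay normalised; the application simply
    -- selects from order_listing as if it were one wide table.
    CREATE VIEW order_listing AS
    SELECT o.id       AS order_id,
           o.created_at,
           c.name     AS customer_name,
           p.name     AS product_name,
           oi.quantity
    FROM   orders      o
    JOIN   customers   c  ON c.id = o.customer_id
    JOIN   order_items oi ON oi.order_id = o.id
    JOIN   products    p  ON p.id = oi.product_id;

Bear in mind that MySQL views are not materialised - the joins still run when you query the view - but it does keep them out of the application code and gives you one place to change them.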

Paddy
+2  A: 

In reverse order:

  • Forget it. Use the database. People saying "do it in the application" are pretty often ignorant of the amount of work that goes into writing databases.

  • Depends on exact need.

  • Depends on exact need. OLTP (transaction processing) - go for fifth normal form. OLAP (analytical processing) - go for a proper star schema and denormalize to get optimal performance (see the sketch below). Mixed - forget it; it does not work for larger installs because the theories are different... except if you make the database OLTP and then use a special OLAP cube database (which MySQL does not have).
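To make the OLAP point concrete, a rough star-schema sketch with made-up sales tables (not taken from the question): one central fact table of measures, surrounded by denormalized dimension tables.

    -- Star schema sketch: one fact table, denormalized dimensions.
    CREATE TABLE dim_date (
        date_id       INT PRIMARY KEY,      -- e.g. 20100531
        full_date     DATE NOT NULL,
        month_name    VARCHAR(20) NOT NULL,
        calendar_year SMALLINT NOT NULL
    );

    CREATE TABLE dim_product (
        product_id INT PRIMARY KEY,
        name       VARCHAR(255) NOT NULL,
        category   VARCHAR(100) NOT NULL    -- deliberately denormalized
    );

    CREATE TABLE fact_sales (
        date_id    INT NOT NULL,
        product_id INT NOT NULL,
        quantity   INT NOT NULL,
        amount     DECIMAL(10,2) NOT NULL,
        FOREIGN KEY (date_id)    REFERENCES dim_date (date_id),
        FOREIGN KEY (product_id) REFERENCES dim_product (product_id)
    );

Reports then aggregate the fact table and join only one hop out to the dimensions they need.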

TomTom
+4  A: 

Go with the normalized form of the database. For most tasks you won't need more than 3 or 4 joins, and you can still write views for the most common joins. Denormalization will force you to always think about updating fields in multiple places/tables when changing one property, and will surely lead to more problems than benefits.

If you're worried about reporting performance, you can still extract the data in timed batches into separate tables to get the desired performance for your reporting queries. If it's for query simplicity, you can use views.
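For example, the timed-batch idea could look something like the following (hypothetical table names; whether the batch runs from cron or from MySQL's event scheduler is up to you):

    -- A flattened reporting table, rebuilt periodically from the
    -- normalized source tables so reporting queries avoid the joins.
    CREATE TABLE report_product_listing (
        product_id    INT UNSIGNED NOT NULL PRIMARY KEY,
        product_name  VARCHAR(255) NOT NULL,
        category_list VARCHAR(1000),
        refreshed_at  TIMESTAMP NOT NULL
    ) ENGINE=InnoDB;

    -- Run as a scheduled batch:
    TRUNCATE TABLE report_product_listing;
    INSERT INTO report_product_listing
    SELECT p.id,
           p.name,
           GROUP_CONCAT(c.name ORDER BY c.name SEPARATOR ', '),
           NOW()
    FROM   product p
    LEFT JOIN product_category pc ON pc.product_id = p.id
    LEFT JOIN category c          ON c.id = pc.category_id
    GROUP BY p.id, p.name;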

MicSim
Thanks! We're actually in the range of 8-9 joins already and there will most certainly be additions to the core, so 10-15 joins might not be totally impossible in the future.
Industrial
Start out normalized. Define your requirements for performance. Measure. If you aren't meeting them, strategically denormalize until you do.
matt eisenberg
+2  A: 

Databases are designed to handle lots of joins. Use this feature as it will make many kinds of data manipulation in the database much easier. Otherwise, why not just use a flat file?

jle
I have always been told to normalize, normalize and normalize again. But then the number of joins increases, which I've been told kills performance in no time.
Industrial
Too many joins will kill performance when the dataset grows. There are obviously numerous other advantages to databases besides the ability to do joins.
Kimble
@Kimble: It depends on the application, of course, but where do you draw the line on having "too many" joins?
Industrial
@Industrial: I don't think there is an exact answer to that question. It depends a lot on the amount of data in the tables included in the join. Digg.com has published a lot of articles / blog entries regarding their experiences in this area.
Kimble
@Kimble Thanks! Will check their blog out for more info. Let me know if you had any specific article in mind!
Industrial
A: 

Unless you have clear evidence that performance is suffering because of the joins, stay normalised. Otherwise, as others have said, you'll have to worry about multiple updates.

Especially if the database is heavily cached, as you say, you'll be surprised how quick the DBMS is at doing this kind of thing - it is what it's designed for, after all.

Unless it's the sort of monster application, with huge amounts of data, that demands special performance optimisations, you'll find that keeping down the development, testing, and later maintenance effort will be much more important.

Joins are good, usually, not bad. They allow you to keep the data where it should be, which gives you maximum flexibility.

And as has been said many times, premature optimisation is usually bad, not good.

ChrisA
Well, we're not building the new Facetube, but we really don't want to start out building an application that will suffer from bad structure from day one. What is your opinion on when to merge tables and when to keep them separate?
Industrial
@Industrial - The time to denormalise is usually when you have clear evidence of the need to. Other than that, the 'bad structure from day one' is more likely to arise from attempting to guess where you're better than the DBMS at optimising.
ChrisA
+1  A: 

If you make sure to index the foreign keys (you did set up foreign keys, didn't you?) and have proper where clauses in your queries, 10-15 joins should be easily handled by a database, especially with so few rows. I have queries with that many joins on tables with millions of rows and they run fine.
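As a rough illustration of the indexing point (hypothetical table and column names): with InnoDB, declaring the foreign key constraint makes MySQL index the referencing column for you if no suitable index exists; with MyISAM, or if you skip the constraints, you have to create the index yourself.

    -- Declaring the constraint (InnoDB will index order_id if needed):
    ALTER TABLE order_items
        ADD CONSTRAINT fk_order_items_order
        FOREIGN KEY (order_id) REFERENCES orders (id);

    -- Or add the index by hand (required on MyISAM / without constraints):
    CREATE INDEX idx_order_items_order_id ON order_items (order_id);

    -- Then verify the multi-join listing query uses the indexes
    -- instead of scanning whole tables:
    EXPLAIN
    SELECT o.id, oi.quantity
    FROM   orders o
    JOIN   order_items oi ON oi.order_id = o.id
    WHERE  o.created_at >= '2010-01-01';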

Usually it is better to partition data than to denormalize.

As far as denormalizing goes, don't do it unless you also institute a strategy for keeping the denormalized data in sync with the parent table.
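One possible shape for such a strategy, sketched with made-up names (triggers are just one option; a scheduled reconciliation job is another):

    -- Suppose orders carries a denormalized copy of the customer name.
    -- A trigger keeps the copy in step with the parent table.
    DELIMITER //
    CREATE TRIGGER trg_customer_name_sync
    AFTER UPDATE ON customers
    FOR EACH ROW
    BEGIN
        IF NEW.name <> OLD.name THEN
            UPDATE orders
            SET    customer_name = NEW.name
            WHERE  customer_id = NEW.id;
        END IF;
    END//
    DELIMITER ;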

As to whether you really need that many tables or whether your design is bad, well, the only way we could comment on that is if we saw the table structure.

HLGEM