views: 306

answers: 5
Every time a database diagram gets reviewed, one area people are critical of is inner joins. They look at them hard and ask whether each inner join really needs to be there.

Simple Library Example:

A many-to-many relationship is normally defined in SQL with three tables: Book, Category, BookCategory.

In this situation, Category is a table that contains two columns: ID, CategoryName.

In this situation, I have gotten questions about the Category table: is it needed? Could it be used purely as a lookup table, with the BookCategory table storing the CategoryName instead of the CategoryID, to avoid having to do an additional INNER JOIN? (For this question, we are going to ignore the renaming or deleting of any CategoryNames.)
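
To make the trade-off concrete, here is a minimal sketch of the two designs being compared (the column types and the BookCategoryDenorm name are just illustrative, not from any real schema):

    -- Normalized design: BookCategory references Category by ID, so
    -- listing a book's category names takes two INNER JOINs.
    CREATE TABLE Book (
        ID    INT PRIMARY KEY,
        Title VARCHAR(200) NOT NULL
    );

    CREATE TABLE Category (
        ID           INT PRIMARY KEY,
        CategoryName VARCHAR(100) NOT NULL
    );

    CREATE TABLE BookCategory (
        BookID     INT NOT NULL REFERENCES Book (ID),
        CategoryID INT NOT NULL REFERENCES Category (ID),
        PRIMARY KEY (BookID, CategoryID)
    );

    SELECT b.Title, c.CategoryName
    FROM Book b
    INNER JOIN BookCategory bc ON bc.BookID = b.ID
    INNER JOIN Category c ON c.ID = bc.CategoryID;

    -- Proposed alternative: store the name directly and skip one join.
    CREATE TABLE BookCategoryDenorm (
        BookID       INT NOT NULL REFERENCES Book (ID),
        CategoryName VARCHAR(100) NOT NULL,
        PRIMARY KEY (BookID, CategoryName)
    );

    SELECT b.Title, bc.CategoryName
    FROM Book b
    INNER JOIN BookCategoryDenorm bc ON bc.BookID = b.ID;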

The question is: what is so bad about inner joins? At what point does doing them become a negative (general guidelines like number of transactions, number of records, number of joins in a statement, etc.)?

A: 

I am not sure there is anything wrong with inner joins per se. It is like how each IF you add to your code impacts performance (or should I say every line...), yet you still need a minimum number of them to make your system work (yes yes, I know about Turing machines). So if you have something that is not needed, it will be frowned upon.

Itay Moav
Joins actually improve performance (most of the time) ... have a look at http://www.sql-server-performance.com/tips/tuning_joins_p1.aspx
Gaby
@Gaby Splitting a relation into two and joining them together again when retrieving data will (almost) certainly make the query slower, though it will make some updates faster. Your linked page gives hints on how to improve the performance of a join but does not compare models with and without joins at all.
Daniel Brückner
@Daniel, when you are selecting everything I agree that a single table would be the fastest, but when you start introducing filters I believe that joined tables will perform faster, since the filtering will happen by comparing numbers. This is just my understanding, and I am no expert on this.
Gaby
Filtering will still compare the actual data using existing indexes - there is no difference between the cases with and without joins. But filtering is still a valid point - a table with 10,000 rows can be normalized into two tables with only 100 rows each, and this will probably speed up filtering. But that is more an effect of normalizing the model than of using joins.
Daniel Brückner
+5  A: 

Your example is a good counterexample. How do you rename categories if they're spread throughout the various rows of the BookCategory table? Your UPDATE to do the rename would touch all the rows in the same category.

With the separate table, you only have to update one row. There is no duplicate information.
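
Roughly, assuming the table sketch in the question above (names are illustrative), the rename looks like this in each design:

    -- Separate Category table: the rename touches exactly one row.
    UPDATE Category
    SET CategoryName = 'Science Fiction'
    WHERE CategoryName = 'SciFi';

    -- CategoryName stored directly in the link table: the rename has to
    -- touch every row filed under that category.
    UPDATE BookCategoryDenorm
    SET CategoryName = 'Science Fiction'
    WHERE CategoryName = 'SciFi';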

John Saunders
+1 absolutely - sum it up nicely.
marc_s
Right, I understand that updating one row vs. many rows in one table is an obvious difference; I wanted to keep that type of discussion out of this and focus on the INNER JOIN.
Stephen B. Burris Jr.
+3  A: 

I would be more concerned about OUTER joins, and the potential to pick up info that wasn't intended.
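
For example (using the question's Book/Category schema), an OUTER join quietly keeps rows that have no match, which is where unintended data tends to sneak in:

    -- INNER JOIN: only books that actually have a category.
    SELECT b.Title, c.CategoryName
    FROM Book b
    INNER JOIN BookCategory bc ON bc.BookID = b.ID
    INNER JOIN Category c ON c.ID = bc.CategoryID;

    -- LEFT OUTER JOIN: also returns uncategorized books, with NULL in
    -- CategoryName - extra rows that are easy to pick up unintentionally.
    SELECT b.Title, c.CategoryName
    FROM Book b
    LEFT JOIN BookCategory bc ON bc.BookID = b.ID
    LEFT JOIN Category c ON c.ID = bc.CategoryID;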

In your example, having the Category table means that a book is limited to being filed under a preset Category (via a foreign key relationship). If you just shoved multiple entries into the BookCategory table, then it would be harder to limit what is selected for the Category.
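
A quick illustration of that constraint, using the table sketch from the question (the values and the BookCategoryDenorm name are made up):

    -- With the foreign key on BookCategory.CategoryID, this insert is
    -- rejected unless book 1 and category 42 already exist.
    INSERT INTO BookCategory (BookID, CategoryID) VALUES (1, 42);

    -- With a free-text CategoryName column, nothing stops near-duplicates
    -- from creeping in.
    INSERT INTO BookCategoryDenorm (BookID, CategoryName) VALUES (1, 'SciFi');
    INSERT INTO BookCategoryDenorm (BookID, CategoryName) VALUES (2, 'Sci-Fi');
    INSERT INTO BookCategoryDenorm (BookID, CategoryName) VALUES (3, 'science fiction');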

Doing an INNER JOIN is not so bad; it is what databases are made for. The only time it is bad is when you are doing it on a table or column that is inadequately indexed.
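
A small sketch of what "adequately indexed" might mean for the BookCategory link table from the question (the index name is made up):

    -- The composite primary key (BookID, CategoryID) already lets the
    -- optimizer resolve book-to-category joins with index lookups; an
    -- extra index on CategoryID supports going the other way, i.e.
    -- finding all books in a given category.
    CREATE INDEX IX_BookCategory_CategoryID ON BookCategory (CategoryID);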

slugster
A: 

When you map your domain model onto the relational model, you have to split the information across multiple relations in order to get a normalized model - there is no way around that. And then you have to use joins to combine the relations again and get your information back. The only bad thing about this is that joins are relatively expensive.

The other option would be not to normalize your relational model. This will fill your database with a lot of redundant data, give you many opportunities to make your data inconsistent, and make updates a nightmare.

The only reason not to normalize a relational model (that I can think of at the moment) is that reading performance is extremely - and I mean extremely - critical.

By the way, why do you (they) only mention inner joins? How are left, right, and full outer joins significantly different from inner joins?

Daniel Brückner
A: 

Nobody can offer much about general guidelines - they'd be specific to the server, hardware, database design, and expectations... way too many variables.

Specifically about INNER JOINs being inefficient or bad... JOINs are at the center of relational DBs, and they've been around for decades. They are only wrong when you use them wrong; obviously somebody is doing them right, since they're not extinct yet. Personally, I'd assume anyone throwing out blanket statements like that either doesn't know SQL or knows just enough to get into trouble. Next time it comes up, teach them how to use the query cache.
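
A hedged, MySQL-specific sketch of that last point (the query cache is a MySQL feature of that era; other engines have their own caching mechanisms):

    -- See whether the query cache is enabled and how large it is.
    SHOW VARIABLES LIKE 'query_cache%';

    -- Identical SELECT statements - joins and all - can then be served
    -- from the cache until the underlying tables change.
    SELECT SQL_CACHE b.Title, c.CategoryName
    FROM Book b
    INNER JOIN BookCategory bc ON bc.BookID = b.ID
    INNER JOIN Category c ON c.ID = bc.CategoryID;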

(I'm leaving updates and deletes out of it, but you didn't say anything about inserts: the increased maintainability from avoiding humans and their typos can easily be worth at least 10x the time a join will take.)

tadamson