views:

151

answers:

5

I've been reading a lot about Relational Databases using many JOIN statements on every SELECT. However, I've been wondering if there's any performance problem on the long run when abusing this method.

For example, lets say we have a users table. I would usually add the "most used" data, instead of doing any extra JOINs. When I say the "most used" data, for instance, would be the username, display picture and location.

This data would always be needed when displaying any user interaction on the website, example: on every comments table JOIN for articles. Instead of doing a JOIN on the users & users_profiles tables to get the 'location' and 'display', just use the information on users table.

That's my approach, however I do know that there are a lot of excellent and experienced programmers that can give me a word of advice about this matter.

My questions are:

Should I try to be conservative with the JOINs? or should I use them more? Why?

Are there any performance problems on the long run when using JOIN a lot?

Note: I must clarify, that I'm not trying to avoid JOINS at all. I use them only when needed. On this example would be comment/article authors, extra profile information that only displays on user profiles pages... etc.

A: 

If the data is 1 <-> 1, and you will not have many null fields, dont over normalize. You can still specify the fields required ("most used data") in the select statements.

astander
+4  A: 

Some bright person once said:

Normalize until it hurts, denormalize until it works!

It all depends on the type of joins, and the join conditions, but there are nothing wrong with them. Joins ON table1.PK = table2.FK are very efficient.

Paul Creasey
+4  A: 

My advice on data modeling is:

  • You should favour optional (nullable) columns over 1:1 joins generally speaking. There are still instances where 1:1 makes sense, usually revolving around subtyping. People tend to be more squeamish when it comes to nullable columns than they do about joins oddly;
  • Don't make a model too indirect unless really justified (more on this below);
  • Favour joins over aggregation. This can vary so it needs to be tested. See Oracle vs MySQL vs SQL Server: Aggregation vs Joins for an example of this;
  • Joins are better than N+1 selects. An N+1 select is, for example, selecting an order from a database table and then issuing a separate query to get all the line items for that order;
  • The scalability of joins is usually only an issue when you're doing mass selects. If you select a single row and then join that to a few things rarely is this a problem (but sometimes it is);
  • Foreign keys should always be indexed unless you're dealing with a trivially small table;

More in Database Development Mistakes Made by AppDevelopers.

Now as for directness of a model, let me give you an example. Let's say you're designing a system for authentication and authorization of users. An overengineered solution might look something like this:

  • Alias (id, username, user_id);
  • User (id, ...);
  • Email (id, user_id, email address);
  • Login (id, user_id, ...)
  • Login Roles (id, login_id, role_id);
  • Role (id, name);
  • Role Privilege (id, role_id, privilege_id);
  • Privilege (id, name).

So you need 6 joins to get from the username entered to the actual privileges. Sure there might be an actual requirement for this but more often than not this kind of system is put in because of the hand-wringing by some developer thinking they might someday need it even though every user only has one alias, user to login is 1:1 and so on. A simpler solution is:

  • User (id, username, email address, user type)

and, well, that's it. Perhaps if you need a complex role system but it's also quite possible that you don't and if you do it's reasonably easy to slot in (user type becomes a foreign key into a user types or roles table) or it's generally straightforward to map the old to the new.

This is thing about complexity: it's easy to add and hard to remove. Usually it's a constant vigil against unintended complexity, which is bad enough without going and making it worse by adding unnecessary complexity.

cletus
Great comment about complexity: "it's easy to add and hard to remove"
carl
Thank you very much for this excellent answer. It answers all my doubts about this matter. Thanks.
kuroir
A: 

Fear not joining. The relational model is strong and you should employ it. Someone always discussed N+1, but also consider--in your context--joining against users often for security purposes too as the query can additionally mandate user existence, status, session correctness, and field expectation.

Many large sites go so far as to have session table and http request table for every request, always joined against each other for the page queries. Benefit is that parameters are always matched to sessions, sessions to proper users, user status always checked, &c &c but moreso that it allows for some interesting scale-out benefits.

Long story, do it wisely, but don't skimp on joining.

Xepoch
A: 

As others have said - joins aren't a thing to avoid at all. In fact, in most models it is rare not to have a few joins in every single query that the application runs.

Even in the biggest queries they aren't usually a performance problems - and often fix performance problems that would occur if you have redundant and repeating data all over the place.

However, be aware that under the cover the database just joins two tables at a time. So, joins necessitate multiple steps by the database that are invisible to the developer. When it does these joins it has to make a few decisions about how to go about it:

  • walk through all the values on the left table, then match them one at a time to the values on the right?
  • Do just the opposite?
  • Sort the keys from both tables and walk through them at the same time?
  • Build hashes of keys on both sides?
  • Apply filtering criteria before or after a given join?
  • etc

So, if your joins are complex ultimately the efficiency will be driven by the sophistication of your optimizer/planner and the currency and detail of your statistics. MySQL isn't a strong contender here - so I'd generally keep my model and sql a little simpler than if I was using something else. But a few joins per query should almost always be fine.

KenFar