views: 542
answers: 5

As a DBA for MS SQL 2000 and 2005, I regularly see giant select queries JOINing 7-10 or even more tables. I find, though, that there is a certain point past which performance tends to suffer, and the query becomes very difficult to debug and/or improve.

So is there a "rule of thumb" for when I should be considering other query methods, like temp tables to hold preliminary results? Or is there a point after which the SQL query optimizer just doesn't do a very good job of figuring out the best plan?

+4  A: 

A lot of times you can alleviate the visual smell by creating helper views; I do not think there is a hard and fast rule for how many joins are considered bad.
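
For example, a small helper view might wrap a join that several queries repeat (the table and column names below are made up for illustration):

    CREATE VIEW dbo.vOrderDetails
    AS
    SELECT o.OrderID, o.OrderDate, c.CustomerID, c.CompanyName, l.ProductID, l.Quantity
    FROM dbo.Orders     AS o
    JOIN dbo.Customers  AS c ON c.CustomerID = o.CustomerID
    JOIN dbo.OrderLines AS l ON l.OrderID    = o.OrderID;

Larger queries can then join to dbo.vOrderDetails instead of repeating that three-way join.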

Unlike procedural coding, breaking down SQL into little bits and pieces can result in inefficient queries.

The SQL optimiser will work just fine with tons of table joins, and if you hit a corner case, you can specify the join order or style using hints. In reality I think it is very rare to get queries that join more than, say, 10 tables, but it is quite feasible that this could happen in a reporting-type scenario.
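
As a sketch of what such a hint looks like (made-up table names), a query hint can pin both the physical join type and the written join order:

    SELECT o.OrderID, c.CompanyName
    FROM dbo.Orders AS o
    INNER JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
    OPTION (HASH JOIN, FORCE ORDER);  -- force hash joins and keep the join order as written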

If you discover a situation where you have lots of joins AND have discovered that this particular query is a bottleneck AND you have all the correct indexes in place, you probably need to refactor. However, keep in mind that the large number of joins may only be a symptom, not the root cause of the issue. The standard practice for query optimisation should be followed (look at the profiler, the query plan, the database structure, the logic, etc.).

SQL Server uses tempdb anyway for merge joins, so there is usually no need to create a temp table just to refactor a single SELECT query.

Sam Saffron
I shy away from helper views, because they can frequently contain additional columns, filters, logic, or joins that might seem to make the view "nice" to use in a general sense, but may not be necessary for a specific query. I just refactored a complex query that was using a particularly bad view with a useless filter on an unindexed column ("WHERE INACTIVE = 0", but none of the 16 million rows actually had this flag set).
BradC
True, like any bit of refactoring, this technique can be misused. I think you should consider helper views if it's going to help out multiple queries (not only for refactoring a single query).
Sam Saffron
+1  A: 

It really depends on how big your tables are; even if you are only joining 2 tables together, if one of them has 100M records, that is going to be a slow process anyway.

If you have X records in table A and Y records in table B and you join them together, you may get up to X*Y records back. In that case swap memory will be used during the process, which is slow; by comparison, small queries can work mostly out of the CPU L2 cache, which has the best performance.
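
As a made-up illustration of that multiplication, if the join column is not selective, each of table A's X rows can match many of table B's Y rows:

    -- the result can approach X*Y rows when group_id repeats heavily on both sides
    SELECT a.id, b.id
    FROM table_a AS a
    JOIN table_b AS b ON a.group_id = b.group_id;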

However, if you feel you really need to join a lot of tables to achieve the goal, I would suggest your database is over-normalized. Third normal form works really well in most scenarios; don't try to split the information up too much, as that is recognised to be inefficient for querying.

Yes, if necessary, create a table to cache the results of the heavy query, and update it only when necessary, or even just once a day.
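
A rough sketch of that caching approach (the object names are invented), refreshed by a nightly job or whenever the source data changes:

    IF OBJECT_ID('dbo.SalesSummaryCache') IS NOT NULL
        DROP TABLE dbo.SalesSummaryCache;

    -- materialise the heavy aggregate once, then let reports read the cache table
    SELECT c.CustomerID, SUM(o.Amount) AS TotalAmount
    INTO dbo.SalesSummaryCache
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
    GROUP BY c.CustomerID;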

Shuoling Liu
A: 

I also see mammoth queries joining 7-10 tables, but from what I've seen the query optimiser always seems to find the most efficient plan. Certainly, all the performance issues I see with these sorts of complex queries are usually related to some other problem (such as conditional WHERE clauses or nested subqueries).
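
One common shape of conditional WHERE clause (parameter and column names invented) is the catch-all predicate, where a single cached plan has to cover both the filtered and the unfiltered case:

    DECLARE @CustomerID int;  -- optional filter parameter

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE (@CustomerID IS NULL OR CustomerID = @CustomerID);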

Kragen
A: 

The optimizer sets a time limit on itself to prevent it from running too long. The problem with many tables is that each one multiplies the number of possible plans the optimizer has to evaluate (actually it's the number of joins, not tables per se). At some point the optimizer runs out of time and just uses the best plan it has found so far, which can be pretty bad.

So where is this point? Well, it's very situational, and 2005 is much better than 2000, but my usual rule of thumb is 4-8 for SQL Server 2000 and 6-16 for SQL Server 2005.
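
If you want to check whether a particular query hit that limit, one option on 2005 is to look at the showplan XML, which records whether optimization stopped early (the query below is only a placeholder with made-up names):

    SET STATISTICS XML ON;

    -- in the returned plan XML, a StatementOptmEarlyAbortReason attribute of "TimeOut"
    -- on <StmtSimple> means the optimizer stopped before exhausting its search space
    SELECT o.OrderID
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID;

    SET STATISTICS XML OFF;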

RBarryYoung
A: 

There are other variables involved that have a more significant impact on the overall query plan and performance, in my experience, such as:

  • the input row counts for each join operator
  • how efficiently the input data can be retrieved in the first place
  • the size and type of the columns being joined (e.g. type conversions, nullability)

You might have only two tables being joined together in a query, but if one key column is a GUID and the other is a varchar representation of a GUID, you have no indexes anywhere, and the tables are 2 million rows each, then you'll probably get very poor performance.
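
A small sketch of that mismatch (invented table names): because uniqueidentifier has higher data-type precedence than varchar, the varchar side is implicitly converted on every row, which rules out an efficient index seek on it:

    CREATE TABLE dbo.Parent (ParentKey uniqueidentifier NOT NULL);
    CREATE TABLE dbo.Child  (ParentKey varchar(36) NOT NULL);  -- GUID stored as text

    -- the implicit CONVERT on Child.ParentKey happens row by row
    SELECT p.ParentKey
    FROM dbo.Parent AS p
    JOIN dbo.Child  AS c ON c.ParentKey = p.ParentKey;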

I've coded reporting queries with 10+ joins before, and the judicious use of a few non-clustered indexes on foreign key columns usually has the biggest benefit on the plan.
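
For example (hypothetical names), a supporting index on a foreign key column used in the joins:

    CREATE NONCLUSTERED INDEX IX_OrderLines_OrderID
        ON dbo.OrderLines (OrderID);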

Sam
The SQL Server optimizer won't look at all of your joins if you have a lot, so if you know certain joins will help more than others in your query, there is an advantage to moving them up in the list.
Joe