views: 108

answers: 5
I recently read somewhere that one of the ways of tuning a SQL query is: if it has too many joins, do one join with fewer tables and cache the results in a temporary table, then do the rest of the query joining on that table.

My question: how will this improve performance, since you are joining the same number of tables (just not all at once)?

Note: I agree this is a generic statement; I read it recently in some article. Let me rephrase it: under what conditions will storing results in a temp table help?

Thanks, Pravin

A: 

This would improve performance if you run the query multiple times without rebuilding the temporary table. The first run would likely take about the normal time, but subsequent queries would avoid the work of producing the data for the temporary table. However, the data in that table would become stale: updates made after it was built would not show up in later queries.
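
For example, something like the following (table and column names are invented for illustration; temporary table syntax varies by database):

    -- Build the intermediate join once and cache it
    CREATE TEMPORARY TABLE tmp_customer_orders AS
    SELECT c.customer_id, c.region, o.order_id, o.order_total
    FROM   customers c
    JOIN   orders    o ON o.customer_id = c.customer_id
    WHERE  o.order_date >= DATE '2010-01-01';

    -- Later queries join against the cached result instead of repeating the big join
    SELECT t.region, SUM(t.order_total)
    FROM   tmp_customer_orders t
    JOIN   regions r ON r.region = t.region
    GROUP  BY t.region;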

This might be acceptable in the case of queries which do not have to be absolutely up to date - for example statistical reporting queries can usually get away with day-old data.

As an alternative, many databases support materialized views (or indexed views), which are effectively temporary tables that are kept up to date automatically whenever you perform an update.
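
In Oracle, for example, the same cached join could be declared as a materialized view (names here are invented; automatic refresh on commit has extra requirements such as materialized view logs, so this sketch uses an on-demand refresh instead):

    CREATE MATERIALIZED VIEW customer_order_mv
    REFRESH COMPLETE ON DEMAND
    AS
    SELECT c.customer_id, c.region, o.order_id, o.order_total
    FROM   customers c
    JOIN   orders    o ON o.customer_id = c.customer_id;

    -- Bring the cached data up to date when needed
    EXEC DBMS_MVIEW.REFRESH('CUSTOMER_ORDER_MV');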

You could also reproduce this effect manually, using triggers on update or by performing your updates via stored procedures. This approach results in a very fragile database and is error prone, so I would generally recommend against it.

GlennS
+1  A: 

This very much depends on your specific situation: such a change could hurt or improve performance. There is no general rule for this; what is the query that you are having trouble with?

It may improve the performance, as the result may be a smaller table, which is easier to query and join to; the query optimizer might do this automatically, but in some cases gets it wrong. This is a way of doing the optimizer's work manually.

Piskvor
+5  A: 

One of the reasons you invest in a product like Oracle is the development work they put into the optimizer piece of their engine. It has improved constantly over 20+ years, and in general, with proper statistics for your tables and indexes, it is hard to outguess it for access to your data.

If I interpret your question as asking how performance would improve in queries of real-time data by building temporary tables each time the query executes, I would say that it wouldn't in most cases. In the cases where it would, instead of building a temporary table, invest the time in structuring the query with Oracle's relatively new WITH clause, which will materialize subsets of data dynamically where it makes sense to the optimizer.
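
A sketch of that WITH clause (subquery factoring) approach, using invented table names; whether Oracle actually materializes the factored subquery into a temporary segment is a decision the optimizer makes for you:

    WITH recent_orders AS (
        SELECT o.customer_id, o.order_total
        FROM   orders o
        WHERE  o.order_date >= DATE '2010-01-01'
    )
    SELECT c.region, SUM(ro.order_total)
    FROM   recent_orders ro
    JOIN   customers c ON c.customer_id = ro.customer_id
    GROUP  BY c.region;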

If your question is about denormalizing data in a materialized view, data mart, or data warehouse fashion, then yes, this can dramatically improve query performance at the expense of access to the current state of the information (since the denormalized tables are always out of date). The improvement comes about, in general, because the RDBMS engine has less physical access work to do for the query: you have already done that work once when building the denormalized structures.

dpbradley
Denormalized data is not necessarily always out of date: if it is maintained by triggers, it will always be up to date, at the expense of update performance.
Unreason
I think this 'truism' has come about because, for a lot of developers, it is quicker to take the temp table approach than to develop an understanding of the DB optimiser.
JulesLt
@Unreason - yes you have a point. The code for a trigger to correctly handle inserts, updates, and deletes for source and destination tables in a concurrent access environment gets pretty complicated though.
dpbradley
'Cached' may be a better term than 'denormalised'. However, ANY separation of one query into two queries will 'invalidate' read consistency unless you go for a SERIALIZABLE isolation level.
Gary
A: 

I think this 'rule' has emerged because the behavior of the database engine becomes hard to predict when many tables are involved - each extra table multiplies the number of possible ways to execute the query.

In theory, it is possible to trace exactly how the Oracle optimizer makes its decision, and to use statistics, hints and plans to give it the information it requires to do its job properly.

In reality, this process often seems to fall down at the developer/DBA gap - both in terms of training, and in terms of access to the required tools.

The disadvantage of the temp table approach is that you have prevented the database from using a 'better' optimisation when resources change (e.g. the DB server now has 8 GB of memory, so the fastest approach is to load all the tables entirely into memory, but the temp table approach has forced a write back to disk).

JulesLt
A: 

I would never, ever consider using temporary tables to improve the performance of a single query. (I assume you're talking about actual tables, not materialized views.) In my experience, Oracle can join several dozen tables without a problem at least 99.9% of the time (if you have up-to-date statistics).

For those rare cases when things don't seem optimal, you should first try working within the system Oracle provides you. Most performance problems I see arise because someone isn't doing something in a logical way, or isn't aware of an existing feature - for example, hitting the same table twice instead of using analytic functions. If Oracle is still using a bad explain plan, then you should look into using hints, or a trick like adding ROWNUM to stop Oracle from rewriting certain subqueries.
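
As a rough illustration of those two techniques (table names invented): a hint that suggests a join order and join method, and ROWNUM inside an inline view, which makes the view non-mergeable so Oracle cannot rewrite it into the outer query:

    -- Join-order / join-method hints
    SELECT /*+ LEADING(o c) USE_HASH(c) */ c.region, SUM(o.order_total)
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    GROUP  BY c.region;

    -- ROWNUM >= 1 is always true, but it stops view merging / subquery unnesting
    SELECT c.customer_name, t.total
    FROM   (SELECT customer_id, SUM(order_total) AS total
            FROM   orders
            WHERE  ROWNUM >= 1
            GROUP  BY customer_id) t
    JOIN   customers c ON c.customer_id = t.customer_id;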

If a temporary table will help, Oracle will do it all for you. Sometimes you can see objects like "SYS_TEMP..." in the explain plan.
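
For example (DBMS_XPLAN is the standard Oracle package for displaying plans; the /*+ MATERIALIZE */ hint is undocumented and is used here only to force the behaviour, and the SYS_TEMP name in the output is system-generated):

    EXPLAIN PLAN FOR
    WITH big_subset AS (
        SELECT /*+ MATERIALIZE */ customer_id, SUM(order_total) AS total
        FROM   orders
        GROUP  BY customer_id
    )
    SELECT * FROM big_subset;

    -- Look for "TEMP TABLE TRANSFORMATION" and "SYS_TEMP_..." lines in the output
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);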

jonearles