views:

351

answers:

5

Suddenly (but unfortunately I don't know when "suddenly" was; I know it ran fine at some point in the past) one of my queries started taking 7+ seconds instead of milliseconds to execute. I have 1 local table and 3 tables being accessed via a DB link. The 3 remote tables are joined together, and one of them is joined with my local table.

The local table's where clause only takes a few millis to execute on its own, and only returns a few (10's or 100's at the most) records. The 3 remote tables have many hundreds of thousands, possibly millions, of records between them, and if I join them appropriately I get tens or hundreds of thousands of records.

I am only joining with the remote tables so that I can pull out a few pieces of data related to each record in my local table.

What appears to be happening, however, is that Oracle joins the remote tables together first and then my local table to that mess at the end. This is always going to be a bad idea, especially given the data set that exists right now, so I added a /*+ LEADING(local_tab remote_tab_1) */ hint to my query and it now returns in milliseconds.

I compared the explain plans and they are almost identical, save for a single BUFFER SORT on one of the remote tables.

I'm wondering what might cause Oracle to approach this the wrong way? Is it an index issue? What should I be looking for?

+3  A: 

When choosing an execution plan, oracle estimates costs for the different plans. One crucial information for that estimate is the amount of rows will get returned from a step of the execution plan. Oracle tries to estimate those using 'statistics', i.e. information about how many rows a table contains, how many different values a column contains; How evenly these values are distributed.

These statistics are just that statistics, and they might be wrong, which is one of the most important reasons for misjudgments of the oracle optimizer.

So gathering new statistics as described in a comment might help. Have a look at the documentation on that dbms_stats package. There are many different ways to call that package.

Jens Schauder
+1  A: 

A common problem I've come across is a query that joins many tables, where the joins form a chain from one end to another, e.g.:

SELECT *
FROM   tableA, tableB, tableC, tableD, tableE
WHERE  tableA.ID0 = :bind1
AND    tableA.ID1 = tableB.ID1
AND    tableB.ID2 = tableC.ID2
AND    tableC.ID3 = tableD.ID3
AND    tableD.ID4 = tableE.ID4
AND    tableE.ID5 = :bind2;

Notice how the optimiser might choose to drive the query from tableA (e.g. if the index on ID0 is nicely selective) or from tableE (if the index on tableE.ID5 is more selective).

The statistics on the tables might cause the choice between these two plans to balance on a knife-edge; one day it's working fine (driving from tableA), next day new stats are gathered and all of a sudden the alternative plan driving from tableE has a lower cost and is chosen.

In this circumstance, adding a LEADING hint is one way to nudge it back to the original plan (i.e. drive from tableA) without dictating too much to the optimiser (i.e. it doesn't force the optimiser to choose any particular join methods).

Jeffrey Kemp
+1  A: 

You're doing distributed query optimization, and that's a tricky beast. It could be that the your table's statistics are current, but now the tables at the remote system are out-of-whack or have changed. Or the remote system added/removed/modified indexes, and that broke your plan. (This is an excellent reason to consider replication -- so you can control indexes and statistics against it.)

That said, Oracle's estimate of cardinality is a primary driver in execution plan. A 10053 trace analysis (Jonathan Lewis' Cost-Based Oracle Fundamentals book has wonderful examples from 8i to 10.1) can help shed light on why your statement's now broken and how the LEADING hint fixes it.

The DRIVING_SITE hint might be a better choice if you know you always want the local tables to be joined first before going after the remote site; it clarifies your intention without driving the plan the way a LEADING hint would.

Adam Musch
+1  A: 

Might not be relevant but I had a similar situation once where the remote table had been replaced by a single-table view. When it was a table the distributed query optimizer 'saw' that it had an index. When it became a view it couldn't see the index anymore and couldn't cost a plan that used an index on the remote object.

That was a few years ago. I documented my analysis at the time here.

Gary
+1  A: 

RI,

It's hard to be sure about the cause of the performance problems without seeing the SQL.

When an Oracle query was performing well before, and suddenly starts performing badly, it is usually related to one of two issues:

A) Statistics are out of date. This is the easiest and quickest thing to check, even if you have a housekeeping batch process that's supposed to take care of it ... always double-check.

B) Data volume / data pattern change.

In your case, running a distributed query across multiple databases makes it 10x harder for Oracle to manage performance between them. Is it possible to put these tables in one database, perhaps separate schema owners in one database?

Hints are notoriously fragile, as Oracle is under no obligations to follow the hint. When the data volume or pattern changes some more, Oracle may just ignore the hint and do what it thinks is best (ie. worst ;-).

If you cannot put these tables all in one database, then I recommend you look to break your query up into two statements:

  1. INSERT on sub-SELECT to copy external data to a global temporary table in your current database.
  2. SELECT from the global temporary table to join with your other table.

You will have complete control over performance of step 1 above without resorting to hints. This approach typically scales well, providing you take time to do the performance tuning. I've seen this approach solve many complex performance problems.

The overhead for Oracle to create a whole new table, or insert a heap of records, is much smaller than most people expect. Defining a global temporary table further reduces that overhead.

Matthew

Matthew Eyles