I have a database infrastructure where we are regularly (at least once a day) replicating the full content of tables from a source database to approximately 20 target databases. Due to the replication code in use (we have to use regular Oracle queries, with no control over or direct access to the source database), this results in 20 full-table sorts of the source table.

Is there any way to optimize for this in the query? I'm looking for something that would basically tell Oracle "I'm going to be repeatedly sorting this entire table". MySQL had an option in myisamchk where you could tell it to sort a table and keep it in sorted order, but obviously that wouldn't apply here for multiple reasons.

Currently, there are also some intermediate tables involved (sync from A to B, then from B to C). We do have control over the intermediate tables, so if there are tuning options there, that would be useful as well.

Generally, the queries are almost all of the very simplistic form:

select a, b, c, d, e, ... z from tbl1 order by a, b, c, d, e, ... z;

I'm aware of Oracle Streams, but as described above, the primary source tables are outside of our control, so we won't be able to use Streams there. (Additionally, those source tables are rebuilt completely from a snapshot daily, so Streams wouldn't really work anyway.)

A: 

Hello,

Some things that would help the sorting issue are indexes on the columns that you are sorting on (and also on the columns you join the tables on, if they're not there already). You could also create materialized views that are already sorted, and Oracle would keep the sorted results cached.
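For illustration, using the hypothetical tbl1 and columns from the question, a composite index on the leading ORDER BY columns can let Oracle return the rows via the index instead of doing a full sort (whether the optimizer actually chooses the index depends on the column list and statistics):

create index tbl1_sort_ix on tbl1 (a, b, c, d);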

Jon
You could do a clustered index on a single table, which would put it in sorted order. It has advantages and disadvantages; asktom.oracle.com has great advice about that. But I don't think that would have much effect when you're joining a bunch of tables together.
Jon
Index Organised Tables are 'ordered' (or more correctly structured) by the primary key. In an indexed cluster, Oracle Database stores together rows having the same cluster key value. They are not necessarily stored in a sorted order. That is, if clustered on dept id, all employees in DEPT 10 would be together, and all employees in DEPT 20 together, but DEPT 20 may come out before DEPT 10 in a full table scan.
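For reference, an index-organized table is declared like this (the names are hypothetical); rows are stored in primary-key order, so the primary key would have to match the sort order you need:

create table emp_iot
( emp_id   number primary key,
  emp_name varchar2(100),
  dept_id  number
) organization index;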
Gary
That might be workable on the intermediate tables, but beyond just "adding an index", we're unlikely to be able to get anything done to the source tables. (A PeopleSoft installation is the primary data source.) This is a straight 1-table-to-1-table sync, no joins involved.
Nathan Neulinger
A: 

You don't say exactly how the replication is done or the data volumes involved (or why you are sorting the data).

If the aim is to minimise the impact on the source database, your best bet may be to extract into an intermediate file and load the file into the destination databases. The sort could be done on the intermediate file (if plain text), or as part of either the export or import into the destination databases.

In the source database:
create table export_emp_info
 organization external
 ( type oracle_datapump
   default directory DATA_PUMP_DIR
   location ('emp.dmp')
) as select emp_id, emp_name, dept_id from emp order by dept_id
/


Copy the file across, then import it in the destination database:
create table import_emp_info
(EMP_ID                    NUMBER(12),
EMP_NAME                   VARCHAR2(100),
DEPT_ID                    NUMBER)
 organization external
 ( type oracle_datapump
   default directory DATA_PUMP_DIR
   location ('emp.dmp')
)
/
insert into emp_info select * from import_emp_info;

If you don't want, or can't have, the external table on the source db, you can use a straight expdp of the emp table (possibly using NETWORK_LINK if you have limited access to the source database's directory structure) and QUERY to do the ordering.
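For example, a parameter file along those lines might look like this (the file names and database link name are made up, and it's worth verifying that QUERY accepts an ORDER BY clause in your release):

DIRECTORY=DATA_PUMP_DIR
DUMPFILE=emp.dmp
TABLES=emp
NETWORK_LINK=source_db_link
QUERY=emp:"ORDER BY dept_id"

Saved as, say, exp_emp.par, it would be run with: expdp repl_user PARFILE=exp_emp.par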

Gary
Basically, I'm pulling the rows sequentially from both sides (source and destination) and doing a "merge sort"-style sync, comparing each row in turn from either source or destination to know whether I need to insert the new row from the source or remove the row from the destination. It guarantees that I can run the process with the source/destination tables in any state and end up with them identical, or the same as they started in the case of a rollback. I hadn't considered the 'external file + sort' approach; I will think about that.
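(For anyone reading along, the shape of that compare loop in PL/SQL is roughly the following; the table names are hypothetical, a single-column key is assumed, and real code would also have to compare non-key columns and handle errors.)

declare
  cursor c_src is select id from src_tbl order by id;
  cursor c_dst is select id from dst_tbl order by id;
  v_src    src_tbl.id%type;
  v_dst    dst_tbl.id%type;
  src_done boolean;
  dst_done boolean;
begin
  open c_src;
  open c_dst;
  fetch c_src into v_src;  src_done := c_src%notfound;
  fetch c_dst into v_dst;  dst_done := c_dst%notfound;
  while not (src_done and dst_done) loop
    if dst_done or (not src_done and v_src < v_dst) then
      -- key exists only in the source: copy it to the destination
      insert into dst_tbl (id) values (v_src);
      fetch c_src into v_src;  src_done := c_src%notfound;
    elsif src_done or v_dst < v_src then
      -- key exists only in the destination: remove it
      delete from dst_tbl where id = v_dst;
      fetch c_dst into v_dst;  dst_done := c_dst%notfound;
    else
      -- keys match: nothing to do for a key-only compare
      fetch c_src into v_src;  src_done := c_src%notfound;
      fetch c_dst into v_dst;  dst_done := c_dst%notfound;
    end if;
  end loop;
  close c_src;
  close c_dst;
  commit;
end;
/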
Nathan Neulinger
+2  A: 

Hi Nathan,

You could look into the multi-table INSERT feature. It should perform a single full scan of the source and insert into multiple tables. Consider (10gR2):

SQL> CREATE TABLE t1 (ID NUMBER);

Table created
SQL> CREATE TABLE t2 (ID NUMBER);

Table created

SQL> INSERT ALL
  2     INTO t1 VALUES (d_id)
  3     INTO t2 VALUES (d_id)
  4  /* your select goes here */
  5  SELECT ROWNUM d_id FROM dual d CONNECT BY LEVEL <= 5;

10 rows inserted
SQL> SELECT COUNT(*) FROM t1;

  COUNT(*)
----------
         5
SQL> SELECT COUNT(*) FROM t2;

  COUNT(*)
----------
         5

You will have to check if it works over database links.

Vincent Malgrat
+1 Good suggestion. And I've implemented production code using this construct over database links (10.2.0.4).
Rob van Wijk
Unfortunately, this does what the code was written to avoid. It works great if you have a small table, but if you have a table with 1.5M rows in it, the code above basically forces you to empty the table first, which means either a huge amount of redo or a table that isn't safe to use during the reload period and can potentially be completely unavailable. (Basically, you've avoided the sort by trading it for inserting all of the data every time.)
Nathan Neulinger
A: 

You could load the data from source table A into an intermediate table B and then do a partition exchange between B and destination table C. Exact replication, no sorting involved.
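A minimal sketch, assuming destination table C is partitioned (a single partition is enough) and B matches its column layout exactly (the names are hypothetical); the exchange is a data-dictionary operation, so the swap itself is near-instant regardless of row count:

alter table c exchange partition p_all with table b
  including indexes without validation;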

jva
A: 

This I/U/D (insert/update/delete) form of replication is what the MERGE command is there for. It's very doubtful that an expensive sort-merge join would be required; I'd expect to see hash joins instead. As long as the hash table can be held in memory, the hash join is barely more expensive than scanning the tables.
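A minimal sketch of that, reusing the emp columns from the earlier answer (the names are illustrative, and rows that exist only in the target would still need a separate DELETE or a delete flag in the source):

merge into emp_dest d
using emp_stage s
on (d.emp_id = s.emp_id)
when matched then
  update set d.emp_name = s.emp_name,
             d.dept_id  = s.dept_id
when not matched then
  insert (emp_id, emp_name, dept_id)
  values (s.emp_id, s.emp_name, s.dept_id);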

A handy optimisation is to store a hash value based on the non-key attributes, so that you can join the source and target tables on the key column(s) and compare small hash values instead of the full set of columns - change detection made easy.
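One way to do that in 10g is ORA_HASH over the concatenated non-key columns (the column names and the '|' separator are illustrative; watch out for NULLs and for the small chance of hash collisions):

alter table emp_stage add (row_hash number);

update emp_stage
   set row_hash = ora_hash(emp_name || '|' || to_char(dept_id));

select s.emp_id
  from emp_stage s
  join emp_dest  d on (d.emp_id = s.emp_id)
 where s.row_hash <> ora_hash(d.emp_name || '|' || to_char(d.dept_id));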

David Aldridge