views:

118

answers:

2

Hi everyone,

Im looking for the best solution (performance wise) to achieve this.

I have to insert records into a table, avoiding duplicates.

For example, take table A

Insert into A (
 Select DISTINCT [FIELDS] from B,C,D.. 
 WHERE (JOIN CONDITIONS ON B,C,D..)
 AND 
 NOT EXISTS
 ( 
   SELECT * FROM A ATMP WHERE
   ATMP.SOMEKEY = A.SOMEKEY
 )
);

I have an index over A.SOMEKEY, just to optimize the NOT EXISTS query, but i realize that inserting on an indexed table will be a performance hit.

So I was thinking of duplicating Table A in a Global Temporary Table, where I would keep the index. Then, removing the index from Table A and executing the query, but modified

Insert into A (
 Select DISTINCT [FIELDS] from B,C,D.. 
 WHERE (JOIN CONDITIONS ON B,C,D..)
 AND 
 NOT EXISTS
 ( 
   SELECT * FROM GLOBAL_TEMPORARY_TABLE_A ATMP WHERE
   ATMP.SOMEKEY = A.SOMEKEY
 )
);

This would solve the "inserting on an index table", but I would have to update the Global Temporary A with each insertion I make.

I'm kind of lost here,

Is there a better way to achieve this?

Thanks in advance,

+2  A: 

Hi Tom,

if the column A.SOMEKEY is declared NOT NULL and if you insert a large amound of data, a NOT IN clause might be more efficient than your NOT EXISTS since it will be able to use a HASH ANTI-JOIN.

INSERT INTO A
   (SELECT DISTINCT FIELDS
      FROM B, C, D ..
     WHERE (JOIN CONDITIONS ON B, C, D..)
       AND [B].SOMEKEY NOT IN (SELECT SOMEKEY FROM A)
       AND [B].SOMEKEY IS NOT NULL;

HASH ANTI-JOINS are brutally efficient with large data sets.

I don't think the temporary table is a good idea in that case because you will be in one of these two cases:

  1. the temporary table is indexed on SOMEKEY, your point about inserting into an indexed table being therefore moot
  2. the temporary table is unindexed and your anti-join will be inefficient

Which method is the most efficient will probably depends upon the volume of data.

Vincent Malgrat
Volumes of data are expected to be "large"
Tom
A: 

How about having the index on the table A. create table b (same structure as table a) with NOLOGGING

Insert /*+APPEND */ into b (
 Select DISTINCT [FIELDS] from B,C,D.. 
 WHERE (JOIN CONDITIONS ON B,C,D..)
 AND 
 NOT EXISTS
 ( 
   SELECT * FROM A ATMP WHERE
   ATMP.SOMEKEY = A.SOMEKEY
 )
);

Then drop the index on A and INSERT INTO A SELECT * FROM B

You could make B a global temporary table, but make sure that the data is persistent for the session as dropping the index will implictly commit.

Gary