tags:

views:

764

answers:

6

I have a master table A, with ~9 Million rows. Another table B(same structure) has ~28K rows from table A. What would be the best way to remove all contents of B from table A.

The combination of all columns(~10) are unique. Nothing more in the form a of a unique key

+1  A: 

DELETE FROM TableA WHERE ID IN(SELECT ID FROM TableB)

Should work. Might take a while though.

IainMH
using a join inthe delete will probably be faster
HLGEM
My solution is a developer solution. Not a fancypants DBA solution. :-D But I like learning more sql.
IainMH
no id column in either table ...
IronGoofy
+2  A: 

one way, just list out all the columns

delete table a
where exists (select 1 from table b where b.Col1= a.Col1 
AND b.Col2= a.Col2 
AND b.Col3= a.Col3
AND b.Col4= a.Col4)
SQLMenace
+1 for saying "one way", not for your actual code...
gbn
IMHO it's always a given that it's one way. There's always more than 17 ways to anything.
IainMH
AFAIK there is only one way to truncate a table (in other words doing a minimally logged delete operation)
SQLMenace
Lets say the key is a compound key.. what changes now..
Bajji
just list them all out where b.Col1= a.Col1 AND b.Col2= a.Col2 AND b.Col3= a.Col3
SQLMenace
It might be a rumor, I heard that using NOT EXISTS usually beats EXISTS.
Haoest
A: 

Is there a key value (or values) that can be used?

something like

DELETE a FROM tableA a INNER JOIN tableB b on b.id = a.id

Joe
pls see updated qn
Bajji
+1  A: 
Delete t2 
from t1 
inner join t2 
  on t1.col1 = t2.col1
  and t1.col2 = t2.col2
  and t1.col3 = t2.col3
  and t1.col4 = t2.col4
  and t1.col5 = t2.col5
  and t1.col6 = t2.col6
  and t1.col7 = t2.col7
  and t1.col8 = t2.col8
  and t1.col9 = t2.col9
  and t1.col10 = t2.col0

This is likely to be very slow as you would have to have every col indexed which is highly unlikely in an environment when a table this size has no primary key, so do it during off peak. What possessed you to have a table with 9 million records and no primary key?

HLGEM
+1 for stating it'll be slow ...
IronGoofy
+1  A: 

If this is something you'll have to do on a regular basis, the first choice should be to try to improve the database design (looking for primary keys, trying to get the "join" condition to be on as few columns as possible).

If that is not possible, the distinct second option is to figure out the "selectivity" of each of the columns (i.e. how many "different" values does each column have, 'name' would be more selective than 'address country' than 'male/female'). The general type of statement I'd suggest would be like this:

Delete from tableA
where exists (select * from tableB
 where tableA.colx1 = tableB.colx1
 and tableA.colx2 = tableB.colx2
 etc. and tableA.colx10 = tableB.colx10).

The idea is to list the columns in order of the selectivity and build an index on colx1, colx2 etc. on tableB. The exact number of columns in tableB would be a result of some trial&measure. (Offset the time for building the index on tableB with the improved time of the delete statement.)

If this is just a one time operation, I'd just pick one of the slow methods outlined above. It's probably not worth the effort to think too much about this when you can just start a statement before going home ...

IronGoofy
KG
+1  A: 

If you have sufficient rights you can create a new table and rename that one to A. To create the new table you can use the following script:

CREATE TABLE TEMP_A AS
SELECT *
FROM   A
MINUS
SELECT *
FROM   B

This should perform pretty good.