Hi,

I need to transfer data from one table to another. The second table has a primary key constraint and the first one has no constraints. They have the same structure. I want to select all rows from table A and insert them into table B without the duplicate rows (if a row is duplicated, I only want to keep the first one found).

Example :

MyField1 (PK) | MyField2 (PK) | MyField3(PK) | MyField4 | MyField5


1 | 'Test' | 'A1' | 'Data1' | 'Data1'
2 | 'Test1' | 'A2' | 'Data2' | 'Data2'
2 | 'Test1' | 'A2' | 'Data3' | 'Data3'
4 | 'Test2' | 'A3' | 'Data4' | 'Data4'

As you can see, the second and third rows have the same primary key but different data in MyField4 and MyField5. In this example, I would like to get the first, second, and fourth rows, but not the third, because it duplicates the second (even though MyField4 and MyField5 contain different data).

How can I do that with a single SELECT?

thx

+1  A: 

What is your database? In Oracle you could say

SELECT * FROM your_table
WHERE rowid in
(SELECT MIN(rowid)
 FROM your_table
 GROUP BY MyField1, MyField2, MyField3);

Note that it is somewhat uncertain which of the rows with the same PK will be considered "first". If you need to impose a specific order, you need to additionally sort on the other columns.
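
To try the idea without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it also happens to expose an implicit rowid); the table name your_table and the sample rows are taken from the question, and the column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT, MyField4 TEXT, MyField5 TEXT)"
)
rows = [
    (1, "Test", "A1", "Data1", "Data1"),
    (2, "Test1", "A2", "Data2", "Data2"),
    (2, "Test1", "A2", "Data3", "Data3"),
    (4, "Test2", "A3", "Data4", "Data4"),
]
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)

# For each (MyField1, MyField2, MyField3) group, keep the row with the smallest rowid.
kept = conn.execute(
    """
    SELECT MyField1, MyField2, MyField3, MyField4, MyField5
    FROM your_table
    WHERE rowid IN (SELECT MIN(rowid)
                    FROM your_table
                    GROUP BY MyField1, MyField2, MyField3)
    ORDER BY rowid
    """
).fetchall()
print(kept)
```

In this sketch "first" simply means "inserted first", because SQLite hands out increasing rowids on a fresh table; other databases may not give you that guarantee.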

Thilo
Would this run the nested SELECT statement once for each row in your_table? If so, performance would be pretty bad. Hopefully the nested statement would be cached. I'm not that familiar with the query planning side of it.
Bassam
I use MS SQL 2005 but I think this syntax will work, I'll try tomorrow and I'll let you know. Thx!
Melursus
It would not be run for each row, just once.
Thilo
There is no rowid field in MS SQL.
Jk
+1  A: 

Not sure how you decide which of rows 2 and 3 you want in the new table, but in MySQL you can simply do:

insert ignore into new_table (select * from old_table);

And the PK won't allow duplicate entries to be inserted.
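
The same behaviour can be tried out with Python's built-in sqlite3 module. This is only a sketch on a stand-in database: SQLite spells the statement INSERT OR IGNORE rather than MySQL's INSERT IGNORE, and the column types and sample rows (taken from the question) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE old_table (
        MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT, MyField4 TEXT, MyField5 TEXT);
    CREATE TABLE new_table (
        MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT, MyField4 TEXT, MyField5 TEXT,
        PRIMARY KEY (MyField1, MyField2, MyField3));
    INSERT INTO old_table VALUES
        (1, 'Test',  'A1', 'Data1', 'Data1'),
        (2, 'Test1', 'A2', 'Data2', 'Data2'),
        (2, 'Test1', 'A2', 'Data3', 'Data3'),
        (4, 'Test2', 'A3', 'Data4', 'Data4');
    """
)

# Rows whose primary key is already present in new_table are silently skipped.
conn.execute("INSERT OR IGNORE INTO new_table SELECT * FROM old_table")

keys = conn.execute(
    "SELECT MyField1, MyField2, MyField3 FROM new_table ORDER BY MyField1"
).fetchall()
print(keys)
```

Which of the two duplicate rows survives depends on the order in which the database reads old_table, so this approach fits best when you don't care which one is kept.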

Chris J
+1  A: 

First, you need to define what makes a row "first". I'll make up an arbitrary definition and you can change the SQL as you need to for what you want. For this example, I assume "first" to be the lowest value for MyField4 and if they are equal then the lowest value for MyField5. It also accounts for the possibility of all 5 columns being identical.

SELECT DISTINCT
     T1.MyField1,
     T1.MyField2,
     T1.MyField3,
     T1.MyField4,
     T1.MyField5
FROM
     MyTable T1
LEFT OUTER JOIN MyTable T2 ON
     T2.MyField1 = T1.MyField1 AND
     T2.MyField2 = T1.MyField2 AND
     T2.MyField3 = T1.MyField3 AND
     (
          T2.MyField4 < T1.MyField4 OR
          (
               T2.MyField4 = T1.MyField4 AND
               T2.MyField5 < T1.MyField5
          )
          )
     )
WHERE
     T2.MyField1 IS NULL

If you also want to account for PKs that are not duplicated in the source table, but already exist in your destination table then you'll need to account for that too.
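
One way to handle that case is to add a NOT EXISTS check against the destination table. Below is a rough sketch using Python's built-in sqlite3 module as a stand-in database. The table name Destination and its pre-existing row are hypothetical, the column types are assumed, and "first" is taken to mean the lowest MyField4, then the lowest MyField5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MyTable (
        MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT, MyField4 TEXT, MyField5 TEXT);
    CREATE TABLE Destination (  -- hypothetical name for the target table
        MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT, MyField4 TEXT, MyField5 TEXT,
        PRIMARY KEY (MyField1, MyField2, MyField3));
    INSERT INTO MyTable VALUES
        (1, 'Test',  'A1', 'Data1', 'Data1'),
        (2, 'Test1', 'A2', 'Data2', 'Data2'),
        (2, 'Test1', 'A2', 'Data3', 'Data3'),
        (4, 'Test2', 'A3', 'Data4', 'Data4');
    -- A key that is already present in the destination before the transfer.
    INSERT INTO Destination VALUES (4, 'Test2', 'A3', 'Old', 'Old');
    """
)

conn.execute(
    """
    INSERT INTO Destination
    SELECT DISTINCT T1.MyField1, T1.MyField2, T1.MyField3, T1.MyField4, T1.MyField5
    FROM MyTable T1
    LEFT OUTER JOIN MyTable T2 ON
         T2.MyField1 = T1.MyField1 AND
         T2.MyField2 = T1.MyField2 AND
         T2.MyField3 = T1.MyField3 AND
         (T2.MyField4 < T1.MyField4 OR
          (T2.MyField4 = T1.MyField4 AND T2.MyField5 < T1.MyField5))
    WHERE T2.MyField1 IS NULL          -- no "earlier" row with the same key
      AND NOT EXISTS (                 -- key not already in the destination
          SELECT 1 FROM Destination D
          WHERE D.MyField1 = T1.MyField1 AND
                D.MyField2 = T1.MyField2 AND
                D.MyField3 = T1.MyField3)
    """
)

result = conn.execute("SELECT * FROM Destination ORDER BY MyField1").fetchall()
print(result)
```

The row already in Destination is left untouched, and only keys that are new to it get inserted.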

Tom H.
A: 
CREATE TABLE #A(
ID INTEGER IDENTITY,
[MyField1] [int] NULL,
[MyField2] [varchar](10) NULL,
[MyField3] [varchar](10) NULL,
[MyField4] [varchar](10) NULL,
[MyField5] [varchar](10) NULL
) 

INSERT INTO #A (MyField1,MyField2,MyField3,MyField4,MyField5) SELECT * FROM A

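insert into B 
   select MyField1,MyField2,MyField3,MyField4,MyField5 from #A a1 
    -- compare all three PK columns; keep the row with the lowest ID per key
    where not exists (select id from #A a2 where a2.MyField1 = a1.MyField1 and a2.MyField2 = a1.MyField2
                      and a2.MyField3 = a1.MyField3 and a2.ID < a1.ID)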

DROP TABLE #A

OR

insert into b
  select distinct * from a a1 
    where not exists (
  select a2.MyField1 from a a2 where a1.MyField1 = a2.MyField1 and 
       a1.MyField2 = a2.MyField2 and a1.MyField3 = a2.MyField3 and
       (a1.MyField4 < a2.MyField4 
        or (a1.MyField4 = a2.MyField4 and a1.MyField5 < a2.MyField5)))
Jk