views:

52

answers:

2

I am bulk importing CSV file into a temporary table and then inserting the rows into the target table if they do not already exists.

When I start the process, my table has the following values

VECH01 AAA 111
VECH01 BBB 222
VECH01 CCC 333
VECH02 AAA 111
VECH02 BBB 222
VECH02 CCC 333

When I import a CSV file with these values:

VECH01|DDD|444
VECH01|DDD|555
VECH02|CCC|XXX

The 1st row is imported VECH01 DDD is not in the database.
OK

The 2nd row is imported but VECH01 DDD is already in the database, it was imported in the previous insert.
NOT OK

The 3rd row is not imported because VECH02 CCC already exists in the database.
OK

CREATE TABLE #csv 
(
 CarRedbookCode nvarchar(50) COLLATE Latin1_General_CI_AS,
 AccessoryCode nvarchar(50) COLLATE Latin1_General_CI_AS,
 AccessoryCodeAutoGeneral nvarchar(50) COLLATE Latin1_General_CI_AS
)

DECLARE @SqlStatement nvarchar(4000)
SET @SqlStatement =
'
BULK INSERT #csv
 FROM ''' + @FileName + '''
    WITH 
    ( 
        FIELDTERMINATOR = ''|'', 
        ROWTERMINATOR = ''\n'' 
    )
'
EXEC sp_executesql @SqlStatement

INSERT INTO  MapRedbookAccessory (CarRedbookCodeAccessoryCode, CarRedbookCode,     AccessoryCode, AccessoryCodeAutoGeneral) 
 select 
  src.CarRedbookCode + src.AccessoryCode
 , src.CarRedbookCode
 , src.AccessoryCode
 , src.AccessoryCodeAutoGeneral 
 from
  #csv src
 left join
  MapRedbookAccessory dst on dst.CarRedbookCodeAccessoryCode = src.CarRedbookCode + src.AccessoryCode
 where
  dst.CarRedbookCodeAccessoryCode is null
A: 

In your sample CSV import data, the two DDD rows have different numeric values after them which would make them different rows. So VECH01|DDD should be inserted twice. Is that correct, or is the sample data incorrect?

ulty4life
The sample data is correct, what should happen is that the 1st row with VECH01|DDD|444 should be inserted and the 2nd row VECH01|DDD|555 should be discarded, only the Redbook Code (Vech01) and Accessory Code (DDD) are part of the key, apparently I can get duplicate keys in the CSV file, but I only want to insert the first occurence and discard any others.
David See
+1  A: 

This is happening because the INSERT statement does not process the rows individually. There might be a better way to do this, but you can use the ROW_NUMBER function to insert only the first row for each code/accessory:

INSERT INTO  MapRedbookAccessory (CarRedbookCodeAccessoryCode, CarRedbookCode,     AccessoryCode, AccessoryCodeAutoGeneral) 
 select 
  src.CarRedbookCode + src.AccessoryCode
 , src.CarRedbookCode
 , src.AccessoryCode
 , src.AccessoryCodeAutoGeneral 
 from
 (select *, ROW_NUMBER() OVER (PARTITION BY CarRedbookCode, AccessoryCode ORDER BY AccessoryCodeAutoGeneral) AS row
    from #csv) src
 left join
  MapRedbookAccessory dst on dst.CarRedbookCodeAccessoryCode = src.CarRedbookCode + src.AccessoryCode
 where
  dst.CarRedbookCodeAccessoryCode is null
  and src.row = 1

You can change the ORDER BY depending on which AccessoryCodeAutoGeneral you want to insert if there are multiple rows.

adrift
Thanx, that worked fine
David See