I am trying to do a bulk upload into a SQL Server DB. The source file has duplicates which I want to remove, so I was hoping that the operation would automatically upload the first one and discard the rest (I've set a unique key constraint). The problem is that the moment a duplicate insert is attempted, the whole thing fails and gets rolled back. Is there any way I can just tell SQL to keep going?

+4  A: 

Try bulk inserting the data into a temporary table and then SELECT DISTINCT, as @madcolor suggested, or:

INSERT INTO yourTable
SELECT * FROM #tempTable tt
WHERE NOT EXISTS (SELECT 1 FROM yourTable yt WHERE yt.id = tt.id)

or use another field in the WHERE clause.
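
For illustration, a minimal end-to-end sketch of that approach, assuming the target is yourTable with a unique key on id, the staging table is #tempTable, and the source is a comma-delimited file (all table, column, and file names here are placeholders):

-- Stage the raw file first; the temp table has no unique constraint,
-- so duplicate rows load without error.
CREATE TABLE #tempTable (id INT, col1 VARCHAR(100), col2 VARCHAR(100));

BULK INSERT #tempTable
FROM 'C:\data\source.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Keep one row per key from the staging table and copy only keys
-- that are not already present in the target table.
INSERT INTO yourTable (id, col1, col2)
SELECT id, col1, col2
FROM (
    SELECT id, col1, col2,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
    FROM #tempTable
) tt
WHERE tt.rn = 1
  AND NOT EXISTS (SELECT 1 FROM yourTable yt WHERE yt.id = tt.id);

DROP TABLE #tempTable;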

Lukasz Lysik
A: 

If you're doing this through some SQL tool like SQL Plus or DBVis or Toad, then I suspect not. If you're doing this programmatically in a language, then you need to divide and conquer. Presumably executing the statements one by one and catching each exception would be too lengthy a process, so instead you could run the whole batch first; if it fails, run the first half, and if that fails, run the first half of the first half. Iterate this way until you have a block that succeeds, then apply the same procedure to the rest of the SQL. Anything that violates a constraint will eventually end up as a single statement, which you know to log and discard. This imports with as much bulk processing as possible while still throwing out the invalid lines.
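
This splitting strategy is described above as client-side code, but as a rough sketch the same idea can also be expressed directly in T-SQL with TRY/CATCH, assuming the rows have already been staged in a temp table as in the previous answer (table names, column names, and the batch size are placeholders): walk the staged rows in batches, halve the batch whenever an insert fails, and skip the single offending row once it is isolated.

-- Number the staged rows so a range of rows can be addressed by position.
CREATE TABLE #work (rownum INT IDENTITY(1,1) PRIMARY KEY,
                    id INT, col1 VARCHAR(100), col2 VARCHAR(100));
INSERT INTO #work (id, col1, col2)
SELECT id, col1, col2 FROM #tempTable;

DECLARE @total INT, @start INT, @maxBatch INT, @batch INT;
SELECT @total = COUNT(*) FROM #work;
SELECT @start = 1, @maxBatch = 1000, @batch = 1000;

WHILE @start <= @total
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        INSERT INTO yourTable (id, col1, col2)
        SELECT id, col1, col2
        FROM #work
        WHERE rownum BETWEEN @start AND @start + @batch - 1;
        COMMIT TRANSACTION;

        -- This block went in cleanly: move past it and restore the full batch size.
        SET @start = @start + @batch;
        SET @batch = @maxBatch;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        IF @batch = 1
        BEGIN
            -- A single offending row has been isolated: log it, skip it, carry on.
            PRINT 'Skipping row ' + CAST(@start AS VARCHAR(20)) + ': ' + ERROR_MESSAGE();
            SET @start = @start + 1;
            SET @batch = @maxBatch;
        END
        ELSE
            -- The block failed: retry its first half.
            SET @batch = @batch / 2;
    END CATCH
END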

Jherico
A: 

Use SSIS for this. You can tell it to skip the duplicates. But first make sure they are true duplicates: if the data in some of the columns differs, how do you know which is the better record to keep?

HLGEM