Hi!

I'm inserting a large number of rows into an empty table with a primary key constraint on one column. If there is a duplicate key error, is there any way to find out the value of the key (or the row) that caused the error?

Validating the data prior to the insert is sadly not something I can do right now.

Using SQL Server 2008.

Thanks!

A: 

Revised:
Since you don't want to insert twice, could you:

Drop the primary key constraint.
Insert all data into the table
Find any duplicates, and remove them
Then re-add the primary key constraint
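
A minimal T-SQL sketch of those four steps. The names sometable, id, and PK_sometable are hypothetical, and the cleanup keeps an arbitrary copy of each duplicated key, so adjust it if one particular copy must survive:

```sql
-- Assumption: sometable has a key column named id and a primary key
-- constraint named PK_sometable. All three names are placeholders.

-- 1. Drop the primary key constraint.
ALTER TABLE sometable DROP CONSTRAINT PK_sometable;

-- 2. Load all of the data into sometable here (your existing insert process).

-- 3a. Report the key values that occur more than once.
SELECT id, COUNT(*) AS copies
FROM sometable
GROUP BY id
HAVING COUNT(*) > 1;

-- 3b. Delete every copy after the first for each duplicated key.
WITH numbered AS (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
  FROM sometable
)
DELETE FROM numbered
WHERE rn > 1;

-- 4. Re-add the primary key constraint.
ALTER TABLE sometable ADD CONSTRAINT PK_sometable PRIMARY KEY (id);
```

Running step 3a before 3b gives you the list of offending keys to pass back to whoever supplied the data.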

Previous reply: Insert the data into a duplicate of the table without the primary key constraint.

Then run a query on it to find the rows that have duplicate values in the primary key column.

select count(*), <Primary Key>
from table
group by <Primary Key>
having count(*) > 1
Bravax
Weird way of doing that. Now you have a table filled with duplicates which means a delete at a later stage.
Ray Booysen
Yet you've identified the duplicates which were the problem.
Bravax
Doesn't really identify which row was in error, just the fact that there are rows with conflicting primary keys.
Ray Booysen
It identifies which primary key... thus which row.
Bravax
And if other processes are hitting the table, it could create problems. Bad idea.
HLGEM
The question does say it's an empty table that is being inserted into, so unlikely other processes are hitting the table.
Arry
A: 

Doing the count(*) / group by thing is something I'm trying to avoid. This is an insert of hundreds of millions of rows from hundreds of different DBs (some of which are on remote servers), and I don't have the time or space to do the insert twice.

The data is supposed to be unique from the providers, but unfortunately their validation doesn't seem to work correctly 100% of the time and I'm trying to at least see where it's failing so I can help them troubleshoot.

Thank you!

capnsue
How are you inserting the rows? Is there logging to identify which data file contains the duplicate, or when the process fails?
Bravax
Capnsue - when you're responding to someone, post your message as a comment to their answer, not as a new answer. Hope that helps!
Brent Ozar
+3  A: 

There's not a way of doing it that won't slow your process down, but here's one way that will make it easier. You can add an instead-of trigger on that table for inserts and updates. The trigger will check each record before inserting it and make sure it won't cause a primary key violation. You can even create a second table to catch violations, and have a different primary key (like an identity field) on that one, and the trigger will insert the rows into your error-catching table.

Here's an example of how the trigger can work:

CREATE TRIGGER mytrigger ON sometable
INSTEAD OF INSERT
AS BEGIN
  -- Rows that pass the check go into the real table.
  INSERT INTO sometable SELECT * FROM inserted WHERE ISNUMERIC(somefield) = 1;
  -- Rows that fail the check are diverted to the rejects table.
  INSERT INTO sometableRejects SELECT * FROM inserted WHERE ISNUMERIC(somefield) = 0;
END

In that example, I'm checking a field to make sure it's numeric before I insert the data into the table. You'll need to modify that code to check for primary key violations instead - for example, you might join the INSERTED table to your own existing table and only insert rows where you don't find a match.
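
As a rough sketch of that modification, the trigger below diverts any row whose key already exists in the table, or that repeats a key within the same insert batch, into the rejects table. The column names id and payload, their types, and the layout of sometableRejects are assumptions made for illustration only:

```sql
-- Hypothetical schema:
--   sometable(id INT PRIMARY KEY, payload VARCHAR(100))
--   sometableRejects(reject_id INT IDENTITY PRIMARY KEY, id INT, payload VARCHAR(100))
CREATE TRIGGER trg_sometable_divert_dups ON sometable
INSTEAD OF INSERT
AS BEGIN
  SET NOCOUNT ON;

  -- Number the copies of each key once, so both inserts below agree
  -- on which copy counts as the first.
  SELECT i.id, i.payload,
         ROW_NUMBER() OVER (PARTITION BY i.id ORDER BY i.id) AS rn
  INTO #numbered
  FROM inserted AS i;

  -- Divert rows whose key is already in the table, plus every copy
  -- after the first when a key repeats inside this batch.
  INSERT INTO sometableRejects (id, payload)
  SELECT n.id, n.payload
  FROM #numbered AS n
  WHERE n.rn > 1
     OR EXISTS (SELECT 1 FROM sometable AS t WHERE t.id = n.id);

  -- Insert the remaining rows, which cannot violate the primary key.
  INSERT INTO sometable (id, payload)
  SELECT n.id, n.payload
  FROM #numbered AS n
  WHERE n.rn = 1
    AND NOT EXISTS (SELECT 1 FROM sometable AS t WHERE t.id = n.id);
END
```

The rejects are written first, so the existence check still sees the table as it was before the batch. Note that BULK INSERT and bcp skip triggers unless FIRE_TRIGGERS is specified, so this only helps if your load path actually fires them.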

Brent Ozar
A: 

Use SSIS to import the data and have it check for this as part of the data flow. That is the best way to handle it. SSIS can send the bad records to a table (which you can later send to the vendor to help them clean up their act) and process the good ones.

HLGEM
+1  A: 

The solution would depend on how often this happens. If it's <10% of the time then I would do the following:

  1. Insert the data
  2. If error then do Bravax's revised solution (remove constraint, insert, find dup, report and kill dup, enable constraint).

This means it's only costing you on the few times an error occurs.
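
One way to sketch the "insert, and only fall back on error" idea in T-SQL is a TRY...CATCH block that tests for error 2627 (primary key or unique constraint violation). The table and column names here, including staging_source, are hypothetical, and the fallback itself is left as a comment:

```sql
-- Assumption: rows are copied from a hypothetical staging_source table
-- into sometable(id, payload); substitute your real insert statement.
BEGIN TRY
  INSERT INTO sometable (id, payload)
  SELECT id, payload
  FROM staging_source;
END TRY
BEGIN CATCH
  IF ERROR_NUMBER() = 2627  -- violation of a PRIMARY KEY or UNIQUE constraint
  BEGIN
    -- The fast path failed: log the message, then run the slower
    -- drop-constraint / find-duplicates / re-add-constraint procedure.
    PRINT ERROR_MESSAGE();
  END
  ELSE
  BEGIN
    -- Any other error is re-raised. SQL Server 2008 has no THROW,
    -- so RAISERROR is used instead.
    DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
    RAISERROR('%s', 16, 1, @msg);
  END
END CATCH
```

The failed INSERT is rolled back as a single statement, so the table is still empty when the fallback starts.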

If this is happening more often then I'd look at sending the boys over to see the providers :-)

Arry
A: 

I can't believe that SSIS does not easily address this "reality", because, let's face it, oftentimes you need and want to be able to:

  1. See if a record exists with a certain unique or primary key
  2. If it does not, insert it
  3. If it does, either ignore it or update it.

I don't understand how they would let a product out the door without this capability built-in in an easy-to-use manner. Like, say, set an attribute of a component to automatically check this.

squealy