I have a constraint on a table with IGNORE_DUP_KEY. This allows bulk inserts to work partially when some records are duplicates and some are not (only the non-duplicates are inserted). However, it does not allow updates to work partially, where I only want to update those records that will not create duplicates.

Does anyone know how I can support IGNORE_DUP_KEY when applying updates?

I am using MS SQL Server 2005.
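
For reference, a minimal sketch of the kind of index being described (the table and column names here are hypothetical):

-- Unique index with IGNORE_DUP_KEY: bulk INSERTs skip duplicate rows with a
-- warning instead of failing, but an UPDATE that would create a duplicate
-- key still fails outright.
create table dbo.Widgets (WidgetCode varchar(20) not null, Descr varchar(100) null)

create unique index UX_Widgets_WidgetCode
    on dbo.Widgets (WidgetCode)
    with (ignore_dup_key = on)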

A: 

I have the feeling you should use the MERGE statement, and in the UPDATE part you should not touch the key you want to keep unique. That also means you have to define that key as unique in your table (by creating a unique index or defining it as the primary key). Then any update or insert with a duplicate key will fail.

Edit: I think this link will help on that:

http://msdn.microsoft.com/en-us/library/bb522522.aspx
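
A rough sketch of the idea (dbo.Target, dbo.Source and the column names are hypothetical; note that MERGE requires SQL Server 2008 or later, as the comments below point out):

-- The UPDATE branch never touches KeyCol, the uniquely-keyed column,
-- so matched rows are updated and unmatched rows are inserted.
MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.KeyCol = s.KeyCol
WHEN MATCHED THEN
    UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2
WHEN NOT MATCHED THEN
    INSERT (KeyCol, Col1, Col2)
    VALUES (s.KeyCol, s.Col1, s.Col2);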

Jürgen Hollfelder
MERGE is MSSQL 2008 only, and DEH is using MSSQL 2005
Pondlife
Too bad but you are right.
Jürgen Hollfelder
A: 

I'm not sure what is really going on, but if you are inserting duplicates and updating primary keys as part of a bulk load process, then a staging table might be the solution for you. Create a table that you make sure is empty before the bulk load, load it with the 100% raw data from the file, and then process that data into your real tables (set-based is best). For example, you can insert all rows that don't already exist like this:

INSERT INTO RealTable
        (pk, col1, col2, col3)
    SELECT
        pk, col1, col2, col3
        FROM StageTable s
        WHERE NOT EXISTS (SELECT
                              1
                              FROM RealTable r
                              WHERE s.pk=r.pk
                         )

Preventing the duplicates in the first place is best. You could also do UPDATEs on your real table by joining in the staging table, as sketched below. This avoids the need to "work around" the constraints; when you work around the constraints, you usually create difficult-to-find bugs.
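
A rough sketch of such an UPDATE, reusing the hypothetical StageTable/RealTable names from above and assuming the primary key itself is not being changed:

-- Update non-key columns of existing rows from the staging table;
-- joining on the primary key means no new key values are introduced.
UPDATE r
    SET col1 = s.col1,
        col2 = s.col2,
        col3 = s.col3
    FROM RealTable r
    JOIN StageTable s
        ON s.pk = r.pk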

KM
A: 

If I understand correctly, you want to do UPDATEs without specifying the necessary WHERE logic to avoid creating duplicates?

create table #t (col1 int not null, col2 int not null, primary key (col1, col2))

insert into #t 
select 1, 1 union all 
select 1, 2 union all 
select 2, 3

-- you want to do just this...
update #t set col2 = 1

-- ... but you really need to do this
update #t set col2 = 1
where not exists (
    select * from #t t2
    where #t.col1 = t2.col1 and t2.col2 = 1
    )

The main options that come to mind are:

  1. Use a complete UPDATE statement to avoid creating duplicates
  2. Use an INSTEAD OF UPDATE trigger to 'intercept' the UPDATE and only do UPDATEs that won't create a duplicate
  3. Use a row-by-row processing technique such as a cursor, wrapping each UPDATE in TRY...CATCH (or whatever your client language's equivalent is); a sketch of this follows below

I don't think anyone can tell you which one is best, because it depends on what you're trying to do and what environment you're working in. But because row-by-row processing could end up swallowing errors that aren't really duplicate-key violations (false positives), I would try to stick with a set-based approach.
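
For completeness, a minimal sketch of option 3 against the #t table above (2627 and 2601 are SQL Server's duplicate-key error numbers; this is an illustration only, not a recommendation):

declare @col1 int, @col2 int

-- a static cursor takes a snapshot, so updating #t inside the loop is safe
declare c cursor local static for
    select col1, col2 from #t

open c
fetch next from c into @col1, @col2

while @@fetch_status = 0
begin
    begin try
        -- attempt the same single-row update as above
        update #t set col2 = 1
        where col1 = @col1 and col2 = @col2
    end try
    begin catch
        -- swallow duplicate-key violations, surface anything else
        if error_number() not in (2627, 2601)
            raiserror('update failed for a non-duplicate reason', 16, 1)
    end catch

    fetch next from c into @col1, @col2
end

close c
deallocate c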

Pondlife
I was hoping there would be some mechanism akin to the IGNORE_DUP_KEY but for updates. Alas I will have to use your first option above. Many thanks.
DEH