views:

254

answers:

6

The following tables are involved:

Table Product:
product_id
merged_product_id
product_name

Table Company_Product:
product_id
company_id

(Company_Product has a primary key on both the product_id and company_id columns)

I now want to run an update on Company_Product to set the product_id column to a merged_ product_id. This update could cause duplicates which would trigger a primary key violation, so therefore I added a 'not exists' check in the where clause and my query looks like this:

update cp
set cp.product_id = p.merged_product_id
from Company_Product cp
join Product p on p.product_id = cp.product_id
where p.merged_product_id is not null
and not exists 
 (select * from Company_Product cp2 
  where cp2.company_id = cp.company_id and 
  cp2.product_id = p.merged_product_id)

But this query fails with a primary key violation.

What I think might happen is that because the Product table contains multiple rows with the same merged_product_id, it will succeed the for the first product, but when going to the next product with the same merged_product_id, it'll fail because the 'not exists' subquery does not see the first change, as the query has not finished and committed yet.

Am I right in thinking this, and how would I change the query to make it work?

[EDIT] Some data examples:

Product:

product_id merged_product_id    
   23            35    
   24            35    
   25            12    
   26            35    
   27           NULL

Company_Product:

product_id company_id    
   23          2    
   24          2    
   25          2    
   26          3    
   27          4

[EDIT 2] Eventually I went with this solution, which uses a temporary table to to the update on and then inserts the updated data into the original Company_Product table:

create table #Company_Product
(product_id int, company_id int)

insert #Company_Product select * from Company_Product

update cp
set cp.product_id = p.merged_product_id
from #Company_Product cp
join Product p on p.product_id = cp.product_id
where p.merged_product_id is not null

delete from Company_Product

insert Company_Product select distinct * from #Company_Product

drop table #Company_Product
A: 

Without seeing your data, I believe your analysis is correct - the entire set is updated and then the commit fails since it results in a constraint violation. An EXISTS is never re-evaluated after "partial commit" of some of the UPDATE.

I think you need to more precisely define your rules regarding attempting to change multiple products to the same product according to the merged_product_id and then make those explicit in your query. For instance, you could exclude any products which would fall into that category with a further NOT EXISTS with appropriate query.

Cade Roux
+1  A: 

You can use MERGE if you are on SQL 2008 at least.

Otherwise you're going to have to choose a criteria to establish which merged_product_id you want in and which one you leave out:

update cp
set cp.product_id = p.merged_product_id
from Company_Product cp
cross apply (
  select top(1) merged_product_id
  from Product 
  where product_id = cp.product_id
  and p.merged_product_id is not null
  and not exists (
    select * from Company_Product cp2 
    where cp2.company_id = cp.company_id and 
    cp2.product_id = merged_product_id)
  order by <insert diferentiating criteria here>) as p

Note that this is not safe if multiple concurrent requests are running the merge logic.

Remus Rusanu
Sorry, can't use SQL 2008 as it has to run on SQL 2005 as well.
tomlog
+2  A: 

A primary key is supposed to be three things:

  1. Non-null
  2. Unique
  3. Unchanging

By altering part of the primary key you're violating requirement #3.

I think you'd be better off creating a new table, populating it, then drop the constraints, drop the original table, and rename the new table to the desired name (then of course, re-apply the original constraints). In my experience this gives you the chance to check out the 'new' data before making it 'live'.

Share and enjoy.

Bob Jarvis
You were the closest in suggesting the solution, so you get the answer. I used a temporary table instead, as described above.
tomlog
+1  A: 

I can't quite see how your structure is meant to work or what this update is trying to achieve. You seem to be updating Company_Product and setting a (new) product_id on an existing row that apparently has a different product_id; e.g., changing the row from one product to another. This seems...an odd use case, I'd expect you to be inserting a new unique row. So I think I'm missing something.

If you're converting Company_Product to using a new set of product IDs instead of an old set (the name "merged_product_id" makes me speculate this), are you sure that there is no overlap between the old and new? That would cause a problem like what you're describing.

T.J. Crowder
Yes, you're correct that I'm converting the old product IDs in Company_Product to the new set, specified in merged_product_id.And yes, there could be overlap, because there might already be an entry for the product_id and company_id. The question is how to prevent the query from trying to insert duplicates (which cause the PK violation).
tomlog
@tomlog: I don't think you'll be able to do an update in place, then, because (for example) when replacing the old product ID 23 with the new ID 35, it may conflict with an old product ID 35 already in the table. You'll have to create a new table, populating it via a join of the old one, and then swap it in. I also note that the sample data you added has a NULL for merged_product_id, which means that you'll be leaving the old ID 27 lying around in the table, which would be a Bad Thing(tm) if there is also a new ID 27.
T.J. Crowder
@T.J.: not all products are merged, so it's fine for ID 27 to stick around. Plus, all product IDs are unique, so there won't ever be a new product with ID 27.
tomlog
A: 

I think you are correct on why the update is failing. To fix this, run a delete query on your company_product table to remove the extra product_ids where the same merged_prduct_id will be applied.

here is a stab at what the query might be

delete company_product
  where product_id not in (
    select min(product_id)
      from product
      group by merged_product_id
  )
  and product_id not in (
    select product_id
      from product
      where merged_product_id is null
  )

-- Explanation added in resonse to comment --

What this tries to do is to delete rows that will be duplicates after the update. Since you have products with multiple merged ids, you really only need one of those products (for each company) in the table when you are done. So, my query (if it works...) will keep the min original product id for each merged product id - then your update will work.

So, let's say you have 3 product ids which will map to 2 merged ids: 1 -> 10, 2 -> 20, 3 -> 20. And you have the following company_product data:

product_id  company_id
1           A
2           A
3           A

If you run your update against this, it will try to change both the second and third rows to product id 20, and it will fail. If you run the delete I suggest, it will remove the third row. After the delete and the update, the table will look like this:

product_id  company_id
10          A
20          A
Ray
But surely by deleting rows I would lose my Company-Product link? How would I know which company a product was linked to when I'm trying to added the merged_product_id?
tomlog
@tomlog - edited answer in response
Ray
I can see now where you're going with this, but your query is not quite right as it would also delete rows where there wasn't a merged_product_id (=NULL, and it would only select the first product_id and delete the rest).
tomlog
I didn't consider nulls - I fixed the delete query to deal with them. Unless I am looking at it wrong, the delete will remove only product ids that will result in duplicate records (key violations) in the update - you can't keep these anyway
Ray
aha - I see what you mean - my delete query won't work. Sorry to waste your time. I think your best bet is to run a select joining the two tables to get the correct ids into a temp table, empty out the c_p table, and then insert into c_p from the temp table with a select distinct.
Ray
A: 

Try this:

create table #Company_Product
(product_id int, company_id int)
create table #Product (product_id int,merged_product_id int)
insert into #Company_Product
select           23, 2     
union all select 24, 2     
union all select 25, 2     
union all select 26, 3     
union all select 27, 4
insert into #product 
Select              23, 35     
union all select    24, 35     
union all select    25, 12     
union all select    26, 35     
union all select   27, NULL 

update cp 
set product_id = merged_product_id
from #company_product cp
join
  ( 
    select min(product_id) as product_id, merged_product_id  
      from #product where merged_product_id is not null
      group by merged_product_id 
  ) a on a.product_id = cp.product_id

delete cp 
--select *
from #company_product cp
join #product p on cp.product_id = p.product_id
where cp.product_id <> p.merged_product_id
and p.merged_product_id is not null
HLGEM
This query will only work when the Company_Product table has no primary key.
tomlog