views:

523

answers:

3

Let's say you're running an UPDATE statement on a table, but the information you're putting into this base table is from some other auxiliary table. Normally, you would JOIN the data and not expect the rows in the UPDATE statement's FROM clause to multiply, maintaining that one new row maps to one old row in the base table.

But I was wondering what would happen if your JOIN table was ambiguous somehow, like you couldn't account for each base entity mapping only to one joined entity. Or if you did something nonsensical like join to a base table to a table of its children and updated the base table using that information. How would it choose? Now there are multiple rows per one base table row.

I ran a statement like this in SQL Server 2005 and it seemed to be choosing the first row in each set. But that just seems wrong to me. Shouldn't it fire an error? Why is this desired behavior?

Example code

-- normal
-- categories are one-to-many bundles

update bundles_denormalized set category = c.description

from bundles_denormalized b
left join categories c
on b.category_id = c.id

-- ambiguous
-- bundles are one-to-many products

update bundles_denormalized set category = p.description

from bundles_denormalized b
left join products p
on b.id = p.bundle_id
+4  A: 

Actually if I understand the question correctly it is updating the field multiple times, it is just that since there is only one record, so it ends up with only one value. Why doesn't it error? Because the syntax is correct and the database has not way of knowing what your intent was. Would you want to do this? Not normally, this is why you should do a select of your update before you run it to ensure the corret records are getting the correct values.

I usually write an update with a join this way:

update b    
set category = p.description
--select b.category, p.description
from bundles_denormalized b
left join products p on b.id = p.bundle_id

I also would be wary of using a left join in an update as you will may get values changed to nulls. That's ok if that is what you wanted, but not if it isn't.

HLGEM
+4  A: 

From BOL on UPDATE

Using UPDATE with the FROM Clause

The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic. For example, in the UPDATE statement in the following script, both rows in Table1 meet the qualifications of the FROM clause in the UPDATE statement; but it is undefined which row from Table1 is used to update the row in Table2.

USE AdventureWorks;
GO
IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL
    DROP TABLE dbo.Table1;
GO
IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL
    DROP TABLE dbo.Table2;
GO
CREATE TABLE dbo.Table1 
    (ColA int NOT NULL, ColB decimal(10,3) NOT NULL);
GO
CREATE TABLE dbo.Table2 
    (ColA int PRIMARY KEY NOT NULL, ColB decimal(10,3) NOT NULL);
GO
INSERT INTO dbo.Table1 VALUES(1, 10.0);
INSERT INTO dbo.Table1 VALUES(1, 20.0);
INSERT INTO dbo.Table2 VALUES(1, 0.0);
GO
UPDATE dbo.Table2 
SET dbo.Table2.ColB = dbo.Table2.ColB + dbo.Table1.ColB
FROM dbo.Table2 
    INNER JOIN dbo.Table1 
    ON (dbo.Table2.ColA = dbo.Table1.ColA);
GO
SELECT ColA, ColB 
FROM dbo.Table2;

In other words, it is a valid syntax and it is not going to throw an error or exception.

But at the same time you cannot be sure that the update value will be the first or the last record from your FROM clause as it is not defined.

kristof
It's important to note the part that states that the results are unknown. Although in the OPs example the "first" result was always being used, that isn't guaranteed. SQL Server might do the updates in any order.
Tom H.
Dag nabbit, and I had that article open too. Didn't check the Remarks section. Winner!
Mark Canlas
It should be notes that different version of SQL will react differently. While SQL 2005 will usually use the first row, older versions of SQL will usually use the last row.
mrdenny
A: 

I actually just noticed that it does something sompletely stupid in my case. My intent was to create a single row select, but apparently there were duplicates that I didn't expect. In this case, it actually entered a mix of data into the target row, picking some columns from the first source row and some from the second source row.

I'm pretty sure Firebird throws an exception if an attempt is made to do something as ambiuous as this. But then Firebird doesn't support the (non-standard?) sntax update X from X join Y at all...

Kjell Rilbe