I have two tables: a source table and a target table. The target table will have a subset of the columns of the source table. I need to update a single column in the target table by joining with the source table based on another column. The update statement is as follows:

UPDATE target_table tt
SET special_id = ( SELECT source_special_id
                   FROM source_table st
                   WHERE tt.another_id = st.another_id )

For some reason, this statement seems to run indefinitely. The inner select returns almost immediately when executed by itself. The table has roughly 50,000 records and it's hosted on a powerful machine (resources are not an issue).

Am I doing this correctly? Any reasons the above wouldn't work in a timely manner? Any better way to do this?

+1  A: 

Update: Ok, now that the query has been fixed -- I've done this exact thing many times, on unindexed tables well over 50K rows, and it worked fine in Oracle 10g and 9i. So something else is going on here; yes, you are calling for nested loops, but no, it shouldn't run forever, even so. What are the primary keys on these tables? Do you by any chance have multiple rows from the second table matching the same row for the first table? You could be trying to rewrite the whole table over and over, throwing the locking system into a fit.
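
One quick way to check for the multiple-match case is sketched below. It uses the table and column names from the question, so adjust them to your schema:

```sql
-- List another_id values that occur more than once in source_table.
-- Any row returned means the correlated subquery has more than one
-- candidate row for the matching target_table rows.
SELECT another_id, COUNT(*) AS match_count
FROM source_table
GROUP BY another_id
HAVING COUNT(*) > 1;
```

If this returns no rows, duplicate matches in source_table are not the cause.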


Original Response

That statement doesn't really make sense -- you are telling it to update all the rows where ids match, to the same id (meaning, no change happens!).

I imagine the real statement looks a bit different?

Please also provide table schema information (primary keys for the 2 tables, any available indexes, etc).

SquareCog
I think the point is the subquery...you're right on the ID update though, that doesn't make sense. But I think his statement executes the inner query once for every outer row.
Codewerks
Whoops, I mistyped the above statement. I fixed it. The join happens on a different ID.
j0rd4n
+6  A: 

Your initial query executes the inner subquery once for every row in the outer table. See if Oracle likes this better:

UPDATE target_table 
SET special_id = st.source_special_id
FROM 
    target_table tt
    INNER JOIN
    source_table st
        ON tt.another_id = st.another_id

(edited after posted query was corrected)

Add: If the join syntax doesn't work on Oracle, how about:

UPDATE target_table 
SET special_id = st.source_special_id
FROM 
    target_table tt, source_table st
WHERE tt.another_id = st.another_id

The point is to join the two tables rather than using the outer query syntax you are currently using.
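
For Oracle specifically, a join-style update can be written with MERGE. This is only a sketch using the names from the question. It assumes Oracle 10g or later (where the WHEN NOT MATCHED branch is optional) and that each another_id appears at most once in source_table:

```sql
-- Join target_table to source_table once and update the matching rows.
-- Rows in target_table with no match in source_table are left untouched.
MERGE INTO target_table tt
USING source_table st
ON (tt.another_id = st.another_id)
WHEN MATCHED THEN
  UPDATE SET tt.special_id = st.source_special_id;
```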

Codewerks
+1 for the succinct definition of "correlated sub-select" :)
GalacticCowboy
I get a SQL command not properly ended. Adding a ';' at the end doesn't fix it.
j0rd4n
I haven't used Oracle in a couple years, but for SQL Server (and I believe standard ANSI SQL) you should use the table alias in the UPDATE clause. Otherwise, if you performed a self-join the update would be ambiguous.
Tom H.
This is not valid Oracle syntax
Tony Andrews
@Tom, no, this is correct for SQL; the alias has to be in the FROM clause. @Tony, for Oracle I'm not sure, but you may have to use an older ANSI join using a WHERE clause after the FROM.
Codewerks
How would the server know what to update here:

UPDATE MyTable
SET MyInt = t1.MyInt + 1
FROM MyTable t1
INNER JOIN MyTable t2 ON t2.ParentID = t1.ID

Whereas, the following is clear:

UPDATE t2
SET MyInt = t1.MyInt + 1
FROM MyTable t1
INNER JOIN MyTable t2 ON t2.ParentID = t1.ID
Tom H.
(cont.) Sorry, lack of formatting in comments
Tom H.
It's not clear to me why this is being up-voted so much when it doesn't work in Oracle - there is no "UPDATE FROM" statement in Oracle!
Tony Andrews
@Tom: your second option has the same effect as my syntax, although I find it counterintuitive to update an alias. @Tony: I'm only putting this out there to illustrate a point, pls offer an Oracle based solution then...!
Codewerks
@AugustLights - In your syntax you only have the table once in your query. Look at the first query that I listed though. How do you know which table will be updated?
Tom H.
Cuz, people don't read the question and often don't read the answer before voting. It's ok to offer solutions for other RDBMS, it may be of SOME value. But it doesn't deserve to be voted up as A GOOD answer. Perhaps Comrade AL has friends who Karma Bomb? JK
@Tom: You're correct for your query, but I don't think that's the gist of what I wrote. What am I missing? @Mark, ha ha, I wish...! You're right, it's ludicrous that an answer like this gets upvoted vs stuff I labored over....
Codewerks
@AugustLights: I'm just pointing out that it's generally safer to use table aliases in the update clause of the statement to avoid any potential ambiguity.
Tom H.
great answer - mark as answer already!
JohnIdol
A: 

Not sure what Oracle has available, but MS SQL Server has a tuning advisor that you can feed your queries into, and it will give recommendations for adding indexes, etc. I would assume Oracle has something similar.

That would be the quickest way to pinpoint the issue.

Harrison
A: 

I don't know Oracle, but MSSQLServer optimizer would have no problem converting the subquery into a join for you.

It sounds like you might be doing a data import against a short-lived or newly created table. It is easy to overlook indexing these kinds of tables. I'd make sure there's an index on source_table.another_id - or a covering index on source_table.another_id, source_table.source_special_id (order matters, another_id should be first).
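
As a sketch, the covering index described above could be created like this. The index name is made up for illustration; the table and column names are the ones from the question:

```sql
-- Hypothetical index name. another_id comes first because it is the lookup column;
-- source_special_id is included so the subquery can be answered from the index alone.
CREATE INDEX source_table_another_id_ix
  ON source_table (another_id, source_special_id);
```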

In cases such as these (a query running unexpectedly for longer than 1 second), it is best to figure out how your environment provides query plans. Examine that plan and the problem will become clear.

You see, there is no such thing as "optimized sql code". Sql code is never executed - query plans are generated from the code and then those plans are executed.
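
In Oracle, one way to see the plan is EXPLAIN PLAN together with DBMS_XPLAN. A minimal sketch, assuming a plan table is available to your user and using the statement from the question:

```sql
-- Ask the optimizer for the plan without actually running the update.
EXPLAIN PLAN FOR
UPDATE target_table tt
SET special_id = ( SELECT source_special_id
                   FROM source_table st
                   WHERE tt.another_id = st.another_id );

-- Show the plan that was just stored in the plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

A full table scan of source_table underneath the UPDATE step would suggest that the lookup on another_id is not using an index.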

David B
Is that different than any other >2GL language? Most code written today isn't executed... it's changed into a series of instructions which can be. You're saying I can't write optimized C++ or VB or Java?
The degree of transformation with SQL->query plan is quite a bit more severe than the compilation of code. We're talking about decisions that affect big O. This is why the same code can run in 5 years or 5 milliseconds.
David B
+4  A: 

Is there an index on source_table(another_id)? If not, source_table will be fully scanned once for each row in target_table. This will take some time if target_table is big.

Is it possible for there to be no match in source_table for some target_table rows? If so, your update will set special_id to null for those rows. If you want to avoid that do this:

UPDATE target_table tt
SET special_id = ( SELECT source_special_id
                   FROM source_table st
                   WHERE tt.another_id = st.another_id )
WHERE EXISTS( SELECT NULL
              FROM source_table st
              WHERE tt.another_id = st.another_id );

If target_table.another_id was declared as a foreign key referencing source_table.another_id (unlikely in this scenario), this would work:

UPDATE
  ( SELECT tt.primary_key, tt.special_id, st.source_special_id
    FROM   target_table tt
    JOIN   source_table st ON st.another_id = tt.another_id
  )
SET special_id = source_special_id;
Tony Andrews
+2  A: 

Are you actually sure that it's running?

Have you looked for blocking locks? Indefinitely is a long time, and that's usually only achieved via something stalling execution.
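
If you have access to the v$ views, something like the following can show whether the session is waiting on another one. This is a sketch that assumes Oracle 10g or later (where v$session has a blocking_session column) and SELECT privilege on v$session:

```sql
-- Sessions that are currently blocked, and the session id blocking each one.
SELECT sid, serial#, blocking_session, event, seconds_in_wait
FROM v$session
WHERE blocking_session IS NOT NULL;
```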

A: 

Check that the statistics are up to date on the tables - see this question
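
For example, statistics can be refreshed with DBMS_STATS. A sketch, assuming both tables are in the current user's schema:

```sql
-- Gather optimizer statistics for both tables.
-- cascade => TRUE also gathers statistics on their indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TARGET_TABLE', cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SOURCE_TABLE', cascade => TRUE);
END;
/
```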

hamishmcn