views: 252
answers: 5

Hi,

I have a table with a lot of records (possibly more than 500,000 or 1,000,000). I added a new column to this table, and I need to fill in a value for every row, based on the corresponding value of another column in the same table.

I tried using separate transactions, selecting the next chunk of 100 records each time and updating their values, but it still takes hours to update all the records in Oracle 10, for example.

What is the most efficient way to do this in SQL, without using any dialect-specific features, so that it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL, etc.)?

ADDITIONAL INFO: There are no calculated fields. There are indexes. I was using generated SQL statements that update the table row by row.

+8  A: 

The usual way is to use UPDATE:

UPDATE mytable
   SET new_column = <expr containing old_column>

You should be able to do this in a single transaction.
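
For example, assuming the existing column is called old_column and the new value is simply derived from it (the UPPER() expression is just a stand-in for whatever your real derivation is), the whole table can be filled in one statement:

UPDATE mytable
   SET new_column = UPPER(old_column);
COMMIT;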

Marcelo Cantos
It sounds like the OP knows how to do this in a single transaction, but there is a performance problem, so he tried to batch it into separate transactions.
Tim Drisdelle
That is possible, but it is extraordinary that 1M rows should take so long to update a single column. It's also possible that the OP is updating one record at a time, either through a lack of understanding of set operations, or because they are trying to compute the new value in client code (either out of necessity or, again, because of a lack of understanding). Whatever the case, I'll be able to update my answer if the OP indicates which of the foregoing cases applies to them.
Marcelo Cantos
Fair enough. I agree that more information is needed.
Tim Drisdelle
OP: If it's a performance problem, do it at a quiet time. If your DBMS can't handle an update of a million rows, it's time to start looking at a new DBMS :-)
paxdiablo
Thank you all for the fast response. I left out the fact that I'm using generated SQL statements. I've now looked deeper into it, and it turns out the generated SQL updates the table row by row! So any attempt to split the work into chunks of 100 records was meaningless... I'll change the code to generate a proper SQL UPDATE statement, like the one shown here.
m_pGladiator
Nice! That's an epic fail for the generated SQL. Good work on the solution Marcelo.
Tim Drisdelle
+2  A: 

You could drop any indexes on the table, then run your update, and then recreate the indexes.
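
A rough sketch of that approach (the index name is made up for illustration, UPPER() is a stand-in expression, and DROP INDEX syntax varies slightly between DBMSs):

DROP INDEX idx_mytable_old;

UPDATE mytable
   SET new_column = UPPER(old_column);
COMMIT;

CREATE INDEX idx_mytable_old ON mytable (old_column);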

Tim Drisdelle
+1. It was only a matter of time before someone suggested this, but yeah, for 10M or more rows you can do it, as long as the drop and rebuild are done quickly.
Guru
For the love of whatever gods you worship, _do this at a quiet time_. Otherwise your users will track you down, torture you, kill you, quarter you, tar and feather the remains then burn them and spit on your charred body parts. At a minimum. They'll probably do far worse.
paxdiablo
Sounds like the groans of a DBA who has been sacrificed on the altar of performance...
Tim Drisdelle
If the new column is not indexed, removing indexes on the table will be useless (not that it matters, since rebuilding a 1M-row index will not take much time).
Vincent Malgrat
Agreed. I already asked OP to provide more details. So much solution guessing right now.
Tim Drisdelle
A: 

Might not work for you, but here's a technique I've used a couple of times in the past in similar circumstances.

Create updated_{table_name}, then INSERT ... SELECT into this table in batches. Once finished (and this hinges on Oracle, which I don't know or use, supporting the ability to rename tables atomically), updated_{table_name} becomes {table_name}, while {table_name} becomes original_{table_name}.

Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
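
Roughly, in Oracle syntax (table and column names are placeholders, UPPER() is a stand-in expression, the sketch assumes the new column has not already been added to the original table, and batching of the copy plus recreation of indexes, constraints and grants is omitted):

CREATE TABLE updated_mytable AS
SELECT t.*,
       UPPER(t.old_column) AS new_column
  FROM mytable t;

ALTER TABLE mytable RENAME TO original_mytable;
ALTER TABLE updated_mytable RENAME TO mytable;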

David
+2  A: 

As Marcelo suggests:

UPDATE mytable
SET new_column = <expr containing old_column>;

If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:

UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;

Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
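
In Oracle, that rinse-and-repeat cycle can also be wrapped in a small PL/SQL block, sketched here with a stand-in UPPER() expression:

BEGIN
  LOOP
    UPDATE mytable
       SET new_column = UPPER(old_column)
     WHERE new_column IS NULL
       AND ROWNUM <= 100000;
    EXIT WHEN SQL%ROWCOUNT = 0;   -- nothing left to update
    COMMIT;
  END LOOP;
  COMMIT;
END;
/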

Jeffrey Kemp
I think this is a great idea for very large and heavily used tables! I haven't had such failures yet, but +1 from me :)
m_pGladiator
A: 

What is the database version? Check out virtual columns in 11g:

Adding Columns with a Default Value http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
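
For illustration only (column names and types here are made up): in 11g, adding a column with a NOT NULL default is a fast, metadata-only change, and a virtual column can derive its value from another column without ever storing it:

ALTER TABLE mytable ADD (status_flag VARCHAR2(1) DEFAULT 'N' NOT NULL);

ALTER TABLE mytable ADD (derived_column AS (UPPER(old_column)));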

Stellios