views: 283
answers: 7
I feel dumb right now.

I have to update 100,000 rows in a database that I don't have direct access to. The total row count of the table is roughly 500,000 rows. The update just prepends one character to a field when its length is < 3. So basically:

UPDATE X SET VALUE = '0'||VALUE WHERE LENGTH(VALUE) < 3

So I send this update to the DBAs and they return it to me, saying that the statement has too high a performance cost (because of the full table access and the 100k-row commit) and that I should write a procedure instead. They even provide a code example, in case I don't know how to write one.

I say WTF, how would a procedure ever run faster than a single update statement? After doing some tests, my update takes 30 seconds to run; the procedure, following their code example, takes 10 minutes.

So the real question, after all this frustration, is: is there any way to avoid the full table access when using such a function in the WHERE clause? (The column is indexed.)

A: 

It looks like the only option you have is to perform your UPDATE in chunks. If you, for example, put a LIMIT 1000 in your statement, the performance should not notably decrease (I assume this query has to be executed on a live database).

You say that you don't have direct access yourself; if these people are able to run Bash scripts, you could just loop the statement with the LIMIT as many times as necessary, putting a sleep N in the loop. Maybe this would be a viable workaround.


As others pointed out - yes, the single UPDATE is already the fastest method. But it seems that his problem is that even this takes too long, so I proposed doing it chunk by chunk.

The chunked run will take longer overall until it is done, but it shouldn't occupy the database and make it inaccessible for other users (if you choose a good interval, that is). The problem is the writing to the database, not the searching (i.e. using WHERE LENGTH(name) < 3). So while this method will increase the total strain on the database, it will spread it over time and therefore not block the database. You could e.g. run this in 100-row chunks and pause two seconds after each. Have this run overnight and nobody will notice.
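A minimal sketch of one such chunk in Oracle terms (using ROWNUM, since Oracle has no LIMIT; the chunk size of 100 is the example value from this answer, and the shell loop driving it is assumed):

```sql
-- Hypothetical sketch: one chunk per execution. Rerun it (e.g. from a
-- shell loop with a sleep between runs) until it reports "0 rows updated".
UPDATE X SET VALUE = '0'||VALUE
 WHERE LENGTH(VALUE) < 3
   AND ROWNUM <= 100;   -- chunk size; tune to taste
COMMIT;
```

Committing after each chunk is what spreads the undo/redo load out over time instead of accumulating it in one large transaction.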

Pascal
There is no `LIMIT` in Oracle but you could use `ROWNUM`.
Marius Burz
I didn't downvote, but wouldn't that be slower, since you are doing a full table scan for each chunk?
Drevak
Sorry for the `LIMIT`, I'm a MySQL guy. :) Yes, a single `UPDATE` is the fastest way to do this; that's why I didn't propose another statement. But his problem is that even this takes too much time and therefore blocks the database. Thus, when you only update the thing chunk by chunk, it will run much longer, but it won't degrade the performance of the database for other users. That's what I intended with my proposal. :) Will update my post accordingly.
Pascal
An update of an entire table does not lock the whole table in Oracle - it only locks the current row while it is being updated. Any performance hit will have more to do with the amount of undo and redo being generated by the session. The biggest problem is usually rollback segment space if the session is trying to update too much in one go without committing.
Jeffrey Kemp
+9  A: 

Hi Drevak,

Your statement is already optimized. It is set-based and queries the table in the most efficient way possible (a full table scan). You won't be able to write a program that does the same work with fewer resources or in less time. You CAN write a program that performs poorly, that is non-restartable in case of error (i.e. commits every 100 rows) and that monopolizes more resources.

Follow Tom Kyte's mantra:

  • You should do it in a single SQL statement if at all possible.
  • If you cannot do it in a single SQL Statement, then do it in PL/SQL.
  • If you cannot do it in PL/SQL, try a Java Stored Procedure.
  • If you cannot do it in Java, do it in a C external procedure.
  • If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…
Vincent Malgrat
+4  A: 

Accessing 100k rows out of 500k (i.e. 20%) by index will almost certainly require more logical IOs than a full scan.
I believe your statement is OK.

egorius
+2  A: 

On a side note it might be more robust to:

UPDATE X SET VALUE = LPAD(VALUE,3,'0') WHERE LENGTH(VALUE) < 3

... just in case.
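For illustration, LPAD pads on the left up to the stated total length in a single pass, so a one-character value goes straight to three characters (a sketch; values are made up):

```sql
-- LPAD(string, total_length, pad_char):
SELECT LPAD('7', 3, '0')  FROM DUAL;  -- '007'
SELECT LPAD('42', 3, '0') FROM DUAL;  -- '042'
```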

David Aldridge
+1  A: 

The only reason not to do it in one statement is when you have to update so many rows that your rollback segments become too small.

In such a case (and only then!), sacrificing some speed, you can do it like this in PL/SQL:

DECLARE
  mylimit CONSTANT PLS_INTEGER := 10000; /* Arbitrary limit */
  updated PLS_INTEGER;
BEGIN
  LOOP
    UPDATE X SET VALUE = '0'||VALUE
     WHERE LENGTH(VALUE) < 3 AND ROWNUM <= mylimit;
    updated := SQL%ROWCOUNT;  /* capture before COMMIT resets the attribute */
    COMMIT;                   /* commit each chunk to free rollback space */
    EXIT WHEN updated < mylimit;
  END LOOP;
END;

But this also doesn't work perfectly, because rows where LENGTH(VALUE) = 1 will be updated twice before they no longer fulfill the WHERE condition. Sadly, this cannot easily be avoided...

ammoQ
A: 

If updating the entire table in one transaction is not an option (e.g. due to rollback space issues), another method is to break up the table into chunks (e.g. on ranges of PK values) and update them one chunk at a time.

Try to pick a "chunking" method that will tend to have all the rows within the same blocks - e.g. if the rows have generally been inserted roughly in order of ID, that'd be a good candidate - that way the update will tend to update all the rows in a block in one go.

If your predicate covers the majority of rows in the table, I'd expect full-table-scans for each update, which should be ok. (You can even monitor their progress by querying v$session_longops.)
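A sketch of the chunk-by-PK-range idea (the column name ID and the range bounds are assumptions, not from the post), plus the progress query mentioned above:

```sql
-- One chunk: update a contiguous PK range, commit, then advance the range.
UPDATE X SET VALUE = '0'||VALUE
 WHERE LENGTH(VALUE) < 3
   AND ID BETWEEN 1 AND 50000;   -- next chunk: 50001..100000, and so on
COMMIT;

-- Watch the full scans progress from another session:
SELECT sid, opname, sofar, totalwork, time_remaining
  FROM v$session_longops
 WHERE sofar < totalwork;
```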

Jeffrey Kemp
A: 

A function-based index could help speed up the updates.

create index x_idx1 on x(length(value));

Here is an example.

sqlplus>create table t
  2  ( id NUMBER(9) PRIMARY KEY,
  3  name VARCHAR2(100)
  4  );
Table created.

sqlplus>insert into t select object_id, object_name from user_objects;
2188 rows created.

sqlplus>exec  dbms_stats.gather_table_stats(ownname=>'test',tabname =>'t');
PL/SQL procedure successfully completed.

sqlplus>create index t_idx3 on t(length(name));
Index created.

sqlplus>explain plan for update t set name = name || '1' where length(name) < 25;
Explained.

sqlplus>select * from table(dbms_xplan.display);
---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |        |   109 |  2616 |     4   (0)| 00:00:01 |
|   1 |  UPDATE                      | T      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T      |   109 |  2616 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T_IDX3 |    20 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(LENGTH("NAME")<25)
15 rows selected.
Dinesh Bhat
I tried that approach, but how should I build the query to make use of this index? The explain plan was still showing a full table scan.
Drevak