views: 283
answers: 7
I feel dumb right now.

I have to update 100,000 rows in a database that I don't have direct access to. The total row count of the table is roughly 500,000 rows. The update just prepends one character to a field when its length is < 3. So basically:

UPDATE X SET VALUE = '0'||VALUE WHERE LENGTH(VALUE) < 3

So I send this update to the DBAs and they return it to me, saying that the statement has too high a performance cost (because of the full table access and the 100k-row commit) and that I should write a procedure instead. They even provide a code example, in case I don't know how to write one.

I say WTF, how would a procedure ever run faster than a single update statement? After doing some tests, my update takes 30 seconds to run; the procedure, following their code example, takes 10 minutes.

So the real question, after all this frustration, is: is there any way to avoid the full table access when using such a function in the WHERE clause? (The column is indexed.)

A: 

It looks like the only option you have is to perform your UPDATE in chunks. If you, for example, put a LIMIT 1000 in your statement, the performance should not notably decrease (I assume this query has to be executed on a live database).

You say that you don't have direct access yourself; if these people are able to run Bash scripts, you could just loop the statement with the LIMIT as many times as necessary, putting a sleep N in the loop. Maybe this would be a viable workaround.


As others pointed out - yes, the single UPDATE is already the fastest method. But it seems that his problem is that even this takes too long, so I proposed doing it chunk by chunk.

The chunked run will take longer overall until it is done, but it shouldn't occupy the database and make it inaccessible for other users (if you choose a good interval, that is). The problem is the writing to the database, not the searching (i.e. using WHERE LENGTH(name) < 3). So while this method will increase the total strain on the database, it will spread it over time and therefore not block the database. You could e.g. run this in 100-row chunks and pause two seconds after each. Have this run overnight and nobody will notice.
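A minimal sketch of one such chunk in Oracle terms (using ROWNUM, since Oracle has no LIMIT; the chunk size of 100 is the example value from this answer, and the shell loop driving it is assumed):

```sql
-- Hypothetical sketch: one chunk per execution. Rerun it (e.g. from a
-- shell loop with a sleep between runs) until it reports "0 rows updated".
UPDATE X SET VALUE = '0'||VALUE
 WHERE LENGTH(VALUE) < 3
   AND ROWNUM <= 100;   -- chunk size; tune to taste
COMMIT;
```

Committing after each chunk is what spreads the undo/redo load out over time instead of accumulating it in one large transaction.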

Pascal
There is no `LIMIT` in Oracle but you could use `ROWNUM`.
Marius Burz
I didn't downvote, but wouldn't that be slower, since you are doing a full table scan for each chunk?
Drevak
Sorry for the `LIMIT`, I'm a MySQL guy. :) Yes, a single `UPDATE` is the fastest way to do this; that's why I didn't propose another statement. But his problem is that even this takes too much time and therefore blocks the database. Thus, when you only update the thing chunk by chunk, it will run much longer, but it won't degrade the performance of the database for other users. That's what I intended with my proposal. :) Will update my post accordingly.
Pascal
An update of an entire table does not lock the whole table in Oracle - it only locks the current row while it is being updated. Any performance hit will have more to do with the amount of undo and redo being generated by the session. The biggest problem is usually rollback segment space if the session is trying to update too much in one go without committing.
Jeffrey Kemp
+9  A: 

Hi Drevak,

Your statement is already optimized. It is set-based and queries the table in the most efficient way possible (a full table scan). You won't be able to write a program that does the same work with fewer resources or in less time. You CAN write a program that performs poorly, that is non-restartable in case of error (i.e. commits every 100 rows) and that monopolizes more resources.

Follow Tom Kyte's mantra:

  • You should do it in a single SQL statement if at all possible.
  • If you cannot do it in a single SQL Statement, then do it in PL/SQL.
  • If you cannot do it in PL/SQL, try a Java Stored Procedure.
  • If you cannot do it in Java, do it in a C external procedure.
  • If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…
Vincent Malgrat
+4  A: 

Accessing 100k rows out of 500k (i.e. 20%) by index will almost certainly require more logical IOs than a full scan.
I believe your statement is OK.

egorius
+2  A: 

On a side note it might be more robust to:

UPDATE X SET VALUE = LPAD(VALUE,3,'0') WHERE LENGTH(VALUE) < 3

... just in case.
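For illustration, LPAD pads on the left up to the stated total length in a single pass, so a one-character value goes straight to three characters (a sketch; values are made up):

```sql
-- LPAD(string, total_length, pad_char):
SELECT LPAD('7', 3, '0')  FROM DUAL;  -- '007'
SELECT LPAD('42', 3, '0') FROM DUAL;  -- '042'
```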

David Aldridge
+1  A: 

The only reason not to do it in one statement is when you have to update so many rows that your rollback segments become too small.

In such a case (and only then!), sacrificing some speed, you can do it like this in PL/SQL:

DECLARE
  mylimit CONSTANT PLS_INTEGER := 10000; /* Arbitrary limit */
  updated PLS_INTEGER;
BEGIN
  LOOP
    UPDATE X SET VALUE = '0'||VALUE
     WHERE LENGTH(VALUE) < 3 AND ROWNUM <= mylimit;
    updated := SQL%ROWCOUNT;  /* capture before COMMIT resets the attribute */
    COMMIT;                   /* commit each chunk to free rollback space */
    EXIT WHEN updated < mylimit;
  END LOOP;
END;

But this also doesn't work perfectly, because rows where LENGTH(VALUE) = 1 will be updated twice before they no longer fulfill the WHERE condition. Sadly, this cannot easily be avoided...

ammoQ
A: 

If updating the entire table in one transaction is not an option (e.g. due to rollback space issues), another method is to break up the table into chunks (e.g. on ranges of PK values) and update them one chunk at a time.

Try to pick a "chunking" method that will tend to have all the rows within the same blocks - e.g. if the rows have generally been inserted roughly in order of ID, that'd be a good candidate - that way the update will tend to update all the rows in a block in one go.

If your predicate covers the majority of rows in the table, I'd expect full-table-scans for each update, which should be ok. (You can even monitor their progress by querying v$session_longops.)
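A sketch of the chunk-by-PK-range idea (the column name ID and the range bounds are assumptions, not from the post), plus the progress query mentioned above:

```sql
-- One chunk: update a contiguous PK range, commit, then advance the range.
UPDATE X SET VALUE = '0'||VALUE
 WHERE LENGTH(VALUE) < 3
   AND ID BETWEEN 1 AND 50000;   -- next chunk: 50001..100000, and so on
COMMIT;

-- Watch the full scans progress from another session:
SELECT sid, opname, sofar, totalwork, time_remaining
  FROM v$session_longops
 WHERE sofar < totalwork;
```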

Jeffrey Kemp
A: 

A function-based index could help speed up the updates.

create index x_idx1 on x(length(value));

Here is an example.

sqlplus>create table t
  2  ( id NUMBER(9) PRIMARY KEY,
  3  name VARCHAR2(100)
  4  );
Table created.

sqlplus>insert into t select object_id, object_name from user_objects;
2188 rows created.

sqlplus>exec  dbms_stats.gather_table_stats(ownname=>'test',tabname =>'t');
PL/SQL procedure successfully completed.

sqlplus>create index t_idx3 on t(length(name));
Index created.

sqlplus>explain plan for update t set name = name || '1' where length(name) < 25;
Explained.

sqlplus>select * from table(dbms_xplan.display);
---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |        |   109 |  2616 |     4   (0)| 00:00:01 |
|   1 |  UPDATE                      | T      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T      |   109 |  2616 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T_IDX3 |    20 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(LENGTH("NAME")<25)
15 rows selected.
Dinesh Bhat
I tried that approach, but how should I build the query to make use of this index? The explain plan was still showing a full table scan.
Drevak