views: 73 | answers: 3

I have a very large table (5 million records). I'm trying to obfuscate the table's VARCHAR2 columns with random alphanumerics for every record in the table. My procedure executes successfully on smaller datasets, but it will eventually be used on a remote DB whose settings I can't control, so I'd like to EXECUTE the UPDATE statement in batches to avoid running out of undo space.

Is there some kind of option I can enable, or a standard way to do the update in chunks?

I'll add that there won't be any distinguishing features of the records that haven't been obfuscated, so my one thought of using ROWNUM in a loop won't work (I think).

A: 

I do this by mapping the primary key to an integer (mod n), and then performing the update for each x, where 0 <= x < n.

For example, maybe you are unlucky and the primary key is a string. You can hash it with your favorite hash function, and break it into three partitions:

UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=0
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=1
UPDATE myTable SET a=doMyUpdate(a) WHERE MOD(ORA_HASH(ID), 3)=2

You may have more partitions, and may want to put this into a loop (with some commits).
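
For example, a minimal, untested sketch of such a loop, assuming the same doMyUpdate function and 16 buckets (tune to taste):

declare
  c_buckets constant pls_integer := 16;  -- number of hash buckets; adjust to control batch size
begin
  for i in 0 .. c_buckets - 1 loop
    update myTable
       set a = doMyUpdate(a)
     where mod(ora_hash(ID), c_buckets) = i;
    commit;  -- release undo after each batch
  end loop;
end;
/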

Kyle Lahnakoski
One should use the NTILE() analytic function if one wants evenly sized sets; ORA_HASH can have unpredictable values, especially when using a value that isn't a power of 2 for the number of buckets to hash into. ORA_HASH(n, 3) can have 4 values, so your example would have missed updating about 1/4 of the data.
Adam Musch
Adam: Please note I used "MOD(ORA_HASH(ID), 3)", not "ORA_HASH(ID, 3)". I deliberately used MOD because the extra parameters for ORA_HASH are confusing. Thanks for the NTILE() reference. I am still not fully familiar with analytics.
Kyle Lahnakoski
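
For reference, a hedged, untested sketch of how NTILE() could drive evenly sized batches, reusing the hypothetical doMyUpdate function; repeat with bucket = 2 and 3, committing between runs (the bucket assignment stays stable because updates don't change ROWIDs unless row movement occurs):

UPDATE myTable
   SET a = doMyUpdate(a)
 WHERE rowid IN (
         SELECT rid
           FROM (SELECT rowid AS rid,
                        NTILE(3) OVER (ORDER BY rowid) AS bucket
                   FROM myTable)
          WHERE bucket = 1);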
+2  A: 

If you are going to update every row in a table, you are better off doing a Create Table As Select, then dropping/truncating the original table and re-appending the new data. If you've got the partitioning option, you can create your new table as a table with a single partition and simply swap it in with EXCHANGE PARTITION.

Inserts require a LOT less undo, and a direct path insert with NOLOGGING (the /*+ APPEND */ hint) won't generate much redo either.
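
A minimal, untested sketch of that approach, again assuming a doMyUpdate function does the obfuscation (table and column names are illustrative):

-- Copy the table with the obfuscation applied during the copy
CREATE TABLE my_table_tmp NOLOGGING AS
  SELECT doMyUpdate(a) AS a, b, c FROM my_table;

TRUNCATE TABLE my_table;

-- Direct-path insert back into the original table
INSERT /*+ APPEND */ INTO my_table
  SELECT * FROM my_table_tmp;
COMMIT;

DROP TABLE my_table_tmp;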

With either mechanism, there would probably still be 'forensic' evidence of the old values (e.g. preserved in undo or in "available" space allocated to the table due to row movement).

Gary
Don't forget to disable foreign key constraints referencing the table.
Codo
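
For example (a hedged sketch; the constraint and table names are illustrative):

ALTER TABLE child_table DISABLE CONSTRAINT fk_child_my_table;
-- truncate/exchange and reload my_table here
ALTER TABLE child_table ENABLE CONSTRAINT fk_child_my_table;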
A: 

The following is untested, but should work:

declare
  l_fetchsize number := 10000;  -- rows per fetch/update batch
  cursor cur_getrows is
  select rowid, random_function(my_column)
    from my_table;

  type rowid_tbl_type      is table of urowid;
  type my_column_tbl_type  is table of my_table.my_column%type;

  rowid_tbl     rowid_tbl_type;
  my_column_tbl my_column_tbl_type;
begin

  open cur_getrows;
  loop
    fetch cur_getrows bulk collect  
      into rowid_tbl, my_column_tbl 
      limit l_fetchsize;
    exit when rowid_tbl.count = 0;

    forall i in rowid_tbl.first..rowid_tbl.last
      update my_table 
         set my_column = my_column_tbl(i)
       where rowid     = rowid_tbl(i);
    commit;
  end loop;
  close cur_getrows;
end;
/

This isn't optimally efficient -- a single update would be -- but it'll do smaller, user-tunable batches, using ROWID.

Adam Musch