I was helping some colleagues of mine with an SQL problem. They wanted to move all the rows from table A to table B (both tables having the same column names and types). This was done in Oracle 11g, but I don't think that really matters.

Their initial naive implementation was something like

BEGIN
  INSERT INTO B SELECT * FROM A;
  DELETE FROM A;
  COMMIT;
END;

Their concern was that INSERTs made into table A between the copy from A to B and the DELETE FROM A (or TRUNCATE, for that matter) would cause data loss: the newly inserted rows in A would be deleted without ever having been copied to B.

Of course I quickly recommended storing the IDs of the copied rows in a temporary table and then deleting just the rows in A that matched the IDs in the temporary table.
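
For illustration, a minimal sketch of that approach (the helper table MOVED_IDS and the key column ID are hypothetical names here, assuming A and B share a numeric primary key):

-- Capture the IDs first, then copy and delete only the captured rows.
-- Rows inserted into A after the capture are neither copied nor deleted,
-- so nothing is lost.
INSERT INTO moved_ids (id) SELECT id FROM A;
INSERT INTO B SELECT * FROM A WHERE id IN (SELECT id FROM moved_ids);
DELETE FROM A WHERE id IN (SELECT id FROM moved_ids);
COMMIT;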

However, for curiosity's sake we set up a little test by adding a wait command (I don't remember the PL/SQL syntax) between the INSERT and the DELETE. Then, from a different connection, we inserted rows DURING the wait.

We observed that there was data loss when doing so. I reproduced the whole setup in SQL Server and wrapped it all in a transaction, but the freshly inserted data was still lost in SQL Server too. This made me think there is a systematic error/flaw in the initial approach.

However, I can't tell whether the cause was that the TRANSACTION was not (somehow?) isolated from the fresh new INSERTs, or that the INSERTs came during the WAIT command.

In the end it was implemented using the temporary table I suggested, but we couldn't find the answer to "why the data loss?". Do you know why?

+1  A: 

I don't know if this is relevant, but in SQL Server the syntax is

begin tran
....
commit

not just 'begin'

Steven A. Lowe
+1  A: 

You need to set your transaction isolation level so that the inserts from another transaction don't affect your transaction. I don't know how to do that in Oracle.
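
In SQL Server, for example, it might look something like this (an untested sketch only):

-- Under SERIALIZABLE, the SELECT side of the INSERT takes key-range
-- locks on A, so concurrent inserts into A wait until the COMMIT.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
INSERT INTO B SELECT * FROM A;
DELETE FROM A;
COMMIT;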

Paul Tomblin
+5  A: 

I can't speak to the transaction stability, but an alternate approach would be to have the second step delete from the source table where exists (select ids from target table).

Forgive the syntax, I have not tested this code, but you should be able to get the idea:

INSERT INTO B SELECT * FROM A;

DELETE FROM A WHERE EXISTS (SELECT B.<primarykey> FROM B WHERE B.<primarykey> = A.<primarykey>);

That way you are using the relational engine to enforce that no newer data will be deleted, and you don't need to do the two steps in a transaction.

Update: corrected syntax in subquery

Guy Starbuck
I go with @Guy. It's much better to make everything explicit and obvious. Doing this using SQL where possible is less arcane than using isolation levels (if you can do it, of course).
AJ
The syntax is wrong: "... where exists (Select B.<PK> from B where B.<PK> = A.<PK>)"
Alex Yakunin
+5  A: 

This can be achieved in Oracle using:

Alter session set isolation_level=serializable;

This can be set in PL/SQL using EXECUTE IMMEDIATE:

BEGIN
    EXECUTE IMMEDIATE 'Alter session set isolation_level=serializable';
    ...
END;

See Ask Tom: On Transaction Isolation Levels

Tony Andrews
+2  A: 

It's just the way transactions work. You have to pick the correct isolation level for the task at hand.

You're doing the INSERT and DELETE in the same transaction. You don't mention the isolation mode the transaction is using, but it's probably 'read committed'. This means that the DELETE command will see records that were committed in the meantime. For this kind of job it's much better to use a 'snapshot' type of transaction, because then both the INSERT and the DELETE would see the same set of records - only those and nothing else.
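
In SQL Server that could look something like this (an untested sketch; the database name MyDb is a placeholder, and snapshot isolation must first be enabled at the database level):

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- one-time setup

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
-- Both statements see the same point-in-time view of A, so rows
-- committed by other sessions after BEGIN TRAN are never deleted.
INSERT INTO B SELECT * FROM A;
DELETE FROM A;
COMMIT;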

Milan Babuškov
A: 

Yes Milan, I haven't specified the transaction isolation level. I suppose it's the default isolation level, which I don't know - neither in Oracle 11g nor in SQL Server 2005.

Furthermore, the INSERT made during the WAIT command (on the 2nd connection) was NOT inside a transaction. Should it have been, to prevent this data loss?

Andrei Rinea
AFAIK, you cannot have anything done without a transaction. Although you did not start one explicitly, the server started an implicit one for you.
Milan Babuškov
+7  A: 

Depending on your isolation level, selecting all the rows from a table does not prevent new inserts; it will just lock the rows you read. In SQL Server, if you use the SERIALIZABLE isolation level, it will prevent new rows whose keys would have been included in your SELECT query (see the sketch after the quote below).

From http://msdn.microsoft.com/en-us/library/ms173763.aspx:

SERIALIZABLE Specifies the following:

  • Statements cannot read data that has been modified but not yet committed by other transactions.

  • No other transactions can modify data that has been read by the current transaction until the current transaction completes.

  • Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
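
A quick way to see this in action, reproducing the original WAIT experiment (an untested sketch; the key column id is an assumed name):

-- Session 1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT * FROM A;           -- takes key-range locks on everything read
WAITFOR DELAY '00:00:30';  -- simulate the wait from the original test

-- Session 2, started during the delay
INSERT INTO A (id) VALUES (999);  -- blocks until session 1 commits

COMMIT;  -- back in session 1; only now does session 2's insert proceed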

jwanagel
+1  A: 

In Oracle, the default transaction isolation level is read committed. That basically means that Oracle returns the results as they existed at the SCN (system change number) when your query started. Setting the transaction isolation level to serializable means that the SCN is captured at the start of the transaction so all the queries in your transaction return data as of that SCN. That ensures consistent results regardless of what other sessions and transactions are doing. On the other hand, there may be a cost in that Oracle may determine that it cannot serialize your transaction because of activity that other transactions are performing, so you would have to handle that sort of error.
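
For example, a hedged PL/SQL sketch of what that error handling might look like (ORA-08177 is the "can't serialize access for this transaction" error):

DECLARE
    e_cant_serialize EXCEPTION;
    PRAGMA EXCEPTION_INIT(e_cant_serialize, -8177);
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO B SELECT * FROM A;
    DELETE FROM A;
    COMMIT;
EXCEPTION
    WHEN e_cant_serialize THEN
        ROLLBACK;
        -- the caller (or a retry loop here) would rerun the transaction
        RAISE;
END;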

Tony's link to the AskTom discussion goes into substantially more detail about all this; I highly recommend it.

Justin Cave
A: 

This is the standard behaviour of the default read-committed mode, as mentioned above. The WAIT command just causes a delay in processing; it has no effect on the DB's transaction handling.

To fix the problem you can either:

  1. set the isolation level to serializable, but then you can get ORA- errors, which you need to handle with retries! Also, you may get a serious performance hit.
  2. use a temp table to store the values first
  3. if the data is not too large to fit into memory, you can use a RETURNING clause to BULK COLLECT INTO a nested table, so the DELETE and the INSERT operate on exactly the same rows (see the sketch below this list).
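
One possible reading of option 3 as a sketch (untested; the column names ID and PAYLOAD are hypothetical stand-ins for A's real columns):

DECLARE
    TYPE id_tab  IS TABLE OF A.id%TYPE;
    TYPE pay_tab IS TABLE OF A.payload%TYPE;
    l_ids  id_tab;
    l_pays pay_tab;
BEGIN
    -- Delete and capture the deleted rows in a single statement, so the
    -- set of deleted rows and the set of rows moved to B cannot diverge.
    DELETE FROM A
    RETURNING id, payload BULK COLLECT INTO l_ids, l_pays;

    FORALL i IN 1 .. l_ids.COUNT
        INSERT INTO B (id, payload) VALUES (l_ids(i), l_pays(i));

    COMMIT;
END;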
Andrew from NZSG
A: 

Alternatively, you can use snapshot isolation to detect lost updates:

When Snapshot Isolation Helps and When It Hurts

AlexKuznetsov