I am doing a bulk insert of records into a database from a log file. Occasionally (~1 row out of every thousand) one of the rows violates the primary key and causes the transaction to fail. Currently, the user has to manually go through the file that caused the failure and remove the offending row before attempting to re-import. Given that there are hundreds of these files to import, this is impractical.

My question: How can I skip the insertion of records that will violate the primary key constraint, without having to do a SELECT statement before each row to see if it already exists?

Note: I am aware of the very similar question #1054695, but it appears to be a SQL Server specific answer and I am using PostgreSQL (importing via Python/psycopg2).

+1  A: 

I would use a stored procedure to catch the exceptions on your unique violations. Example:

CREATE OR REPLACE FUNCTION my_insert(i_foo text, i_bar text)
  RETURNS boolean LANGUAGE plpgsql AS
$BODY$
begin
    insert into foo(x, y) values(i_foo, i_bar);
    return true;
exception
    when unique_violation then
        -- duplicate key: skip this row and report it as not inserted
        return false;
end;
$BODY$;

SELECT my_insert('value 1','another value');
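
From the Python/psycopg2 side mentioned in the question, calling such a function once per row could look like the sketch below. The connection string, the column values and the placeholder row list are assumptions for illustration, not part of the original post:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection string
cur = conn.cursor()

rows_from_log_file = [("value 1", "another value")]  # placeholder for rows parsed from a log file

for foo, bar in rows_from_log_file:
    # The function swallows unique violations itself, so the loop never aborts the transaction.
    cur.execute("SELECT my_insert(%s, %s)", (foo, bar))

conn.commit()
cur.close()
conn.close()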
Frank Heikens
Perfect, thank you.
John
It is always better to log your exceptions. You can modify the exception block to log it and still continue.
Guru
You can let the function log the exceptions, no problem.
Frank Heikens
A: 

Or you can use SSIS and have the failed rows take a different path than the successful ones.

Since you are using a different database, can you bulk insert the files into a staging table and then use SQL code to select only those records which do not have an existing id?
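
A rough sketch of that staging-table approach using psycopg2; the staging and target table names, columns, and the CSV file name are assumptions for illustration:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection string
cur = conn.cursor()

# Bulk-load the raw file into an unconstrained staging table.
with open("logfile.csv") as f:
    cur.copy_expert("COPY log_staging FROM STDIN WITH CSV", f)

# Move over only the rows whose key is not already in the target table.
# (Add DISTINCT ON (s.id) if the file itself can contain duplicate keys.)
cur.execute("""
    INSERT INTO log_entries (id, message)
    SELECT s.id, s.message
    FROM log_staging s
    WHERE NOT EXISTS (SELECT 1 FROM log_entries t WHERE t.id = s.id)
""")

conn.commit()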

HLGEM
Can you elaborate on what you mean by SSIS?
John
SSIS is the data import tool that comes with SQL Server. I didn't catch that you are using Postgres. It can still do the job against Postgres, but I'm not sure how you would get it, as I don't think it comes with the free version of SQL Server.
HLGEM
A: 

You can also use SAVEPOINTs in a transaction.

Pythonish pseudocode to illustrate this from the application side:

error_count = 0
database.execute("BEGIN")
foreach data_row in input_data_dictionary:
    database.execute("SAVEPOINT bulk_savepoint")
    try:
        database.execute("INSERT", table, data_row)
    except:
        database.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
        log_error(data_row)
        error_count = error_count + 1
if error_count > error_threshold:
    database.execute("ROLLBACK")
else:
    database.execute("COMMIT")

Edit: Here's an actual example of this in action in psql based on a slight variation of the example in the documentation (SQL statements prefixed by ">"):

> CREATE TABLE table1 (test_field INTEGER NOT NULL PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "table1_pkey" for table "table1"
CREATE TABLE

> BEGIN;
BEGIN
> INSERT INTO table1 VALUES (1);
INSERT 0 1
> SAVEPOINT my_savepoint;
SAVEPOINT
> INSERT INTO table1 VALUES (1);
ERROR:  duplicate key value violates unique constraint "table1_pkey"
> ROLLBACK TO SAVEPOINT my_savepoint;
ROLLBACK
> INSERT INTO table1 VALUES (3);
INSERT 0 1
> COMMIT;
COMMIT
> SELECT * FROM table1;  
 test_field 
------------
          1
          3
(2 rows)

Note that the value 3 was inserted after the error, but still inside the same transaction!

The documentation for SAVEPOINT is at http://www.postgresql.org/docs/8.4/static/sql-savepoint.html.
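
For completeness, a psycopg2 rendering of the pseudocode above might look like the sketch below. The connection string, placeholder data and threshold are assumptions; psycopg2 opens the transaction implicitly on the first execute(), so there is no explicit BEGIN:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection string
cur = conn.cursor()

rows_to_insert = [1, 1, 3]   # placeholder data; the duplicate 1 will be skipped
error_count = 0
error_threshold = 100        # arbitrary cut-off for this sketch

for value in rows_to_insert:
    cur.execute("SAVEPOINT bulk_savepoint")
    try:
        cur.execute("INSERT INTO table1 (test_field) VALUES (%s)", (value,))
        cur.execute("RELEASE SAVEPOINT bulk_savepoint")
    except psycopg2.IntegrityError:
        # Undo only the failed INSERT; the rest of the transaction stays intact.
        cur.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
        error_count += 1

if error_count > error_threshold:
    conn.rollback()
else:
    conn.commit()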

Matthew Wood
That's not going to work: when an error happens, the transaction is aborted and rolled back. You need an exception handler inside the database. Query failed: ERROR: current transaction is aborted, commands ignored until end of transaction block
Frank Heikens
Yes it will. That's the whole point of SAVEPOINTs. I've edited my answer in order to give a concrete example.
Matthew Wood
----edit---- Sorry, I was wrong... shame on me ;) It works fine, you're right.
Frank Heikens
Thanks. However, I am adding millions of rows in a single transaction - what are the performance implications of adding millions of savepoints during a transaction (even if the savepoint is overwritten on each successful row)?
John
I'm not sure about the performance impact. If you are already running this without the savepoints, I would think it wouldn't be too difficult to add them in and test. Even if there is a performance impact per insert, you can load the data in "blocks" of records to minimize savepoints, only rolling back and handling in detail the blocks that fail. Also, the documentation states that savepoints with the same name are not destroyed, only masked by the more recent one, so you might want to add a RELEASE SAVEPOINT <name> after the INSERT in the try block in my example.
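
A quick sketch of that block-wise idea; the block size, placeholder data and chunking helper are all hypothetical, and failed blocks would then be reprocessed with the per-row savepoint loop from the answer:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection string
cur = conn.cursor()

rows_to_insert = list(range(10000))  # placeholder data
BLOCK_SIZE = 1000                    # hypothetical block size
failed_blocks = []

def chunks(seq, size):
    # Yield successive blocks of `size` items.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for block in chunks(rows_to_insert, BLOCK_SIZE):
    cur.execute("SAVEPOINT block_savepoint")
    try:
        cur.executemany("INSERT INTO table1 (test_field) VALUES (%s)",
                        [(v,) for v in block])
        cur.execute("RELEASE SAVEPOINT block_savepoint")
    except psycopg2.IntegrityError:
        cur.execute("ROLLBACK TO SAVEPOINT block_savepoint")
        failed_blocks.append(block)  # retry these later, row by row with savepoints

conn.commit()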
Matthew Wood