ansaurus

Question

Insert, on duplicate update (postgresql)

Answer 1

+1 A:

According to the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.
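
For readers on newer servers: PostgreSQL 9.5 and later, released after this answer was written, accept an ON CONFLICT clause on INSERT that covers the same use case. A minimal sketch, assuming a hypothetical table named upsert_demo with a primary key on column a:

```sql
-- Hypothetical demo table; the name and columns are assumptions for illustration.
CREATE TABLE upsert_demo (a INT PRIMARY KEY, b TEXT);

-- Inserts the row if key 1 is absent, otherwise updates the existing row.
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO upsert_demo (a, b) VALUES (1, 'david')
ON CONFLICT (a) DO UPDATE SET b = EXCLUDED.b;

-- Same key again: this time the DO UPDATE branch runs.
INSERT INTO upsert_demo (a, b) VALUES (1, 'dennis')
ON CONFLICT (a) DO UPDATE SET b = EXCLUDED.b;
```

On servers older than 9.5 this syntax is rejected, and the approaches in the other answers still apply.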

Christian Hang 2009-07-10 11:49:16

Answer 2

+2 A:

There is no simple command to do it.

The most correct approach is to use a function, like the one from the docs.

Another solution (although not that safe) is to do an update with RETURNING, check which rows were updated, and insert the rest of them.

Something along the lines of:

-- tbl and col are placeholder names (table and column are reserved words)
update tbl
set col = x.col
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, col)
where tbl.id = x.id
returning tbl.id;

assuming id:2 was returned:

insert into tbl (id, col) values (1, 'aa'), (3, 'cc');

Of course it will fail sooner or later in a concurrent environment, as there is a clear race condition here: another session can insert one of the missing rows between the update and the insert. Usually, though, it will work.

depesz 2009-07-10 12:04:39

Surely locking the table would delay any concurrent queries but allow this one to run uninterrupted?

Teifion 2009-07-10 12:10:22
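
The table-lock idea raised in the comment above could look roughly like this. It is only a sketch: the table name tbl and column col are hypothetical, and SHARE ROW EXCLUSIVE is one lock mode that blocks concurrent writers while still allowing plain reads.

```sql
-- Hypothetical table for illustration.
CREATE TABLE tbl (id INT PRIMARY KEY, col TEXT);

BEGIN;

-- Blocks other INSERT/UPDATE/DELETE on tbl (and other holders of this lock)
-- until COMMIT; plain SELECTs are still allowed.
LOCK TABLE tbl IN SHARE ROW EXCLUSIVE MODE;

-- Update the rows that already exist.
UPDATE tbl
SET col = x.col
FROM (VALUES (1, 'aa'), (2, 'bb'), (3, 'cc')) AS x (id, col)
WHERE tbl.id = x.id;

-- Insert the rows that are still missing.
INSERT INTO tbl (id, col)
SELECT x.id, x.col
FROM (VALUES (1, 'aa'), (2, 'bb'), (3, 'cc')) AS x (id, col)
WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE tbl.id = x.id);

COMMIT;
```

The lock removes the race at the cost of serializing every writer on the table for the duration of the transaction.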

Answer 3

+5 A:

Searching the PostgreSQL mailing list archives for "upsert" turns up an example in the manual that probably does what you want:

Example 38-2. Exceptions with UPDATE/INSERT

This example uses exception handling to perform either UPDATE or INSERT, as appropriate:

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');

Stephen Denne 2009-07-10 12:18:16

"Upsert"? Is that a typo or a name/process I've never heard of?

Teifion 2009-07-10 13:00:41

http://en.wikipedia.org/wiki/Upsert

Stephen Denne 2009-07-10 13:08:22

Answer 4

+1 A:

For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org

[Ya, "self-promo"]

The current best practice that I'm aware of is:

1. COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
2. Acquire Lock [optional] (advisory is preferable to table locks, IMO)
3. Merge. (the fun part)
4. Be Happy. Or content, if happy ain't yo thang.
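
The steps above can be sketched in plain SQL as follows. This is an illustration under stated assumptions, not the mbk project's code: the tables target and staging, their columns, and the advisory lock key 42 are all hypothetical, and pg_advisory_xact_lock requires PostgreSQL 9.1 or later.

```sql
-- Hypothetical destination table.
CREATE TABLE target (id INT PRIMARY KEY, val TEXT);

BEGIN;

-- Step 1: load the new/updated rows into a temp table.
-- For real bulk loads, COPY staging FROM ... would replace this INSERT.
CREATE TEMP TABLE staging (id INT PRIMARY KEY, val TEXT) ON COMMIT DROP;
INSERT INTO staging (id, val) VALUES (1, 'aa'), (2, 'bb'), (3, 'cc');

-- Step 2 (optional): take an advisory lock, released automatically at COMMIT.
-- The key 42 is arbitrary; it only serializes sessions that use the same key.
SELECT pg_advisory_xact_lock(42);

-- Step 3: merge. Update rows that already exist in the target...
UPDATE target AS t
SET val = s.val
FROM staging AS s
WHERE t.id = s.id;

-- ...then insert the rows that are not there yet.
INSERT INTO target (id, val)
SELECT s.id, s.val
FROM staging AS s
WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id);

COMMIT;
```

An advisory lock only protects against other sessions that take the same lock before merging; writers that skip it can still race with the merge.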

jwp 2009-07-10 22:57:55

Answer 5

A:

I was looking for the same thing when I came here, but the lack of a generic "upsert" function bothered me a bit, so I thought you could just pass the UPDATE and INSERT SQL as arguments to that function from the manual.

That would look like this:

CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
    RETURNS VOID
    LANGUAGE plpgsql
AS $$
DECLARE
    affected INTEGER;
BEGIN
    LOOP
        -- first try to update
        EXECUTE sql_update;
        -- EXECUTE does not set FOUND, so read the row count instead
        GET DIAGNOSTICS affected = ROW_COUNT;
        IF affected > 0 THEN
            RETURN;
        END IF;
        -- not found so insert the row
        BEGIN
            EXECUTE sql_insert;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing and loop to try the update again
        END;
    END LOOP;
END;
$$;
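
A call might look like the sketch below. It assumes the db table from the manual example quoted earlier in this thread (columns a and b) and the upsert function defined above; quote_literal is used so the value is escaped safely inside the generated SQL.

```sql
-- Assumes: CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
-- Both statements are passed as text, so values must be quoted by the caller.
SELECT upsert(
    'UPDATE db SET b = ' || quote_literal('david') || ' WHERE a = 1',
    'INSERT INTO db (a, b) VALUES (1, ' || quote_literal('david') || ')'
);
```

Because the function executes whatever text it is given, it should only ever receive SQL built from trusted or properly quoted input.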

And perhaps, to do what you initially wanted to do (a batch "upsert"), you could use Tcl to split the sql_update and loop over the individual updates. The performance hit will be very small; see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php

The highest cost is executing the query from your code; on the database side the execution cost is much smaller.

Paul Scheltema 2010-09-16 16:13:10
