views:

3554

answers:

5

Several months ago I learnt from here how to perform multiple updates at once in MySQL using the following syntax

INSERT INTO table (id, field, field2) VALUES (1, A, X), (2, B, Y), (3, C, Z)
ON DUPLICATE KEY UPDATE field=VALUES(Col1), field2=VALUES(Col2);

I've now switched over to PostgreSQL and apparently this is not correct. It's referring to all the correct tables so I assume it's a matter of different keywords being used but I'm not sure where in the PostgreSQL documentation this is covered.

To clarify, I want to insert several things and if they already exist to update them.

+1  A: 

According the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.

Christian Hang
+2  A: 

There is no simple command to do it.

The most correct approach is to use function, like the one from docs.

Another solution (although not that safe) is to do update with returning, check which rows were updates, and insert the rest of them

Something along the lines of:

update table
set column = x.column
from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
where table.id = x.id
returning id;

assuming id:2 was returned:

insert into table (id, column) values (1, 'aa'), (3, 'cc');

Of course it will bail out sooner or later (in concurrent environment), as there is clear race condition in here, but usually it will work.

depesz
Surely locking the table would delay any concurrent queries but allow this one to run uninterrupted?
Teifion
+5  A: 

Searching postgresql's email group archives for "upsert" leads to finding an example of doing what you possibly want to do, in the manual:

Example 38-2. Exceptions with UPDATE/INSERT

This example uses exception handling to perform either UPDATE or INSERT, as appropriate:

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
Stephen Denne
"Upsert"? Is that a typo or a name/process I've never heard of?
Teifion
http://en.wikipedia.org/wiki/Upsert
Stephen Denne
+1  A: 

For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org

[Ya, "self-promo"]

The current best practice that I'm aware of is:

  1. COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
  2. Acquire Lock [optional] (advisory is preferable to table locks, IMO)
  3. Merge. (the fun part)
  4. Be Happy. Or content, if happy aint yo thang.
jwp
A: 

i was looking for the same thing when i came here, but the lack of a generic "upsert" function botherd me a bit so i thought you could just pass the update and insert sql as arguments on that function form the manual

that would look like this:

CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT)
    RETURNS VOID
    LANGUAGE plpgsql
AS $$
BEGIN
    LOOP
        -- first try to update
        EXECUTE sql_update;
        -- check if the row is found
        IF FOUND THEN
            RETURN;
        END IF;
        -- not found so insert the row
        BEGIN
            EXECUTE sql_insert;
            RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing and loop
        END;
    END LOOP;
END;
$$;

and perhaps to do what you initially wanted to do, batch "upsert", you could use Tcl to split the sql_update and loop the individual updates, the preformance hit will be very small see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php

the highest cost is executing the query from your code, on the database side the execution cost is much smaller

Paul Scheltema