ansaurus

Question

Why does this code fail in PostgreSQL, and how can I fix it (or work around it)? Is it a flaw in the PostgreSQL engine?

Answer 1

+2 A:

NOTICE: j = 8200
ERROR: duplicate key value violates unique constraint "spb_word_word_key"
CONTEXT: SQL statement "insert into spb_word (word) select distinct word from spb_word4obj where word_id is null and doc_id = $1 "
PL/pgSQL function "spb_runme" line 18 at SQL statement

...is telling you that your spb_getWord() is generating values that already exist in the SPB_WORD table. You need to update the function to check whether the word already exists before returning it; if it does, regenerate until you get one that doesn't.
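A minimal sketch of the "regenerate until unique" check, written in Python rather than PL/pgSQL so it runs standalone. The set `existing` stands in for the SPB_WORD table, `generate` is a hypothetical callable standing in for spb_getWord(), and `max_attempts` is an assumed safety limit. Later in this thread the asker reports that an equivalent SQL check passes and the insert still fails, so this illustrates the suggestion rather than a fix for that symptom.

```python
from typing import Callable, Set


def get_unique_word(existing: Set[str], generate: Callable[[], str],
                    max_attempts: int = 1000) -> str:
    """Call `generate` until it yields a word not yet in `existing`.

    `existing` stands in for the SPB_WORD table and `generate` for
    spb_getWord(); both are illustrative assumptions.
    """
    for _ in range(max_attempts):
        word = generate()
        if word not in existing:   # the "does it already exist?" check
            existing.add(word)     # the INSERT into spb_word
            return word
    raise RuntimeError(f"no unique word after {max_attempts} attempts")


# Usage: a deterministic generator that repeats "kot".
words = iter(["kot", "pies", "kot", "mysz"])
table = {"kot"}
print(get_unique_word(table, lambda: next(words)))  # pies
print(get_unique_word(table, lambda: next(words)))  # mysz
```

The attempt limit keeps a generator that only ever produces existing words from looping forever.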

I think your spb_runme() needs to resemble:

create or replace function spb_runme() returns void as $$
DECLARE
  v_word VARCHAR(410);

begin
  -- reset spb_wordnum_seq and empty all three tables
  perform setval('spb_wordnum_seq', 1, false);
  truncate table spb_word4obj, spb_word, spb_obj_word;

  for j in 0 .. 50000-1 loop

    if j % 100 = 0 then raise notice 'j = %', j; end if;

    for i in 0 .. 20 - 1 loop
      -- call spb_getWord() once and reuse the value
      v_word := spb_getWord();

      -- parent first: insert the word so it gets an id
      INSERT INTO spb_word (word) VALUES (v_word);

      -- then the child row, using the id of the word just inserted
      INSERT INTO spb_word4obj 
        (word, idx, doc_id, word_id)
        SELECT w.word, i, j, w.id
          FROM SPB_WORD w 
         WHERE w.word = v_word;

    end loop;

    -- copy this document's word links into spb_obj_word
    INSERT INTO spb_obj_word (word_id, idx, doc_id) 
    SELECT w4o.word_id, w4o.idx, w4o.doc_id 
      FROM SPB_WORD4OBJ w4o 
     WHERE w4o.doc_id = j;

  end loop;
end;
$$ language plpgsql;

Using this would allow you to make word_id NOT NULL. When dealing with foreign keys, populate the table the foreign key references first: start with the parent, and then tackle its children.

The other change I made was to store the result of spb_getWord() in a variable (v_word), because every call to the function returns a different value.
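A small Python illustration of why the value is stored once. `make_word_generator` is a hypothetical stand-in for spb_getWord() that returns a new word on every call, and a set stands in for the table; the two functions mimic "insert, then look up" with and without a variable.

```python
import itertools
from typing import Callable, Set


def make_word_generator() -> Callable[[], str]:
    """Hypothetical stand-in for spb_getWord(): a different word per call."""
    counter = itertools.count()
    return lambda: f"word{next(counter)}"


def lookup_with_two_calls(get_word: Callable[[], str], table: Set[str]) -> bool:
    table.add(get_word())        # INSERT uses the first call's value
    return get_word() in table   # lookup uses a second, different value


def lookup_with_variable(get_word: Callable[[], str], table: Set[str]) -> bool:
    v_word = get_word()          # call once, like v_word := spb_getWord();
    table.add(v_word)
    return v_word in table


print(lookup_with_two_calls(make_word_generator(), set()))  # False
print(lookup_with_variable(make_word_generator(), set()))   # True
```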

Last thing: I removed the delete statement. You had already truncated the table, so there is nothing in it to delete, and certainly nothing associated with a value of j.

OMG Ponies 2010-01-19 02:03:38

Please look at the update to my question: I explain why the problem is not duplicates in `spb_getWord()` but rather a strange error in Postgres.

WildWezyr 2010-01-19 10:04:26

I've changed my code to process the words one by one, without the buffer table `spb_word4obj`, so it now looks similar to your proposal. But it still fails, in a different loop iteration every time. It seems that the duplicate check passes (no duplicate found), yet that result is wrong: the code then fails when inserting the word into `spb_word` because of a duplicate record.

WildWezyr 2010-01-19 15:52:07

Please look at my own answer to this question; it shows the simplest code I was able to write that exposes this error. I'm still hitting the unique key constraint, and I still think it is a Postgres error (not my fault).

WildWezyr 2010-01-22 15:39:54

Answer 2

+1 A:

I've managed to simplify the test code; it now uses a single table. The simplified problem was posted to the pgsql-bugs mailing list: http://archives.postgresql.org/pgsql-bugs/2010-01/msg00182.php. It has been confirmed to occur on other machines, not only mine.

Here is the simplified version of the main test function. It needs one table (spb_word), two sequences (spb_wordnum_seq and spb_word_seq), and one function (spb_getWord), all given in my question.

create or replace function spb_runmeSimple2(cnt int) returns void as $$
declare
  w varchar(410);
  wordId int;
begin
  -- reset the sequence and empty spb_word (and any tables referencing it)
  perform setval('spb_wordnum_seq', 1, false);
  truncate table spb_word cascade;

  for i in 1 .. cnt loop

    if i % 100 = 0 then raise notice 'i = %', i; end if;

    -- generate a word and look it up; wordId is NULL when no row matches
    select spb_getWord() into w;
    select id into wordId from spb_word where word = w;
    -- insert only words that are not already present
    if wordId is null then 
      insert into spb_word (word) values (w);
    end if;

  end loop;
end;
$$ language plpgsql;

The error now occurs, though unpredictably, while executing `select spb_runmeSimple2(10000000)`.

Here is a workaround: change the database collation from Polish to the standard 'C' collation. With the 'C' collation there is no error. But without the Polish collation, Polish words are sorted incorrectly (with respect to Polish national characters), so the problem should be fixed in Postgres itself.
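To see what the 'C' collation trade-off means for sorting: 'C' compares strings byte by byte, and for UTF-8 text that gives the same order as comparing Unicode code points, which is Python's default string ordering. The `polish_key` function below is a deliberately simplified, single-level letter ordering written for illustration only; it is not how PostgreSQL or the operating system implements the Polish collation.

```python
# Simplified Polish letter order (q, v, x placed as in the Latin alphabet).
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
_RANK = {ch: i for i, ch in enumerate(POLISH_ALPHABET)}


def polish_key(word: str) -> tuple:
    """Rank each lowercase letter by its position in POLISH_ALPHABET.

    Characters outside the alphabet sort after it, by code point.
    """
    return tuple(_RANK.get(ch, len(_RANK) + ord(ch)) for ch in word)


words = ["zupa", "łódź", "mleko"]
print(sorted(words))                  # ['mleko', 'zupa', 'łódź']  (code-point / 'C' order)
print(sorted(words, key=polish_key))  # ['łódź', 'mleko', 'zupa']  (Polish letter order)
```

Under code-point order every letter with a diacritic lands after `z`, which is the incorrect sorting the workaround accepts.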

WildWezyr 2010-01-22 15:37:16

`Flaw`: `i` will increment even if a word already exists. The larger the number of words you want generated, the more likely the counter will increase without actually generating unique words. `Solution`: Check that the word is unique *before* leaving the word generation function.

OMG Ponies 2010-01-23 03:52:32

With the Polish collation, try: `SELECT id INTO wordId FROM SPB_WORD WHERE word LIKE w;`

OMG Ponies 2010-01-23 03:56:17

@OMG Ponies: 1) There is no flaw with the loop control variable `i`: it is meant to increment with every generated word (duplicate or not), and it does increment correctly. 2) Substituting `like` for equality (`=`) helps. This is strange, because `like` and equality should be equivalent when the word contains no wildcard characters (`%` or `_`) and no backslash (the default escape character). But this workaround will not help when generated words contain a percent character. BTW: do you know why using `like` helps, i.e. what difference it makes that eliminates the errors?

WildWezyr 2010-01-25 09:44:33

@WildWezyr: `i` will increment even if a word already exists. That means your counter will run up to your limit without producing an equivalent number of unique words if the word generation produces any duplicates. I.e.: loop 100 times and get fewer than 100 actual word records.

OMG Ponies 2010-01-25 15:39:47
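The point about the counter can be made concrete with a short simulation. `run_loop` is a hypothetical Python model of spb_runmeSimple2: it counts one iteration per generated word, exactly as `i` does, and inserts a word only when it is not already present, so the row count drops below the iteration count whenever the generator repeats a word.

```python
from typing import Iterable, Set, Tuple


def run_loop(generated: Iterable[str]) -> Tuple[int, int]:
    """Model of the loop: returns (iterations, rows inserted)."""
    table: Set[str] = set()
    iterations = 0
    for w in generated:          # one pass per generated word, like i
        iterations += 1
        if w not in table:       # select id ... where word = w; wordId is null
            table.add(w)         # insert into spb_word
    return iterations, len(table)


print(run_loop(["ala", "kot", "ala", "pies", "kot"]))  # (5, 3)
```

Whether `(5, 3)` counts as a flaw depends on whether the goal is a fixed number of iterations or a fixed number of distinct words.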

I can't remember the details of why `LIKE` works where equality doesn't; I didn't know Polish had `%` as a valid character.

OMG Ponies 2010-01-25 15:43:22

@OMG Ponies: 1) "`i` will increment if a word already exists": that's perfectly correct; I want to count iterations of the loop, not the number of words inserted into spb_word. 2) Imagine I want to insert two words, `wódka` and `40%`. This is just a joke to show a use case for the percent character in words ;-).

WildWezyr 2010-01-25 16:42:27

That's cool if it is the expected behavior. I see your point about allowing `%`, though I don't know how you'd escape it when using `LIKE`, sorry.

OMG Ponies 2010-01-25 16:54:31
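One way to keep using `LIKE` as an exact match while still allowing `%` in words is to escape LIKE's metacharacters before using the word as a pattern. In PostgreSQL, `%` matches any sequence of characters, `_` matches a single character, and the default escape character is the backslash. The Python sketch below shows the escaping; `like` is a hypothetical, simplified matcher (translated to a regular expression) used only to check the result, not PostgreSQL's implementation. Whether an escaped pattern keeps whatever made `LIKE` avoid the error here is not established in this thread.

```python
import re


def escape_like(word: str, escape: str = "\\") -> str:
    """Escape %, _ and the escape character so a LIKE pattern matches `word` literally."""
    return "".join(escape + ch if ch in ("%", "_", escape) else ch for ch in word)


def like(text: str, pattern: str, escape: str = "\\") -> bool:
    """Simplified LIKE: % -> any sequence, _ -> one character, escape -> literal next char."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), text, re.DOTALL) is not None


print(like("40 procent", "40%"))               # True: unescaped % is a wildcard
print(like("40 procent", escape_like("40%")))  # False
print(like("40%", escape_like("40%")))         # True
```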
