tags:

views:

126

answers:

3

In a PostgreSQL database I have a table with a primary key and another field which needs to be unique.

  CREATE TABLE users (
    id      INTEGER PRIMARY KEY DEFAULT nextval('groups_id_seq'::regclass),
    name    VARCHAR(255) UNIQUE NOT NULL
  );

  INSERT users (name) VALUES ('foo');
  INSERT users (name) VALUES ('foo');
  INSERT users (name) VALUES ('bar');

The second insert fails but the sequence groups_id_seq is already incremented so when 'bar' is added it leaves a gap in the id numbers.

Is there a way to tell PostgreSQL to fetch the next value only if other constraints are met or should I check first using SELECT if the name is not duplicate? This still would not guarantee the lack of gaps but at least it would reduce their number to the rare cases when there is another process trying to insert the same name at the same time

+8  A: 

I don't think so: a basic feature of sequences is that gaps are possible (think of two concurrent transactions, with one performing a ROLLBACK). You should ignore gaps. Why are they a problem in your case?

bortzmeyer
Not really a problem, I was just surprised to see it and if it can be mostly avoided then why not invest the energy to learn how? I know I cannot avoid it 100%.
szabgab
No, they cannot be avoided. There are other cases that can cause quite sizable gaps in sequences as well - just design your application around the fact that sequences are guaranteed to keep incrementing (until you hit the max for your datatype, 32 or 64 bit), but not guaranteed to be gapless.
Magnus Hagander
@Magnus: They can be avoided, but at great cost in terms of complexity (see my answer). I totally agree that it's best to design for allowing gaps if you possibly can.
j_random_hacker
+4  A: 

If you need gapless sequences - there are ways to do it, but it's not trivial, and definitely much slower.

Also - if you worry about "using too many ids" - just define id as bigserial.

depesz
+3  A: 

It is possible, though cumbersome, to do this. As bortzmeyer says, it's dangerous to rely on values from sequences being contiguous, so it's best to just leave things as they are if you can.

If you can't:

Every access to the table that could cause a row to have a certain name (that is, every INSERT to that table, and if you allow it (though it's poor practice) every UPDATE that could change the name field) must do so inside a transaction that locks soemthing first. The simplest and least performant option is to simply lock the entire table using LOCK users IN EXCLUSIVE MODE (adding the last 3 words permits concurrent read access by other processes, which is safe).

However that is a very coarse lock that will slow performance if there are many concurrent modifications to users; a better option would be locking a single, corresponding row in another table that must already exist. This row can be locked with SELECT ... FOR UPDATE. This makes sense only when working with a "child" table that has a FK dependency on another "parent" table.

For example, imagine for the time being that we are actually trying to safely create new orders for a customer, and that these orders somehow have identifying 'names'. (I know, poor example...) orders has a FK dependency on customers. Then to prevent ever creating two orders with the same name for a given customer, you could do the following:

BEGIN;

-- Customer 'jbloggs' must exist for this to work.  
SELECT 1 FROM customers
WHERE id = 'jbloggs'
FOR UPDATE

-- Provided every attempt to create an order performs the above step first,
-- at this point, we will have exclusive access to all orders for jbloggs.
SELECT 1 FROM orders
WHERE id = 'jbloggs'
AND order_name = 'foo'

-- Determine if the preceding query returned a row or not.
-- If it did not:
INSERT orders (id, name) VALUES ('jbloggs', 'foo');

-- Regardless, end the transaction:
END;

Note that it is not sufficient to simply lock the corresponding row in users with SELECT ... FOR UPDATE -- if the row does not already exist, several concurrent processes may simultaneously report that the row does not exist, and then attempt simultaneous insertions, resulting in failed transactions and thus sequence gaps.

Either locking scheme will work; what's important is that anyone trying to create a row with the same name must attempt to lock the same object.

j_random_hacker