tags:

views:

34

answers:

2

Hey,

I have a table that contains sets of sequential datasets, like that:

ID set_ID some_column n
1 'set-1' 'aaaaaaaaaa' 1
2 'set-1' 'bbbbbbbbbb' 2
3 'set-1' 'cccccccccc' 3
4 'set-2' 'dddddddddd' 1
5 'set-2' 'eeeeeeeeee' 2
6 'set-3' 'ffffffffff' 2
7 'set-3' 'gggggggggg' 1

At the end of a transaction that makes several types of modifications to those rows, I would like to ensure that within a single set, all the values of "n" are still sequential (rollback otherwise). They do not need to be in the same order according to the PK, just sequential, like 1-2-3 or 3-1-2, but not like 1-3-4 or 1-2-3-3-4.

Due to the fact that there might be thousands of rows within a single set I would prefer to do it in the db to avoid the overhead of fetching the data just for verification after making some small changes.

Also there is the issue of concurrency. The way locking in InnoDB (repeatable read) works (as I understand) is that if I have an index on "n" then InnoDB also locks the "gaps" between values. If I combine set_ID and n to a single index, would that eliminate the problem of phantom rows appearing?

Looks to me like a common problem. Any brilliant ideas?

Thanks!

Note: using MySQL + InnoDB

+1  A: 

Look for sequences where max - min + 1 > count:

IF EXISTS (SELECT set_ID
             FROM mytable
            GROUP BY set_ID
           HAVING MAX(n) - MIN(n) + 1 > COUNT(n)
          )
    ROLLBACK

If the sequence must start at 1, do this instead:

IF EXISTS (SELECT set_ID
             FROM mytable
            GROUP BY set_ID
           HAVING MIN(n) = 1 AND MAX(n) > COUNT(n)
          )
    ROLLBACK

You also need to avoid duplicate sequence numbers. But this can be done by creating a unique key on set_ID and n.

Marcelo Cantos
Sorry, forgot to mention that also situations like 1-2-3-3-5 should be avoided. It must be a sequence of unique values.
LauriE
@LauriE: I've amended the question to deal with this.
Marcelo Cantos
Thanks! The unique key should be an even better approach than verifying it later
LauriE
Still - I do have a column called "OK" for soft-deleting the rows, so I can't create a unique index on set_ID and n, as there might be several soft-deleted rows with a non-unique n value. However, this was not part of the original question and I believe that structure should be preferred over "testing" if possible (unique index vs. count distinct). So I'm accepting this answer.
LauriE
the unique index will make it difficult to move rows around within the 'n' column sequence, and it does not prevent testing as it only guarantees unique values and not sequential values.
KM
I'm not sure how you're using it, but you may be able to add the `OK` column to the unique key. However, the rollback test, which is still needed in any case, may have to be cleverer to handle this.
Marcelo Cantos
A: 

Try this:

IF EXISTS (SELECT set_ID
               FROM mytable
               GROUP BY set_ID
               HAVING MIN(n) = 1 AND MAX(n) <> COUNT(DISTINCT n)
          )
    ROLLBACK

works on SQL Server (I don't have MySql to try it out):

DECLARE @YourTable table (ID int, set_ID char(5), some_column char(10),n int)
INSERT @YourTable VALUES (1, 'set-1' ,'aaaaaaaaaa' ,1)
INSERT @YourTable VALUES (2, 'set-1' ,'bbbbbbbbbb' ,2)
INSERT @YourTable VALUES (3, 'set-1' ,'cccccccccc' ,3)
INSERT @YourTable VALUES (4, 'set-2' ,'dddddddddd' ,1)
INSERT @YourTable VALUES (5, 'set-2' ,'eeeeeeeeee' ,2)
INSERT @YourTable VALUES (6, 'set-3' ,'ffffffffff' ,2)
INSERT @YourTable VALUES (7, 'set-3' ,'gggggggggg' ,1)
INSERT @YourTable VALUES (8, 'set-3' ,'ffffffffff' ,4)
INSERT @YourTable VALUES (9, 'set-3' ,'ffffffffff' ,4)

--this will list all "bad" sets
SELECT set_ID
    FROM @YourTable
    GROUP BY set_ID
    HAVING MIN(n) = 1 AND MAX(n) <> COUNT(DISTINCT n)

OUTPUT:

set_ID
------
set-3
KM
Very similar to Marcelo's solution, the COUNT DISTINCT seems to do the final trick. Perfect! Thanks,Lauri
LauriE