tags:

views:

1663

answers:

6

Quick question for the SQL gurus.

I have a table which contains, amongst others, two columns - min_number and max_number I have been trying unsuccessfully to write a query which finds the first hole of n size between the min and max numbers

Example

     min    max
1.   100    200
2.   250    300
3.   330    400

If I want to find a hole of size 50, row 1's max of 200 would be returned (there is a hole of 50 between that and row 2's min), a hole of 20 would return row 2s max of 300 etc. If no suitable sized hole existed the last max (400) would be returned.

Thanks

A: 

Personally, I wouldn't try to do that in SQL - AIUI it's difficult to perform analysis across different rows without effectively having to scan the table in O(n^2) in the worst case. That would possibly be easier using a stored procedure, though.

My solution, if you're able, would be to change your database schema and code so that each time a new row is inserted, the previous row is updated with the difference between the max of that row and the min of the new row, with that difference value stored in its own column.

Searching for the first row that has a gap large enough then becomes relatively trivial.

Alnitak
Yes, I was coming aound to that way of thinking, but it is row deletions that are causing the holes.
so, any reason you couldn't update the difference value each time you do a delete? You'd need to consider both the previous and following rows to do it, though.
Alnitak
A: 

Have MySQL model clause? If yes you can do it with a query with it.

FerranB
A: 

Edited: final answer is at the bottom.

Why do so many SQL questions forget the table name?

-- Buggy: should reference (lo.max + 1)
SELECT lo.max + 1 AS min_range
    FROM example lo, example hi
    WHERE hi.min - (lo.max - 1) >= 40   -- Example won't work with 50
      AND NOT EXISTS (SELECT * FROM example AS mid
                         WHERE mid.min > lo.max
                           AND mid.max < hi.min
                     )

The NOT EXISTS clause is crucial - it ensures that you are only considering adjacent ranges.

This deals with the 'there is a gap big enough' case.

Nominally, you can deal with the 'there is no gap big enough' with a UNION clause:

...
UNION
SELECT MAX(max)+1
    FROM example
    WHERE NOT EXISTS(
        SELECT lo.max + 1 AS min_range
            FROM example lo, example hi
            WHERE hi.min - (lo.max - 1) >= 40   -- Example won't work with 50
              AND NOT EXISTS (SELECT * FROM example AS mid
                                 WHERE mid.min > lo.max
                                   AND mid.max < hi.min
                             )
            )

The inner SELECT is a direct transcription of the first, indented.


The SQL above was untested. The first part works (especially on the test data) - but can produce multiple answers. So, it needs to be revised to (fixing, I think, an off-by-two error):

SELECT MIN(lo.max + 1) AS min_range
    FROM example lo, example hi
    WHERE hi.min - (lo.max + 1) >= 40   -- Example won't work with 50
      AND NOT EXISTS (SELECT * FROM example AS mid
                         WHERE mid.min > lo.max
                           AND mid.max < hi.min
                     )

The UNION clause is giving me some grief...not producing the answer I expect.

Syntactically, I had to amend it to:

SELECT MIN(lo.max + 1) AS min_range
    FROM example lo, example hi
    WHERE hi.min - (lo.max + 1) >= 40   -- Example won't work with 50
      AND NOT EXISTS (SELECT * FROM example AS mid
                         WHERE mid.min > lo.max
                           AND mid.max < hi.min
                     )
UNION
SELECT MAX(solo.max)+1
    FROM example AS solo
    WHERE NOT EXISTS(
        SELECT MIN(lo.max + 1) AS min_range
            FROM example lo, example hi
            WHERE hi.min - (lo.max - 1) >= 40   -- Example won't work with 50
              AND NOT EXISTS (SELECT * FROM example AS mid
                                 WHERE mid.min > lo.max
                                   AND mid.max < hi.min
                             )
            )

This circumvents problems with the keyword MAX being used as a column name (I could probably have written example.max instead of solo.max. But it isn't producing me the answer I expect.


A UNION is equivalent to an OR, certainly in this case, and this query seems to produce the answer I want:

SELECT MIN(lo.max + 1) AS min_range
    FROM example lo, example hi
    WHERE (hi.min - (lo.max + 1) >= 40
           AND NOT EXISTS (SELECT * FROM example AS mid
                              WHERE mid.min > lo.max
                                AND mid.max < hi.min
                          )
          )
       OR lo.max = (SELECT MAX(solo.max) FROM Example AS Solo)
;

It is crucial that the OR clause cite lo.max and not hi.max; otherwise, you get the wrong answer.


OK - the UNION version is doomed, because SQL misdefines the behaviour of MIN. Specifically, if there are no rows that match, then MIN returns a single row with the value NULL, rather than returning no rows. That means that the first clause of the UNION returns a NULL when there are no rows found; the second clause can be 'fixed' by omitting the MIN from the SELECT inside the NOT EXISTS, but you still end up with two rows (a NULL and the correct value) from the statement, which is not really acceptable. So, the OR version is the one to use - and SQL bites again with NULL values.

Rigorously avoiding nulls can be done by framing the UNION in a table expression in the FROM clause. This ends up being slightly simpler:

SELECT MIN(min_range)
    FROM (SELECT (lo.max + 1) AS min_range
              FROM example lo, example hi
              WHERE hi.min - (lo.max + 1) >= 49
                AND NOT EXISTS (SELECT * FROM example AS mid
                                   WHERE mid.min > lo.max
                                     AND mid.max < hi.min
                               )
          UNION
          SELECT MAX(solo.max + 1) AS min_range
              FROM example AS solo
         );

The first half of the UNION can return any number of slots including zero; the second always returns a value (as long as there are any rows in the table at all). The outer query then chooses the lowest of these values.

This version can, of course, be used to allocate rows:

INSERT INTO Example(min, max)
    SELECT MIN(min_range) AS min, MIN(min_range) + (50 - 1) AS max
        FROM (SELECT (lo.max + 1) AS min_range
                  FROM example lo, example hi
                  WHERE hi.min - (lo.max + 1) >= 50
                    AND NOT EXISTS (SELECT * FROM example mid
                                       WHERE mid.min > lo.max
                                         AND mid.max < hi.min
                                   )
              UNION
              SELECT MAX(solo.max + 1) AS min_range
                  FROM example AS solo
             );
Jonathan Leffler
and this is why I recommended not doing it with SQL ;-)
Alnitak
OK - SQL isn't to everyone's liking. There can be many sound reasons for doing it in SQL - and it is perfectly feasible to do so.
Jonathan Leffler
A: 

"a hole of 20 would return row 2s max of 300 etc" I'm not following your logic there - the gap between the maximum of row 2 (300) and minimum of row 3 (330) is 30 (if you are including either the min or max values, 29 if not). Does this mean you are looking for the first gap of "greater than or equal to" the specified value, or does the gap need to be an exact match? If it is "greater than or equal to" then surely the first match returned would be row 1, which has a gap > 20 between it and row 2?

At any rate if your table has a row ID of some sort, as the example seems to indicate, then you could try something like this (assuming a table MyTable with columns RowID, MinVal and MaxVal populated with the data in your example):

SELECT TOP 1
        a.RowID,
        a.MinVal,
        a.MaxVal, -- this is the value you want to return
        ISNULL(b.MinVal, 9999) AS MinVal_NextRow,
        ISNULL(b.MinVal, 9999) - a.MaxVal AS Diff
FROM    MyTable a
        LEFT JOIN MyTable b ON a.RowID = ( b.RowID - 1 )
WHERE   ( ISNULL(b.MinVal, 9999) - a.MaxVal ) = 20

This example selects the first row where the gap is exactly 20. If you were looking for the first gap of at least 20, then you could change the WHERE clause to :

WHERE   ( ISNULL(b.MinVal, 9999) - a.MaxVal ) >= 20

The query substitutes an arbitrarily large number (9999) in for when the row is the last one available - this is what returns the last (largest) MaxVal if there are no gaps of a suitable size. You would need to adjust this number to something that made sense for your data (i.e. larger than any possible values in the data).

Nathan
The ROWID values are pretty much guaranteed to be out of sequence after a little while. Depending on ROWID - 1 is going to yield bogus results.
Jonathan Leffler
What is it that makes you think that? There is nothing in the question to even indicate there is even a RowId column, hence this solution applies only "if" the table has a RowId column. This is a viable solution based on the info provided, but obviously has some depenencies.
Nathan
+1  A: 
SELECT
     MIN(T1.max_value)
FROM
     My_Table T1
LEFT OUTER JOIN My_Table T2 ON
     T2.min_value BETWEEN (T1.max_value + 1) AND (T1.max_value + @range)
WHERE
     T2.id IS NULL

I'm guessing that since you are looking for IDs to assign, that you want a range of values completely exclusive of the max_value and min_value.

You can also do the above query with a NOT EXISTS clause. Try it with both and see which performs better for you.

Another thing to consider is, do you really need to reuse IDs? Are your ID values going to get so high and your range available so low that you will need to do that? I don't know the specifics of your system, but it just seems like you might be spending a lot of effort and then using a lot of extra processing to solve a problem that doesn't really exist.

Tom H.
A: 

select min(n+1) from myTable where n+1 NOT IN (select n from myTable)

  • R Doherty
R Doherty