Let's say I have the following table with three columns:

id | start_block | end_block
-----------------------------
01 | 00000000001 | 00000000005
02 | 00000000006 | 00000000011
03 | 00000000012 | 00000000018
04 | 00000000025 | 00000000031
05 | 00000000032 | 00000000043

Each row has a "Start Block" and an "End Block". If the data were perfect, each start block would be one more than the end block of the row before it. So, for row id == 02, the start block is 6 while the end block of the row before it is 5.

I need to query this data (it's tens of thousands of rows) and find any missing rows. According to my sample data, there should be a row between 03 and 04 that has a start block of 19 and an end block of 24.

I'm trying to build a report in JSP to reconcile this data and find the missing rows. The ugly way to do this would be to pull the whole recordset into an array and do something like this on every row:

// column 1 = start_block, column 2 = end_block
if ((arry[i][2] + 1) != arry[i + 1][1]) {
  print("Bad Row!\n");
}

But, I would really like to be able to query the recordset directly and return what I need. Is that possible? If not, could someone point me in the right direction of creating a stored proc that does what I need?

+1  A: 

Well, you don't really need to put the whole thing in an array. All you have to do is compare the current row with the one preceding it.

However, I will give it some thought and see if there is a SQL solution.
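A minimal sketch of that row-by-row comparison, written in Python rather than JSP. It assumes the rows arrive sorted by start_block as (id, start_block, end_block) tuples; the name `find_gaps` is hypothetical and only for illustration.

```python
def find_gaps(rows):
    """Yield (missing_start, missing_end) for each hole between consecutive rows.

    Assumes rows are (id, start_block, end_block) tuples sorted by start_block.
    Only the previous row's end block is remembered, so nothing is buffered.
    """
    prev_end = None
    for _id, start, end in rows:
        if prev_end is not None and start > prev_end + 1:
            yield prev_end + 1, start - 1
        prev_end = end


# Sample data from the question.
rows = [
    (1, 1, 5),
    (2, 6, 11),
    (3, 12, 18),
    (4, 25, 31),
    (5, 32, 43),
]
gaps = list(find_gaps(rows))
print(gaps)  # [(19, 24)]
```

Because it is a generator over any iterable, the same function could be fed straight from a database cursor ordered by start_block instead of a list.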

Robert Harvey
A: 
 Select * From Table O
   Where 
      (Exists
         (Select * From Table
          Where End_Block < O.Start_Block)
       And Not Exists 
         (Select * From Table
          Where End_Block = O.Start_Block - 1)) 
    Or
      (Exists
         (Select * From Table
          Where Start_Block > O.End_Block)
       And Not Exists 
         (Select * From Table
          Where Start_Block = O.End_Block + 1 ))
Charles Bretana
+9  A: 

Sure wouldn't hurt to give it a try

CREATE TABLE #t (startz INT, zend INT)
insert into #t (startz, zend) values (1,5)
insert into #t (startz, zend) values (6,11)
insert into #t (startz, zend) values (12,18)
insert into #t (startz, zend) values (25,31)
insert into #t (startz, zend) values (32,43)

select * from #t ta
LEFT OUTER JOIN #t tb ON tb.startz - 1 = ta.zend
WHERE tb.startz IS NULL

The last result, the row (32, 43), is a false positive because nothing is supposed to follow the final block. It is easy to eliminate.
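One possible way to drop that false positive, sketched with Python's standard sqlite3 module so it can be run without a SQL Server instance. The ordinary table `t` stands in for the temp table `#t`, and excluding the row with the highest zend is an assumption about what "easy to eliminate" means here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (startz INTEGER, zend INTEGER)")
conn.executemany(
    "INSERT INTO t (startz, zend) VALUES (?, ?)",
    [(1, 5), (6, 11), (12, 18), (25, 31), (32, 43)],
)

# Same anti-join as above, plus one extra condition: skip the row holding the
# highest zend, since nothing is expected to follow it.
query = """
    SELECT ta.startz, ta.zend
    FROM t AS ta
    LEFT OUTER JOIN t AS tb ON tb.startz - 1 = ta.zend
    WHERE tb.startz IS NULL
      AND ta.zend <> (SELECT MAX(zend) FROM t)
    ORDER BY ta.startz
"""
rows_before_gap = conn.execute(query).fetchall()
print(rows_before_gap)  # [(12, 18)]
```

The result is the row immediately before each gap, so the missing range starts at zend + 1, which is 19 for the sample data.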

hova
You're suggesting that he write this for "tens of thousands of rows"?
DOK
3 lines for tens of thousands of rows? Why not? This is an example, the author of the problem will find his own way to get his data imported.
hova
Agreed, just 3 lines. I'm not sure why you would have to write this more than once for "tens of thousands of rows".
s_hewitt
A: 
select e1.end_block + 1 as start_hole,
    (select min(start_block) 
     from extent e3 
     where e3.start_block > e1.end_block) - 1 as end_hole
from extent e1 
left join extent e2 on e2.start_block = e1.end_block + 1
where e2.start_block is null 
and e1.end_block <> (select max(end_block) from extent);

Although I'd say this is a reasonable candidate for iterating through the result in TSQL: you're going to have to scan the entire table (or at least the entirety of indices on start_block and end_block) anyway, so looping through just once and using variables to remember the last value is something to aim for.

araqnid
+2  A: 

You could try:

SELECT t.ID, t.Start_Block, t.End_Block
FROM [TableName] t
JOIN [TableName] t2 ON t.ID = t2.ID+1
WHERE t.Start_Block - t2.End_Block > 1
Pulsehead
+1 first answer of this type, and it'll work fast if the ids are consecutive
Andomar
+1  A: 

This will do it. You might also want to look for overlapping blocks.

SELECT
     T1.end_block + 1 AS start_block,
     T2.start_block - 1 AS end_block
FROM
     dbo.My_Table T1
INNER JOIN dbo.My_Table T2 ON
     T2.start_block > T1.end_block
LEFT OUTER JOIN dbo.My_Table T3 ON
     T3.start_block > T1.end_block AND
     T3.start_block < T2.start_block
WHERE
     T3.id IS NULL AND
     T2.start_block <> T1.end_block + 1
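
For the overlapping-blocks check mentioned at the top of this answer, here is a rough sketch using Python's standard sqlite3 module. The table name `my_table` mirrors the query above, but the data is made up for illustration: row 3 has been changed to start at 10 so that it overlaps row 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, start_block INTEGER, end_block INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 6, 11), (3, 10, 18), (4, 25, 31)],  # row 3 overlaps row 2
)

# Two inclusive ranges overlap when each one starts no later than the other ends.
# The a.id < b.id condition reports each pair once and never pairs a row with itself.
overlap_query = """
    SELECT a.id, b.id
    FROM my_table AS a
    JOIN my_table AS b
      ON a.id < b.id
     AND a.start_block <= b.end_block
     AND b.start_block <= a.end_block
    ORDER BY a.id, b.id
"""
overlaps = conn.execute(overlap_query).fetchall()
print(overlaps)  # [(2, 3)]
```

This compares every pair of rows, so on tens of thousands of rows it would likely need indexes on start_block and end_block to stay reasonable.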
Tom H.
A: 

Here is a SQL query that actually tells you the missing rows!

I wrote it quickly, so ignore performance issues:

Based on:

CREATE TABLE #t (startz INT, zend INT)
insert into #t (startz, zend) values (1,5)
insert into #t (startz, zend) values (6,11)
insert into #t (startz, zend) values (12,18)
insert into #t (startz, zend) values (25,31)
insert into #t (startz, zend) values (32,43)
insert into #t (startz, zend) values (45,58)
insert into #t (startz, zend) values (60,64)
insert into #t (startz, zend) values (70,98)


select tab1.zend+1 as MissingStartValue,
       (select min(startz-1) from #t where startz > tab1.zend+1) as MissingEndValue
 from #t as tab1 where not exists (select 1 from #t as tab2 where tab1.zend + 1 = tab2.startz)
and (select min(startz-1) from #t where startz > tab1.zend+1) is not null
You're suggesting that he write this for "tens of thousands of rows"?
DOK
A: 
select * from blocks a
where not exists (select * from blocks b where b.start_block = a.end_block + 1)

would give you the blocks immediately preceding a gap. You could get fancy. Let's see...

select a.end_block, min(b.start_block)
from blocks a,
     blocks b
where not exists (select * from blocks c where c.start_block = a.end_block + 1)
and b.start_block > a.end_block
group by a.end_block

I think that ought to do it.

Carl Manaster
A: 
SELECT t1.End_Block + 1 as Start_Block,
       t2.Start_Block - 1 as End_Block
  FROM Table as t1, Table as t2
 WHERE t1.ID + 1 = t2.ID
   AND t1.End_Block + 1 <> t2.Start_Block

This assumes the IDs in the table are sequential. If they are not sequential then you have to do some complicated linking with Start_Block to End_Block to link the two blocks adjacent to each other.

Paul Morgan
A: 

Select id-1 from table where id-1 not in (select id from table)

This is the simplest solution.