Having a table with the following fields:

Order,Group,Sequence

It is required that the sequence values of all orders in a given group form a continuous sequence, for example 1,2,3,4 or 4,5,6,7. How can I check, using a single SQL query, which orders do not comply with this rule? Thank you.

Example data:

Order  Group  Sequence
1      1      3
2      1      4
3      1      5
4      1      6
5      2      3
6      2      4
7      2      6

Expected result:
Order
5
6
7

Also accepted if the query returns only the group which has the wrong sequence, 2 for the example data.

+2  A: 

How about this?

select Group from Table
group by Group
having count(Sequence) <= max(Sequence)-min(Sequence)

[Edit] This assumes that Sequence does not allow duplicates within a particular group. It might be better to use:
having count(Sequence) != max(Sequence) - min(Sequence) + 1

[Edit again] D'oh, still not perfect. Another query to flush out duplicates would take care of that though.

[Edit the last] The original query worked fine in SQLite, which is what I had available for a quick test. It is much more forgiving than SQL Server. Thanks to Bell for the pointer.
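
A minimal sketch of the combined check, run from Python against an in-memory SQLite database so it can be tried quickly. The table name Orders, the quoted "Order" and "Group" identifiers, and the extra COUNT(DISTINCT ...) condition for duplicates are assumptions made for this illustration, not part of the answer above.

```python
import sqlite3

# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Orders ("Order" INTEGER, "Group" INTEGER, Sequence INTEGER)')
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 1, 6), (5, 2, 3), (6, 2, 4), (7, 2, 6)],
)

# A group is flagged when its row count does not match the span of its
# sequence values (a gap), or when it holds duplicate sequence values.
BAD_GROUPS_SQL = """
SELECT "Group"
FROM Orders
GROUP BY "Group"
HAVING COUNT(*) <> MAX(Sequence) - MIN(Sequence) + 1
    OR COUNT(*) <> COUNT(DISTINCT Sequence)
"""

bad_groups = sorted(row[0] for row in conn.execute(BAD_GROUPS_SQL))

# Every order that belongs to a flagged group.
BAD_ORDERS_SQL = 'SELECT "Order" FROM Orders WHERE "Group" IN (' + BAD_GROUPS_SQL + ")"
bad_orders = sorted(row[0] for row in conn.execute(BAD_ORDERS_SQL))

print(bad_groups)  # [2]
print(bad_orders)  # [5, 6, 7]
```

The second query wraps the first to get from the failing groups back to their orders, which matches the expected result in the question.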

JimG
What about duplicates?
Sam Saffron
I would expect this to give an error to the effect of 'Column "Sequence" must be part of a GROUP BY or an aggregate function'. Which of the values of Sequence for the group that's got the problem is it supposed to return?
Bell
+1  A: 

This SQL selects orders 3 and 4, which have non-continuous sequences.

DECLARE @Orders TABLE ([Order] INTEGER, [Group] INTEGER, Sequence INTEGER)

INSERT INTO @Orders VALUES (1, 1, 0)
INSERT INTO @Orders VALUES (1, 2, 0)
INSERT INTO @Orders VALUES (1, 3, 0)

INSERT INTO @Orders VALUES (2, 4, 0)
INSERT INTO @Orders VALUES (2, 5, 0)
INSERT INTO @Orders VALUES (2, 6, 0)

INSERT INTO @Orders VALUES (3, 4, 0)
INSERT INTO @Orders VALUES (3, 6, 0)

INSERT INTO @Orders VALUES (4, 1, 0)
INSERT INTO @Orders VALUES (4, 2, 0)
INSERT INTO @Orders VALUES (4, 8, 0)

SELECT o1.[Order]
FROM @Orders o1
     LEFT OUTER JOIN @Orders o2 ON o2.[Order] = o1.[Order] AND o2.[Group] = o1.[Group] + 1
WHERE o2.[Order] IS NULL
GROUP BY o1.[Order]
HAVING COUNT(*) > 1
Lieven
+4  A: 
Bell
Nice one, missing the = in the where though
Andomar
Thanks, escaping can be a pain.
Bell
A: 

So your table is in the form of

Order Group Sequence
1     1     4
1     1     5
1     1     7

..and you want to find out that 1,1,6 is missing?

With

select
  min(Sequence) MinSequence, 
  max(Sequence) MaxSequence 
from 
  Orders 
group by 
  [Order], 
  [Group]

you can find out the bounds for a given Order and Group.

Now you can simulate the correct data by using a special numbers table, which just contains every single number you could ever use for a sequence. Here is a good example of such a numbers table. It's not important how you create it; you could also create an Excel file with all the numbers from x to y and import that sheet.

In my example I assume such a numbers table called "Numbers" with only one column "n":

select 
  [Order], 
  [Group], 
  n Sequence
from
  (select [Order], [Group], min(Sequence) MinSequence, max(Sequence) MaxSequence from Orders group by [Order], [Group]) MinMaxSequence
  left join Numbers on n >= MinSequence and n <= MaxSequence

Put that SQL into a new view. In my example I will call the view "vwCorrectOrders".

This gives you the data where the sequences are continuous. Now you can join that data with the original data to find out which sequences are missing:

select 
  co.*
from
  vwCorrectOrders co 
  left join Orders o on 
      co.[Order] = o.[Order] 
  and co.[Group] = o.[Group]
  and co.Sequence = o.Sequence
where
  o.Sequence is null

Should give you

Order Group Sequence
1     1     6
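
For anyone who wants to try this end to end, here is a rough sketch in Python with an in-memory SQLite database. It makes a few assumptions that are not in the answer: the Numbers table is simply filled with 1 to 100, the view vwCorrectOrders is replaced by a common table expression called CorrectOrders, and "Order" and "Group" are quoted because they are reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Orders ("Order" INTEGER, "Group" INTEGER, Sequence INTEGER)')
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(1, 1, 4), (1, 1, 5), (1, 1, 7)])

# Numbers table: one row per candidate sequence value (1..100 is an assumption).
conn.execute("CREATE TABLE Numbers (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(i,) for i in range(1, 101)])

MISSING_SQL = """
WITH MinMaxSequence AS (
    -- Bounds of the sequence for each Order/Group pair.
    SELECT "Order", "Group",
           MIN(Sequence) AS MinSequence,
           MAX(Sequence) AS MaxSequence
    FROM Orders
    GROUP BY "Order", "Group"
),
CorrectOrders AS (
    -- Every sequence value that should exist between those bounds.
    SELECT m."Order", m."Group", Numbers.n AS Sequence
    FROM MinMaxSequence m
    JOIN Numbers ON Numbers.n BETWEEN m.MinSequence AND m.MaxSequence
)
-- Expected rows that have no matching row in the real data.
SELECT co."Order", co."Group", co.Sequence
FROM CorrectOrders co
LEFT JOIN Orders o
       ON co."Order" = o."Order"
      AND co."Group" = o."Group"
      AND co.Sequence = o.Sequence
WHERE o.Sequence IS NULL
ORDER BY co."Order", co."Group", co.Sequence
"""

missing = conn.execute(MISSING_SQL).fetchall()
print(missing)  # [(1, 1, 6)]
```

Unlike the count-based checks, this reports the missing sequence values themselves, at the cost of maintaining a numbers table that covers the full range.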
VVS
A: 

After a while I came up with the following solution. It seems to work but it is highly inefficient. Please add any improvement suggestions.

SELECT OrdMain.Order
  FROM ((Orders AS OrdMain
  LEFT OUTER JOIN Orders AS OrdPrev ON (OrdPrev.Group = OrdMain.Group) AND (OrdPrev.Sequence = OrdMain.Sequence - 1))
  LEFT OUTER JOIN Orders AS OrdNext ON (OrdNext.Group = OrdMain.Group) AND (OrdNext.Sequence = OrdMain.Sequence + 1))
WHERE ((OrdMain.Sequence < (SELECT MAX(Sequence) FROM Orders OrdMax WHERE (OrdMax.Group = OrdMain.Group))) AND (OrdNext.Order IS NULL)) OR
      ((OrdMain.Sequence > (SELECT MIN(Sequence) FROM Orders OrdMin WHERE (OrdMin.Group = OrdMain.Group))) AND (OrdPrev.Order IS NULL))
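
A quick way to check what this query returns is to run it against the sample data from the question. The sketch below does that in Python with an in-memory SQLite database; the quoting of "Order" and "Group" and the ORDER BY are additions for this test and are not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Orders ("Order" INTEGER, "Group" INTEGER, Sequence INTEGER)')
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 1, 6), (5, 2, 3), (6, 2, 4), (7, 2, 6)],
)

# An order is returned when it is not the last in its group but has no
# successor, or is not the first in its group but has no predecessor.
NEIGHBOUR_SQL = """
SELECT OrdMain."Order"
FROM Orders AS OrdMain
LEFT OUTER JOIN Orders AS OrdPrev
       ON OrdPrev."Group" = OrdMain."Group"
      AND OrdPrev.Sequence = OrdMain.Sequence - 1
LEFT OUTER JOIN Orders AS OrdNext
       ON OrdNext."Group" = OrdMain."Group"
      AND OrdNext.Sequence = OrdMain.Sequence + 1
WHERE (OrdMain.Sequence < (SELECT MAX(Sequence) FROM Orders OrdMax
                           WHERE OrdMax."Group" = OrdMain."Group")
       AND OrdNext."Order" IS NULL)
   OR (OrdMain.Sequence > (SELECT MIN(Sequence) FROM Orders OrdMin
                           WHERE OrdMin."Group" = OrdMain."Group")
       AND OrdPrev."Order" IS NULL)
ORDER BY OrdMain."Order"
"""

gap_neighbours = [row[0] for row in conn.execute(NEIGHBOUR_SQL)]
print(gap_neighbours)  # [6, 7]
```

On this data the query returns only the orders on either side of the gap (6 and 7), not order 5, so it identifies where the gap is rather than listing every order in the offending group.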
Tihauan
Maybe read the other answers before you post ;)
Andomar
It's a common misconception that subqueries are evaluated once for each row. Any DBMS worth its salt implements a typical subquery using special joins or with specific optimizations. Subqueries are a central concept in SQL, and most vendors have invested a lot of time in improving their performance, based upon volumes of academic research on the topic. To be fair, I can't speak for the JET engine in particular, but I understand that it has undergone quite a bit of work in the past few years. That's not to say I think your answer is less efficient -- just that subqueries ain't so bad.
WCWedin
+1  A: 

Personally, I think I would consider rethinking the requirement. It is the nature of relational databases that gaps in sequences can easily occur due to records that are rolled back. For instance, suppose an order starts to create four items in it, but one fails for some reason and is rolled back. If you precomputed the sequences manually, you would then have a gap if the one rolled back is not the last one. In other scenarios, you might get a gap due to multiple users looking for sequence values at approximately the same time, or if at the last minute a customer deleted one record from the order. What are you honestly looking to gain from having contiguous sequences that you don't get from a parent-child relationship?

HLGEM