views:

250

answers:

4

Hi, I have the following Table definition with sample data. In the following table, Customer Product & Date are key fields

Table One
Customer   Product    Date         SALE
   X          A       01/01/2010    YES
   X          A       02/01/2010    YES
   X          A       03/01/2010    NO
   X          A       04/01/2010    NO
   X          A       05/01/2010    YES
   X          A       06/01/2010    NO
   X          A       07/01/2010    NO
   X          A       08/01/2010    NO
   X          A       09/01/2010    YES
   X          A       10/01/2010    YES
   X          A       11/01/2010    NO
   X          A       12/01/2010    YES

In the above table, I need to find the N or > N consecutive records where there was no sale, Sale value was 'NO' For example, if N is 2, the the result set would return the following

     Customer   Product    Date         SALE
       X          A       03/01/2010    NO
       X          A       04/01/2010    NO
       X          A       06/01/2010    NO
       X          A       07/01/2010    NO
       X          A       08/01/2010    NO

Can someone help me with a SQL query to get the desired results. I am using SQL Server 2005. I started playing using ROW_NUMBER() AND PARTITION clauses but no luck. Thanks for any help

A: 

You need to match your table against itself, as if there where 2 tables. So you use two aliases, o1 and o2 to refer to your table:

SELECT DISTINCT o1.customer, o1.product, o1.datum, o1.sale
  FROM one o1, one o2
  WHERE (o1.datum = o2.datum-1 OR o1.datum = o2.datum +1)
  AND o1.sale = 'NO' 
  AND o2.sale = 'NO'; 
 customer | product |   datum    | sale 
----------+---------+------------+------
 X        | A       | 2010-01-03 | NO
 X        | A       | 2010-01-04 | NO
 X        | A       | 2010-01-06 | NO
 X        | A       | 2010-01-07 | NO
 X        | A       | 2010-01-08 | NO

Note that I performed the query on an postgresql database - maybe the syntax differs on ms-sql-server, maybe at the alias 'FROM one AS o1' perhaps, and maybe you cannot add/substract in that way.

user unknown
Hi Stefan,Your suggestion works perfect. How do I handle the N consecutive case?
I fear I get that N-part of your question just right now. Hm. Tricky. You want to spezify N=16 or N=375 and get an analog result for at least 16 or 375 dates? What are the N's we're talking about? I need more room and add an additional answer.
user unknown
Hi Stefan, the N is variable. This N would be specified by the User at run time of the query. I was able to come up with a solution for this and I will be posting it soon.Thanks for all your help. I appreciate it.Javid
A: 

Ok, we need a variable answer. We search for a date, where we have N following dates, all with the sale-field being NO.

SELECT d1.datum
FROM one d1, one d2, i 
WHERE d1.sale = 'NO' AND d2.sale = 'NO'
  AND d1.datum = (d2.datum - i) 
  AND i > 0 AND i < 4 
GROUP BY d1.datum 
HAVING COUNT (*) = 3; 

This will give us the date, which we use for subquering. (Notes:

  • I used 'datum' instead of date, because date is a reserved keyword on postgresql.
  • In oracle you can use a virtual table dummy, which contains anything you ask for, like 'SELCT foo FROM dual WHERE foo in (1, 2, 3);' which will give you 1, 2, 3, if I remember correctly. Depending on the vendor, there might be other tricks to get a sequence 1 to N. I created a table i with column i, and filled it with the values 1 to 100, and I expect N not to exceed 100; Since a few versions, postgresql contains a function 'generate_series (from, to) which would solve the problem too, and might have similarities with solutions for your specific database. But table i should work vendor independent.
  • if N == 17, you have to modify 3 places from 3 to 17.

The final query will be:

SELECT o4.* 
FROM one o3, one o4 
WHERE o3.datum = (
    SELECT d1.datum
    FROM one d1, one d2, i 
    WHERE d1.sale = 'NO' AND d2.sale = 'NO'
      AND d1.datum = (d2.datum - i) 
      AND i > 0 AND i <= 3 
    GROUP BY d1.datum 
    HAVING COUNT (*) = 3) 
AND o4.datum <= o3.datum + 3 
AND o4.datum >= o3.datum; 
 customer | product |   datum    | sale 
----------+---------+------------+------
 X        | A       | 2010-02-06 | NO
 X        | A       | 2010-02-07 | NO
 X        | A       | 2010-02-08 | NO
 X        | A       | 2010-02-09 | NO
user unknown
Oh, I mention, that this will only return exact matches where N=3, not N>=3.
user unknown
A: 

A different approach, inspired by munchs last line.

Get - for a given date the first date with YES later than that, and the last date with YES earlier than that. These form the boundary, where our dates shall fit in.

SELECT (o1.datum),
    MAX (o3.datum) - MIN (o2.datum) AS diff
FROM one o1, one o2, one o3 
WHERE o1.sale = 'NO'
AND o3.datum <
    (SELECT MIN (datum) 
    FROM one 
    WHERE datum >= o1.datum 
    AND SALE = 'YES') 
AND o2.datum > 
    (SELECT MAX (datum) 
    FROM one 
    WHERE datum <= o1.datum 
    AND SALE = 'YES') 
GROUP BY o1.datum 
HAVING MAX (o3.datum) - MIN (o2.datum) >= 2
ORDER BY o1.datum;

Maybe it needs some kind of optimization, because table one is 5 times involved in the query. :)

user unknown
A: 

Thanks to everyone for posting your solution. Thought, I would also share my solution with everyone. Just as an FYI, I received this solution from another SQL Server Central forum member. I am definitely not going to take credit for this solution.

DECLARE @CNT INT
SELECT @CNT = 3

SELECT * FROM
(
  SELECT
    [Customer], [Product], [Date], [Sale], groupID, 
    COUNT(*) OVER (PARTITION BY [Customer], [Product], [Sale], groupID) AS groupCnt
  FROM
  (
    SELECT
      [Customer], [Product], [Date], [Sale],
      ROW_NUMBER() OVER (PARTITION BY [Customer], [Product] ORDER BY [Date])
      - ROW_NUMBER() OVER (PARTITION BY [Customer], [Product], [Sale] ORDER BY [Date]) AS groupID
    FROM
      [TableSales]
  ) T1
) T2
WHERE
  T2.[Sale] = 'NO' AND T2.[groupCnt] >= @CNT
This query has the potential to be very, very slow (N^3 queries if the database is unable to optimize it in any way)
Renesis