views:

88

answers:

2

I have a tricky problem that has had me scratching my head for a little while. I have some data that represents the delivery of "widgets" over a variable number of days, broken down into half hourly slots.

For example (apologies for formatting - haven't quite got to grips with it):

Date          Time    NoOfUnits  
01-Mar-2010   00:00   0  
01-Mar-2010   00:30   0  
01-Mar-2010   01:00   0  
.... (following half hour intervals have NoOfUnits = 0)
01-Mar-2010   23:00   10  
01-Mar-2010   23:30   10
02-Mar-2010   00:00   10
.... (following half hour intervals have NoOfUnits = 1)
02-Mar-2010   07:00   10
02-Mar-2010   07:30   0
.... (following half hour intervals have NoOfUnits = 0)
02-Mar-2010   23:30   0

I need to generate a query that will allow me to group this data into all the distinct blocks where I am delivering a unit. In the above example I need to identify only 1 block - 23:00 to 07:00, and the sum of units for that block (160). The required result would therefore be StartTime, EndTime, TotalNoOfUnits.

However, the complexity comes when we have different patterns of delivery - maybe we have a day where we deliver units for 24 hours.

I need to be able to query data in the format above and identify all the unique StartTime, EndTime and TotalNoOfUnits combinations where NoOfUnits <> 0.

Apologies again for formatting and a slightly rambling question. Please ask any questions you need for me to clarify things.

EDIT: Just to be clear, the data will always run from 00:00 to 23:30 for each day of delivery, and each half hour slot will always be present. It is only the number of days and unit per half hour slot that may vary for any given data set.

EDIT2: Below is a script that will populate a table with 2 days worth of schedule data. The schedule is the same for both days. The result I would expect to see, based on my requirements, would be 13:00, 00:00, 230. As you will see from the below my SQL skills are not great hence the head scratching!

declare @DayCount int
declare @HalfHourCount int
declare @HH int
declare @CurrentDate datetime
declare @BaseDate datetime
declare @NoOfUnits int

set @HalfHourCount = 48
set @DayCount = 4
set @BaseDate = '1 Jan 1900'

create table #Schedule
(
Date datetime
, Time datetime
, NoOfUnits int
)

while @DayCount > 0
begin
set @CurrentDate = dateadd(dd, @DayCount * -1, CONVERT(VARCHAR(10),GETDATE(),111))
set @HH = @HalfHourCount

while @HH > 0
begin
    if @HH > 24
        set @NoOfUnits = 10
    else
    begin
        if @DayCount = 4 and @HH < 10
            set @NoOfUnits = 10
        else
            set @NoOfUnits = 0
    end

    insert into #Schedule(Date, Time, NoOfUnits)
    values (@CurrentDate, dateadd(mi, @HH / 2.0 * 60, @BaseDate), @NoOfUnits)

    select @HH = @HH - 1
end

set @DayCount = @DayCount - 1
end

Expected result (although test data should start from 00:00 and go to 23:00 instead of 00:30 to 00:00):

StartTime               TotalUnits  EndDate
----------------------- ----------- -----------------------
1900-01-01 00:30:00.000 90          1900-01-01 04:30:00.000
1900-01-01 12:30:00.000 960         1900-01-02 00:00:00.000
+1  A: 

If I understand you correctly, have a look at this example. Please let me know if you have more questions.

DECLARE @Table TABLE(
        DateVal DATETIME,
        NoOfUnits INT
)

INSERT INTO @Table SELECT '01 Mar 2010 00:30', 0

INSERT INTO @Table SELECT '01 Mar 2010 23:00', 10
INSERT INTO @Table SELECT '01 Mar 2010 23:30', 10
INSERT INTO @Table SELECT '02 Mar 2010 00:30', 10

INSERT INTO @Table SELECT '02 Mar 2010 01:00', 0

INSERT INTO @Table SELECT '02 Mar 2010 02:30', 20
INSERT INTO @Table SELECT '02 Mar 2010 03:30', 30
INSERT INTO @Table SELECT '02 Mar 2010 04:30', 40

INSERT INTO @Table SELECT '02 Mar 2010 05:00', 0

SELECT  *
FROM    @Table

;WITH DateValues AS(
        SELECT  t.DateVal,
                t.NoOfUnits,
                MIN(tNext.DateVal) MinDate
        FROM    @Table t LEFT JOIN
                @Table tNext    ON  t.DateVal < tNext.DateVal
                                AND tNext.NoOfUnits = 0
        WHERE   t.NoOfUnits != 0
        GROUP BY t.DateVal,
                t.NoOfUnits
)
, DateRanges AS(
        SELECT  DateVal StartDate,
                NoOfUnits,

            ISNULL((SELECT MAX(DateVal) FROM @Table WHERE DateVal < MinDate), (SELECT MAX(DateVal) FROM @Table)) EndDateEndDate
        FROM    DateValues
)
SELECT  MIN(StartDate) StartDate,
        SUM(NoOfUnits) TotalUnits,
        EndDate
FROM    DateRanges
GROUP BY EndDate

Ouput

StartDate               TotalUnits  EndDate
----------------------- ----------- -----------------------
2010-03-01 23:00:00.000 30          2010-03-02 00:30:00.000
2010-03-02 02:30:00.000 90          2010-03-02 04:30:00.000
astander
Thanks astander. I ran your script against the test data I have generated (script has been added to the original question) and it gave me the result I was expecting. Much appreciated. I will test the code against some other test data, i.e. where I have schedules that differ by day. It is some comfort to know I wasn't missing anything obvious (at least to me) in the answer - I could have spent weeks on this and not got anywhere near your solution!
Daniel Kelley
I have just added data for a third day to my test data, where each HH slot has a unit count of 20, and now the results are incorrect - I get a row for every half hour in the new day, plus the sum of all units between 12:30 and 02:00.
Daniel Kelley
Can you show us this new test data, so I can check it? Also, do you wish this grouped by day, or per consecutive entries with values?
astander
I need to see the data grouped into unique combinatiosn of start/end time (regardless of dat) and total quantity for the start/end time. I will provide sample data shortly (have to dash right now).
Daniel Kelley
Original question has been updated to change data creation script and include expected results.
Daniel Kelley
+3  A: 

This should work, based on the data you have provided. It can almost certainly be simplified:

;WITH dateCTE
AS
(
        SELECT * 
               ,date + [time] dt
        FROM #Schedule
)
,seqCTE
AS
(
        SELECT NoOfUnits
               ,dt
               ,ROW_NUMBER() OVER (ORDER BY dt) AS rn
        FROM dateCTE               
)
,recCTE
AS
(
        SELECT NoOfUnits
               ,dt
               ,1 AS SEQUENCE
               ,rn
        FROM seqCTE
        WHERE rn = 1

        UNION ALL

        SELECT s.NoOfUnits
              ,s.dt
              ,CASE WHEN (s.NoOfUnits      > 0 
                          AND  r.NoOfUnits > 0
                         )
                      OR (s.NoOfUnits      = 0 
                          AND  r.NoOfUnits = 0
                         )
                    THEN r.SEQUENCE
                    ELSE r.SEQUENCE + 1
               END
              ,s.rn
        FROM recCTE AS r
        JOIN seqCTE AS s
        ON   s.rn = r.rn + 1
)
,summaryCTE
AS
(
        SELECT RIGHT(CONVERT(varchar(23),MIN(dt),120),8)  AS startTime
               ,RIGHT(CONVERT(varchar(23),MAX(dt),120),8) AS endTime
               ,SUM(NoOfUnits) AS NoOfUnits
        FROM recCTE
        GROUP BY SEQUENCE
        HAVING SUM(NoOfUnits) != 0
)
SELECT startTime
       ,endTime
       ,SUM(NoOfUnits)
FROM summaryCTE  
group by startTime
         ,endTime  
OPTION (MAXRECURSION 0)
Ed Harper
Thanks for your time Ed. I have upadted my question with some example data covering 2 identical days of schedule data the code above produced 2 results - 13:00 to 00:00 on both days, with the correct no of units. However, I only need distinct time and unit count groups, so in this case would expect 1 result.
Daniel Kelley
@Daniel - please could you add your expected results to the question? I think I understand what you want, but it would be helpful to have it confirmed.
Ed Harper
Appreciate your patience Ed. I have changed my question to include an updated script for generating data, and have included the required results. Thanks.
Daniel Kelley
@Daniel - I have updated my answer based on the test data you have provided
Ed Harper
Ed that is awesome. I ran your query against a few other datasets and they all returned the right result. Thanks very much.
Daniel Kelley