views:

80

answers:

5

I'm trying to find a SQL query that will count the number of distinct start times that are at least 30 minutes different.

I have a number of employees that are paid a credit when they start work on at least three distinct times in a week, where the start time is at least 30 minutes different from the other start times. For example:

select count(distinct (CONVERT(VARCHAR(10), starttime, 108))), employeecode
from schedule 
where CONVERT(VARCHAR(10), starttime, 108) >= 
(select min(CONVERT(VARCHAR(10), dateadd (mi, 30, s2.starttime), 108)) from schedule s2)  
group by starttime, employeecode

I am hoping to get a result with the employee code and the number of different and distinct start times. eg. Employeecode = 9999, Different Start times = 4 I have been bumbling through this and am yet to get something working...

Can anyone suggest where I am going wrong or a suitable solution that might help me? Thanks in advance for your help :)

A: 

[Update: Based on the poster's clarification of the problem in a comment on this answer, the problem I was solving with this answer is clearly not the problem the poster is trying to solve. I'm leaving the answer so as to show the solution to the other problem, and so as not to remove the comments which clarify the problem statement]

Break the problem into two parts: identifying the "unique" (within 30 minutes) starts and then counting them. The first part is the one I think you're having trouble with. Here is an approach:

SELECT employeecode, starttime FROM schedule S1
    WHERE NOT EXISTS (SELECT * FROM schedule S2 
        WHERE S2.employeecode = S1.employeecode AND
              S2.starttime > DATEADD(mi, -29, S1.starttime)

A few notes:

  • I copied the date math logic from your original query rather than looking up the syntax.

  • I assume starttime is DATETIME.

  • I used 29 minutes so that they'd get the bonus if the starttimes were 30 or more minutes apart (as stated in your problem statement). Actually, you should do this by doing the date math using seconds and subtracting (29 * 60) + 59. My version is slightly more generous to the employees than your problem statement specifies.

  • You can encapsulate this query in an view or inner query and do something like (assuming it's a view):

    SELECT employeecode, count(*) FROM unique_starts_view WHERE starttime BETWEEN (beginning of period) AND (end of period) GROUP BY employeecode HAVING COUNT(*) >= 3

  • The NOT EXISTS technique can be slow so it's best to limit that query to the period you're interested in.

Larry Lustig
I have tried the NOT EXISTS and had a few issues, possibly due to the order and details contained within it. I am also restricted from creating views(!) so the inner query may be the answer.
Withdalot
Would that not have problems with not returning enough if the times were close together? eg 7.00, 7.20, 7.40, 8.00, 8.20... None of them are more than 30 minutes from every other so if I'm understanding your SQL correctly (which I may not be) it wouldn't return a desired resultset...
Chris
@Chris: It will return the 7:00 time (assuming the database contains no other records for that employee after 6:30). I think that's what the OP wants — only records that are 30 minutes after the last start time. At least, that's my interpretation of the problem statement. I do recognize that it's possible he wants 7:00, 7:40 and 8:20 returned from your list.
Larry Lustig
Sorry about the confusion- to clarify the question:An employee will start at least once per day (eg Monday 07:00), the nest day (Tuesday 07:30) and so on, with.. (Wednesday at 07:30, Thursday 06:00, Friday 06:00)..Monday is counted as 1Tuesday and Wednesday are counted as 1Thursday and Friday are counted as 1 and subsequently this is the day the first credit is paid (3 start times). If Saturdays start time was 08:00, they would be give another credit.
Withdalot
@Larry Lustig: looks like my problem was that I was reading the problem differently to you. :)
Chris
@Withdalot: Given the more detailed problem statement in your comment, I doubt very much my solution is the one you want (it was designed with a different problem in mind -- more like a call center where people "start" frequently during the day, but only get the credit if a start is 30 minutes after the last one). I would advise you to update your question with sample data and desired results to make the problem clearer.
Larry Lustig
+1  A: 

While waiting for clarification on exact requirements I thought I'd suggest another approach. I'll put pros and cons with it...

If the start times are usually at or around a certain time (you're examples were always on the half hour or on the hour) then you can just split all the start times into what "band" they are in and then count the number of different bands.

eg 00:00-00:30 = Band 1 00:30-01:00 = Band 2 ... 07:00-07:30 = Band 15 ... 23:30-00:00 = Band 48

To get the bands you'd just need a simple (though quite lengthy) case statement.

The main problem with this approach is that it falls down when your times are next to the threshold. eg 07:29 and 07:31 would be in two different bands but are in fact only 2 minutes apart. This can be mitigated slightly if you are starting around the same time by making your bands start and finish at 15 and 45 mintues past each hour. Then if the start times are all in the middle of the bands then you will get it mostly right...

In my head though the problem isn't really one that suited for SQL so if you can do it in a different language that might be better...

You could probably do it in SQL with some tricky joins but I'm not capable of writing reliable SQL for it... Algorithmically though you want to do the following.

1) take earliest start time in the day and call that your first start time. 2) take the next earliest time that is at least 30 minutes later than the time from your previous step. 3) Repeat step 2 until you run out of times. 4) Count the times.

The problem with this from a SQL point of view is that it is trying to create data based off of a previous row which would mean doing stuff with cursors to loop through your times and storing things in variables.

Chris
There are set-based solutions to this problem, even without using bands, although I like your idea of the time-bands as it seems to suit the problem well.
Tom H.
@Tom H.: Yeah. I thought it sounded like shift work type thing with set starting points rather than just any time which is why I thought it worth putting down. I'm interested by what you have in mind when you are talkign abotu other set-based solutions...
Chris
A: 

I'm assuming that your database product is SQL Server based on your OP but you did not mention the version. If you are using SQL Server 2005 and later you could try something like:

With StartTimes As
    (
    Select StartDateTime 
        , Row_Number() Over( Order By StartDateTime ) As Seq 
        , DatePart(hh, StartDateTime) * 60 + DatePart(mi, StartDateTime) As Minutes
    From Schedule
    )
Select *
From StartTimes As S1
Where Exists(
            Select 1
            From StartTimes As S2
            Where S1.Seq <> 1
                And Abs(S2.Minutes - S1.Minutes) >= 30
            )
Thomas
A: 

Using the time bands (not to be confused with Time Bandits) that Chris mentioned:

CREATE TABLE Start_Periods
(
    begin_time    TIME        NOT NULL,
    end_time      TIME        NOT NULL,
    time_period   TINYINT     NOT NULL
    CONSTRAINT PK_Start_Periods PRIMARY KEY CLUSTERED (begin_time),
    CONSTRAINT CK_Start_Periods_begin_before_end CHECK (begin_time < end_time OR end_time = '00:00:00.000')
)
INSERT INTO Start_Periods (begin_time, end_time, time_period)
SELECT '00:00:00.000', '00:15:00.000', 1 UNION ALL
SELECT '00:15:00.000', '00:45:00.000', 2 UNION ALL
SELECT '00:45:00.000', '01:15:00.000', 3 UNION ALL
SELECT '01:15:00.000', '01:45:00.000', 4 UNION ALL
SELECT '01:45:00.000', '02:15:00.000', 5 UNION ALL
SELECT '02:15:00.000', '02:45:00.000', 6 UNION ALL
SELECT '02:45:00.000', '03:15:00.000', 7 UNION ALL
SELECT '03:15:00.000', '03:45:00.000', 8 UNION ALL
--...
SELECT '23:15:00.000', '23:45:00.000', 48 UNION ALL
SELECT '23:45:00.000', '00:00:00.000', 1

Your query then becomes:

SELECT
    SCH.employee_code,
    COUNT(DISTINCT SP.time_period) AS different_time_starts
FROM
    Schedule SCH
INNER JOIN Start_Periods SP ON
    SP.begin_time <= SCH.start_time AND
    SP.end_time > SCH.start_time
GROUP BY
    SCH.employee_code
Tom H.
A: 

Just to give you an idea:

  • Build the SQL for only one week
  • Do a SELF JOIN of schedule while sutracting the start times each from another
  • Now count all DISTINCT differences greater 30 Minutes for each customer -> gives you the number of start times

This should give you the desired result.

MicSim
This is excellent feedback, thank you all. MicSim's theory seems to be closes to the mark with what I am attempting, however can I ask for the specific SQL? I'm too green with SQL to know how this is done...Further clarification: The employee start times may at rostered at any time, yet this is not based on time bands. The employee could be rostered at 07:15 and as long as the next start time is at least 30 minutes different, it qualifies as part of the "three alternating start time" shifts that contribute towards the credit.
Withdalot