views:

350

answers:

2

I need to calculate all Employees that have X number of consecutive absences within a date range in SQL.

We have an Absences Table with 1 record for each day an employee is absent and a Calendar Table with the work days for the year.

tblAbsences
EmployeeID int
AbsenceDate datetime

tblCalendar
WorkDay datetime

Does anyone have any ideas how to calculate consecutive absences? Example: All employees that have 3 consecutive absences between 1/1/2009 and 3/1/2009.

A: 

This should work for you. GROUP BY on ConsecDates to find who was absent more than X number of times.

select a.*, 
     (
      select min(b.absenceDate) from tblAbsences b where a.employeeId = b.employeeId 
      and b.absenceDate >= a.absenceDate
      and not exists ( 
       select 1 from tblabsences c where c.employeeId = b.employeeId and dateadd( dd, 1, b.absenceDate) = c.absenceDate  
      )
) ConsecDates
from dbo.tblAbsences a
order by a.AbsenceDate asc
Nikhil S
This works very well, except I also need to exclude weekends. Example: Thursday, Friday and Monday off is 3 consecutive days. (Saturday and Sunday would not be in tblAbsences)
Mix Man
I added 3 days if the b.AbsenceDate is Friday, and it seems to be working. Thanks!...AND ( DATEADD(dd, 1, b.absenceDate) = c.absenceDate OR ( ( DATENAME(dw, b.absenceDate) = 'Friday' ) AND DATEADD(dd, 3, b.absenceDate) = c.absenceDate)
Mix Man
What about holidays? This solution doesn't use your calendar table at all.
Rob Garrison
Holdiays are a nice-to-have. If someone can add that in, it would be very nice.
Mix Man
A: 

Tested in PostgreSQL; SQL uses example values from posting.

There are no primary keys defined on the provided tables, and the code below addresses that. Better would be to add primary keys and optimize below code to leverage them: better data quality, better performance, cleaner code, happier people.

Removing the tbl prefix provides more flexibility in the database implementation. Table, view, and synonym can then be used interchangeably without impacting code that references the database object or breaking the naming convention.

/* period length as the desired number of consecutive days */
/* time window as the period to be analyzed */
SELECT DISTINCT
 /* Consolidate employees with multiple periods */
 a.employeeid
FROM
 (SELECT
   /* Generate all possible periods */
   pk_c.workday begin_date,
   /* End date for given period length; less one for closed boundaries */
   LEAD(pk_c.workday,3-1,NULL) OVER (ORDER BY pk_c.workday) end_date 
  FROM (SELECT DISTINCT
         /* No calendar PK, remove dupes; if PK, pull in-line view up */
         c.workday
        FROM sandbox.calendar c) pk_c
       ) p
 INNER JOIN sandbox.absences a ON 
  /* Match absences with periods */
  (a.absencedate BETWEEN p.begin_date AND p.end_date)
WHERE
 /* In time window desired; exclude periods extending beyond boundaries */
 (p.begin_date BETWEEN '2009-01-01' AND '2009-03-01'
  AND /* NOT NULL exclusion implied for periods beyond calendar boundaries */
  p.end_date BETWEEN '2009-01-01' AND '2009-03-01')
GROUP BY
 a.employeeid,
 /* Also group period, display only employee */
 p.begin_date
HAVING
 /* Number of absence days to match to the period length */
 /* DISTINCT due to missing absences PK; if PK, drop DISTINCT */
 COUNT(DISTINCT a.absencedate) = 3
;

Enjoy. Stripped version below:

SELECT DISTINCT
 a.employeeid
FROM
 (SELECT
   pk_c.workday begin_date,
   LEAD(pk_c.workday,3-1,NULL) OVER (ORDER BY pk_c.workday) end_date 
  FROM (SELECT DISTINCT c.workday FROM sandbox.calendar c) pk_c) p
 INNER JOIN sandbox.absences a ON 
  (a.absencedate BETWEEN p.begin_date AND p.end_date)
WHERE
 (p.begin_date BETWEEN '2009-01-01' AND '2009-03-01'
  AND p.end_date BETWEEN '2009-01-01' AND '2009-03-01')
GROUP BY
 a.employeeid, p.begin_date
HAVING
 COUNT(DISTINCT a.absencedate) = 3
;
SetProcessor