views:

1640

answers:

2

I have an interesting query to do and am trying to find the best way to do it. Basically I have an absence table in our personnel database this records the staff id and then a start date and end date for the absence. End date being null if not yet entered (not returned). I cannot change the design.

They would like a report by month on number of absences (12 month trend). With staff being off over the month change it obviously may be difficult to calculate.

e.g. Staff off 25/11/08 to 05/12/08 (dd/MM/yy) I would want the days in November to go into the November count and the ones in December in the December count.

I am currently thinking in order to count the number of days I need to separate the start and end date into a record for each day of the absence, assigning it to the month it is in. then group the data for reporting. As for the ones without an end date I would assume null is the current date as they are presently still absent.

What would be the best way to do this?

Any better ways?

Edit: This is SQL 2000 server currently. Hoping for an upgrade soon.

+2  A: 

I have had a similar issue where there has been a table of start/end dates designed for data storage but not for reporting.

I sought out the "fastest executing" solution and found that it was to create a 2nd table with the monthly values in there. I populated it with the months from Jan 2000 to Jan 2070. I'm expecting it will suffice or that I get a large pay cheque in 2070 to come and update it...

DECLARE TABLE months (start DATETIME)
-- Populate with all month start dates that may ever be needed
-- And I would recommend indexing / primary keying by start

SELECT
    months.start,
    data.id,
    SUM(CASE WHEN data.start < months.start
            THEN DATEDIFF(DAY, months.start, data.end)
            ELSE DATEDIFF(DAY, data.start, DATEADD(month, 1, months.start))
        END) AS days
FROM
    data
INNER JOIN
    months
        ON data.start < DATEADD(month, 1, months.start)
        AND data.end > months.start
GROUP BY
   months.start,
   data.id

That join can be quite slow for various reasons, I'll search out another answer to another question to show why and how to optimise the join.

EDIT:

Here is another answer relating to overlapping date ranges and how to speed up the joins...

http://stackoverflow.com/questions/452499/query-max-number-of-simultaneous-events/453180#453180

Dems
That's pretty much my current thought process joining to a table with a list of dates in, obviously I could maintain that with an overnight run or something or generate on the fly.
PeteT
I did something similar to the date table, but I created a CLR UDF (this is with SQL Server) that would generate a table of dates. Again assuming SQL Server, you could use common table expressions to do the same.
Sean Bright
The overhead of having a table of dates/months is really low. I would have it as a permanent table. You get the benefits of the indexes then too.
Dems
Nah, it allows me to generate hour, day, week, month, or year ranges with whatever criteria I need and is lightning fast.
Sean Bright
Still, having one or more permanent tables is always faster than generating them on the fly. And in my case the indexing of the table made a difference...
Dems
A: 

I tried this solution but it gives incorrect results when there are 1 or more complete months between the start and end date, e.g. start date 15-5-2010, end date 15-7-2010. it will correctly return 17 days for may 2010, but then it incorrectly returns the remaining 45 days (30 for june + 15 for july) for june 2010, and then correctly the remaining 15 days for july 2010. SO the 45 days for june is not correct, Does Anyone have a solution for this?

Hennie