views:

45

answers:

3

I have a table like this

DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null

I want to get the count of each status within a time period, where only the last started is valid for a given entry_id.

What I am using now is this (with dynamic dates):

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

The problem is that it counts wrong when there are some entry_id that have multiple entries the same start_date. (I don't particularly care which status is chosen in this case, just that only 1 is chosen)

Some test data:

status_id   Entry_id    Start_date
496 45173   2010-09-29 18:04:33.000
490 45173   2010-09-29 18:48:20.100
495 45173   2010-09-29 19:25:29.300
489 45174   2010-09-29 18:43:01.500
493 45175   2010-09-29 18:48:00.500
493 45175   2010-09-29 21:16:02.700
489 45175   2010-09-30 17:52:12.100
493 45176   2010-09-29 17:55:21.300
492 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems

The result should be

495 1
489 2
492 1 (or 493 1)
+2  A: 

If i correctly understood, you want to count distinct entry for a specific status in your time period... if it is so, you should use the DISTINCT clause in your count() changing from count(*) to count(distinct Entry_id)

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(distinct Entry_Id) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

EDIT

AS long as you do not care which status is return for a given entry, i think you could modify the inner query to return the first Status and join the status too

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select c.Status_Id, count(c.Entry_Id) as cnt from 
 (select Entry_Id, Start_Date, (select top 1 Status_id from c where Entry_Id = CC.Entry_Id and Start_Date = CC.Start_Date) as Status_Id
  from (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) as CC) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
and c.status_id = d.status_id
GROUP BY c.Status_Id

Result

Status_id Count
 489       2
 492       1
 495       1
il_guru
This one will only work if the status_ids are the same for the duplicates
Cine
You need to count any different Entry_Id only one time?
il_guru
If there is an entry_id that has multiple entries with the same max(start_time), then it should only be included once. No matter what the status_id is. Yours will include it, if the duplicate date has different status_id
Cine
i have not enough time this moment to provide a query, but the solution could be to extract not only Entry_id and max(start_date) but also the Status_ID in the nested query. You should nest another query in which you take the first Status_id (with TOP 1) with Entry_id and stard_date given by your current inner query. I'll write something more tomorrow.
il_guru
Edited adding a new query version
il_guru
+1  A: 

Alternative answer based on OPs lovely comments.

WITH
   [sequenced_data]
AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
  FROM
    tbl
  WHERE
    start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id

The ROW_NUMBER() function is only needed where there isn't a single field that can uniquely identify individul records. Alternative queries can be written where there is a unique identity column in the data. SQL Server, however, is extremely effective at optimising ROW_NUMBER() queries such as above and it should (assuming relevant indexes) be effective.

EDIT

Someone just suggested to me that people don't like long code, they prefer compact code. So the CTE version has been replaced with an inline version (The CTEs really just helped breakdown the query for explanatory reasons, and is in the edit history if needed)...

EDIT

ROW_NUMBER() can't form part of the WHERE clause, as spotted by OP. Query updated by putting one CTE back in.

Dems
Ohh, that is a very very nice solution! (Your edit is broken though, the Row_number has to be in the select). It is also twice as fast as the one I came up with, since it doesn't have to duplicate elimination, just a specialized sort
Cine
A: 

I found a solution myself:

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
(select max(Status_Id) as Status_Id, c.Entry_Id from --<--- ADDED
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
group by c.Entry_Id) y  --<--- ADDED
GROUP BY Status_Id WITH ROLLUP
Cine